Multimodal Learning with Audio and Video

1:30:44 - 1:35:19 (04:34)

The idea of multimodal learning is to take audio and video signals and learn a common embedding space in which corresponding audio and video lie close together. The learned representation can then be used in downstream tasks such as recognizing human actions and different types of sounds.
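One common way to learn such a shared space is a contrastive objective, where the audio and video from the same clip are pulled together and mismatched pairs are pushed apart. The episode summary does not say which loss is used, so the following is only a minimal sketch under that assumption. The function names, the temperature value, and the toy two-dimensional embeddings are all hypothetical; in practice the embeddings would come from trained video and audio networks.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(video_embs, audio_embs, temperature=0.1):
    """InfoNCE-style loss (an assumed objective, not confirmed by the source).

    video_embs[i] and audio_embs[i] are assumed to come from the same clip;
    every other audio embedding in the batch acts as a negative.
    """
    total = 0.0
    for i, v in enumerate(video_embs):
        logits = [cosine(v, a) / temperature for a in audio_embs]
        # Numerically stable log-sum-exp over all candidate audio clips.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of picking the matching audio clip.
        total += log_denom - logits[i]
    return total / len(video_embs)


# Toy, hand-made embeddings (hypothetical values for illustration only).
video = [[1.0, 0.0], [0.0, 1.0]]
audio_aligned = [[0.9, 0.1], [0.1, 0.9]]   # each audio is near its own video
audio_swapped = [[0.1, 0.9], [0.9, 0.1]]   # each audio is near the wrong video

loss_aligned = contrastive_loss(video, audio_aligned)
loss_swapped = contrastive_loss(video, audio_swapped)
print(loss_aligned < loss_swapped)
```

The loss is small when each video embedding is closest to its own audio embedding and large when the pairs are mismatched, which is the property a shared embedding space is meant to have.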

Clips

Learning Common Features for Video and Audio Signals

In multimodal learning, video and audio signals are used together to learn a common feature space in which related audio and video lie close together.

1:30:44 - 1:33:27 (02:42)

Multimodal Learning

Summary

In multimodal learning, video and audio signals are used together to learn a common feature space in which related audio and video lie close together. The resulting network can then be used to recognize human actions or different types of sounds.

Chapter
Multimodal Learning with Audio and Video

Episode
#206 – Ishan Misra: Self-Supervised Deep Learning in Computer Vision

Podcast
Lex Fridman Podcast

Neural Networks Can Visualize Sounds and Associate Them with Objects

Neural networks can learn, through correlation, to associate sounds with objects and to locate where in the scene a sound is coming from.

1:33:27 - 1:35:19 (01:51)

Artificial Intelligence

Summary

Neural networks can learn, through correlation, to associate sounds with objects and to locate where in the scene a sound is coming from. They can also locate a person’s mouth from the sound of that person’s voice.
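As a rough illustration of how correlation can point to a sound's location, the sketch below compares one audio embedding against a grid of visual features, one per image region, and picks the region with the highest similarity. This is an assumed setup, not a method described in the episode: the function name `localize_sound`, the 2x2 grid, and the embedding values are hypothetical, and both modalities are assumed to already live in a shared embedding space.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def localize_sound(audio_emb, visual_grid):
    """Return the (row, col) of the best-matching region and the similarity map.

    visual_grid[r][c] is the feature vector for the image region at row r,
    column c (a hypothetical layout chosen for this example).
    """
    sim_map = [[cosine(audio_emb, cell) for cell in row] for row in visual_grid]
    cells = [(r, c) for r in range(len(sim_map)) for c in range(len(sim_map[r]))]
    best = max(cells, key=lambda rc: sim_map[rc[0]][rc[1]])
    return best, sim_map


# Toy example: the bottom-right region has features closest to the audio.
audio_emb = [0.0, 1.0]
visual_grid = [
    [[1.0, 0.0], [1.0, 0.1]],
    [[0.9, 0.2], [0.1, 1.0]],
]

location, sim_map = localize_sound(audio_emb, visual_grid)
print(location)
```

The similarity map plays the role of a heatmap over the image: regions whose visual features correlate with the sound score high, which is one way a model could highlight a speaking mouth or a sounding object.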

