Clip

Learning Common Features for Video and Audio Signals
1:30:44 - 1:33:27 (02:42)

In multimodal learning, video and audio signals are used together to learn a common feature space in which related inputs from the two modalities end up close to each other. A video network trained this way can then be used to recognize human actions or different types of sounds.
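One way to picture a common feature space is as two projections, one per modality, trained so that the video and audio embeddings of the same clip score higher against each other than against embeddings from other clips. The sketch below is only an illustration under stated assumptions: the clip does not say which training objective is used, so the symmetric contrastive loss, the linear projections `W_video` and `W_audio`, the feature sizes, and the temperature are all hypothetical choices, and the random arrays stand in for real encoder outputs.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale each vector to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def contrastive_alignment_loss(video_emb, audio_emb, temperature=0.1):
    """Symmetric contrastive loss over a batch of paired embeddings.

    Row i of video_emb and row i of audio_emb are assumed to come from
    the same clip; every other pairing in the batch is a negative.
    """
    v = l2_normalize(video_emb)
    a = l2_normalize(audio_emb)
    logits = v @ a.T / temperature  # (batch, batch) cosine similarities
    idx = np.arange(len(v))
    loss_v2a = -log_softmax(logits, axis=1)[idx, idx].mean()  # video -> audio
    loss_a2v = -log_softmax(logits, axis=0)[idx, idx].mean()  # audio -> video
    return 0.5 * (loss_v2a + loss_a2v)


# Placeholder features standing in for the outputs of a video encoder
# and an audio encoder (hypothetical sizes: 16-dim video, 8-dim audio).
rng = np.random.default_rng(0)
video_features = rng.normal(size=(4, 16))
audio_features = rng.normal(size=(4, 8))

# Hypothetical linear projections into a shared 5-dimensional space.
W_video = rng.normal(size=(16, 5))
W_audio = rng.normal(size=(8, 5))
video_emb = video_features @ W_video
audio_emb = audio_features @ W_audio

# Loss for untrained (random) projections.
loss_random = contrastive_alignment_loss(video_emb, audio_emb)

# Idealized case: each audio embedding coincides with its paired video embedding.
loss_aligned = contrastive_alignment_loss(video_emb, video_emb)
```

In a real system the projections would be learned by minimizing this kind of loss with gradient descent, which pulls matching video and audio embeddings together. The idealized `loss_aligned` case shows the target: when each pair coincides, the loss cannot exceed the log of the batch size.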
