Learning Common Features for Video and Audio Signals

1:30:44 - 1:33:27 (02:42)

In multimodal learning, video and audio signals are used together to learn a common feature space in which related inputs from the two modalities lie close to each other. The resulting networks can then be used to recognize human actions or different types of sounds.
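The clip does not say which training objective is used, so the following is only a minimal sketch of one common way to build such a shared space: a symmetric contrastive (InfoNCE-style) loss that rewards matched video/audio pairs for having high cosine similarity and mismatched pairs for having low similarity. All function names, the `temperature` value, and the hand-written 2-D embeddings are hypothetical. In a real system the embeddings would come from trained video and audio encoder networks, not from literals.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_matrix(video_embs, audio_embs):
    """Pairwise cosine similarity: entry [i][j] compares video clip i with audio clip j."""
    v = [l2_normalize(e) for e in video_embs]
    a = [l2_normalize(e) for e in audio_embs]
    return [[sum(x * y for x, y in zip(vi, aj)) for aj in a] for vi in v]


def _cross_entropy_diagonal(logits):
    """Mean cross-entropy where row i's correct column is i (its paired clip)."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max before exponentiating, for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(video_embs, audio_embs, temperature=0.1):
    """Hypothetical InfoNCE-style objective: pull matched video/audio pairs
    together, push mismatched pairs apart, averaged over both directions
    (video-to-audio and audio-to-video)."""
    sims = cosine_matrix(video_embs, audio_embs)
    logits = [[s / temperature for s in row] for row in sims]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (_cross_entropy_diagonal(logits) + _cross_entropy_diagonal(transposed))


def nearest_audio(video_emb, audio_embs):
    """Index of the audio embedding closest to a video embedding in the shared space."""
    sims = cosine_matrix([video_emb], audio_embs)[0]
    return max(range(len(sims)), key=sims.__getitem__)


if __name__ == "__main__":
    # Toy embeddings: row i of each list is assumed to come from the same clip.
    video = [[1.0, 0.0], [0.0, 1.0]]
    aligned_audio = [[0.9, 0.1], [0.1, 0.9]]
    shuffled_audio = [[0.1, 0.9], [0.9, 0.1]]
    # Matched pairs give a lower loss than mismatched ones.
    print(symmetric_contrastive_loss(video, aligned_audio)
          < symmetric_contrastive_loss(video, shuffled_audio))  # True
    # Cross-modal retrieval: the first video clip is closest to the first audio clip.
    print(nearest_audio(video[0], aligned_audio))  # 0
```

Once such a space is learned, each encoder can be reused on its own: the video side as a feature extractor for action recognition, the audio side for sound classification. Contrastive learning is only one option. Other audio-visual methods instead train a binary "do these match?" classifier.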
