Chapter
Challenges in computer vision that mimic a child's learning process
The way a child learns to act in the world differs from current computer vision technology, which focuses mostly on short-term video understanding. Mimicking a child's learning process in computer vision could lead to progress in areas such as autonomous driving and robotics.
Clips
This podcast discusses the current state of video recognition technology and how it lags behind object recognition: action classification performance is around 30%, comparable to where object detection stood in 2009.
32:36 - 35:19 (02:42)
Summary
This podcast discusses the current state of video recognition technology and how it lags behind object recognition: action classification performance is around 30%, comparable to where object detection stood in 2009. The speaker also considers whether knowledge bases and reasoning may be needed to improve action recognition, and ponders what a solution to the general action recognition problem might look like.
Chapter: Challenges in computer vision that mimic a child's learning process
Episode: #110 – Jitendra Malik: Computer Vision
Podcast: Lex Fridman Podcast
The use of schemas, scripts, and frames is essential for AI to understand long-form videos.
35:19 - 37:02 (01:42)
Summary
The use of schemas, scripts, and frames is essential for AI to understand long-form videos. In the past these structures were hand-coded, but new approaches are needed for more sophisticated long-term video understanding.
The speaker discusses the importance of teaching computer vision systems the way children learn, by experiencing different scenarios such as going to a restaurant, and suggests finding learning-based approaches to make this understanding more robust.
37:02 - 42:33 (05:30)