Clip

Improving AI's approximation of human play through self-play
listen on SpotifyListen on Youtube
2:07:30 - 2:09:20 (01:50)

Researchers use self-play processes to create a better model of human play by allowing deviations from an anchor policy for actions with high expected value. The neural net trained to imitate human data is referred to as the anchor policy.

Similar Clips