Clip

Mixing self-play and human data for reinforcement learning
1:57:36 - 1:59:12 (01:36)

Drawing on self-play techniques from poker and Go, this reinforcement learning approach combines human data with self-play by penalizing the model for choosing actions that are unlikely under the human data set.
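The clip does not spell out the exact objective. One standard way to formalize "penalize actions that are unlikely under the human data" is a KL-divergence penalty toward a policy fitted to human play. The sketch below assumes that formulation for a single decision with known action values Q(a), a human policy tau(a), and a penalty weight lam. All three are hypothetical inputs, and the function names are illustrative rather than taken from any system discussed in the clip. Under these assumptions, the policy that maximizes sum_a pi(a) Q(a) - lam * KL(pi || tau) has the closed form pi(a) ∝ tau(a) * exp(Q(a) / lam).

```python
import math


def kl_regularized_policy(q_values, human_policy, lam):
    """Policy maximizing E_pi[Q] - lam * KL(pi || human_policy).

    Assumed formulation (illustrative only): closed form
    pi(a) ∝ human_policy(a) * exp(Q(a) / lam), so actions the human
    policy rarely takes are down-weighted unless their value is high.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    # Work in log space; actions with zero human probability get -inf.
    scores = [
        math.log(p) + q / lam if p > 0 else float("-inf")
        for q, p in zip(q_values, human_policy)
    ]
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def regularized_objective(pi, q_values, human_policy, lam):
    """Expected value under pi minus lam times KL(pi || human_policy)."""
    value = sum(p * q for p, q in zip(pi, q_values))
    # Terms with pi(a) == 0 contribute 0 to the KL divergence.
    kl = sum(p * math.log(p / h) for p, h in zip(pi, human_policy) if p > 0)
    return value - lam * kl


if __name__ == "__main__":
    q = [1.0, 0.0, 0.5]      # hypothetical action values from self-play
    human = [0.1, 0.6, 0.3]  # hypothetical human imitation policy
    for lam in (100.0, 1.0, 0.01):
        print(lam, [round(p, 3) for p in kl_regularized_policy(q, human, lam)])
```

The weight lam controls the trade-off. A large lam keeps the policy close to the human one, and a small lam lets it follow the self-play values almost greedily.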
