Clip
Improving AI's approximation of human play through self-play
Researchers use self-play processes to create a better model of human play by allowing deviations from an anchor policy for actions with high expected value. The neural net trained to imitate human data is referred to as the anchor policy.