Chapter

Self-Play in Diplomacy and Pretraining Language Models
1:59:12 - 2:09:20 (10:08)

Episode
#344 – Noam Brown: AI vs Humans in Poker and Games of Strategic Negotiation
Podcast
Lex Fridman Podcast

The self-play process in Diplomacy involves conditioning the language model on good intents and letting the agent deviate from the human anchor policy when an action has sufficiently high expected value. Pre-training the language model on internet data also helps it approximate human play.
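As a rough illustration of what "conditioning on intents" can look like, here is a minimal Python sketch in which a planned set of moves is serialized into the prompt given to the dialogue model. The move notation, power names, and prompt layout are hypothetical choices for illustration, not details taken from the system discussed in the episode.

# Hypothetical sketch: serialize the planned moves ("intents") into the prompt so the
# dialogue model only has to generate messages consistent with a good plan.
intents = {"FRANCE": ["A PAR - BUR", "F BRE - MAO"], "ENGLAND": ["F LON - ENG"]}
history = "ENGLAND: Want to work together against Germany this year?"

prompt = (
    "INTENTS: " + "; ".join(f"{power}: {', '.join(moves)}" for power, moves in intents.items())
    + "\nDIALOGUE SO FAR:\n" + history
    + "\nFRANCE:"
)
print(prompt)
# The prompt would then be passed to the fine-tuned dialogue model, which writes France's reply.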

Clips
Language Models
1:59:12 - 2:01:57 (02:45)
Summary

The language model was pre-trained on a large amount of internet data and then fine-tuned on roughly 50,000 human Diplomacy games; the pre-training fills in the gaps in how communication works beyond what appears in the games themselves.
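
A minimal sketch of that two-stage recipe (start from a model pre-trained on internet text, then fine-tune it on human game dialogue), using the Hugging Face transformers and datasets libraries. The gpt2 checkpoint, the diplomacy_dialogues.jsonl file, and its "text" field are assumptions made for the sake of a runnable example, not details from the episode.

from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("gpt2")      # stand-in for the internet-pretrained LM
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Hypothetical file: one JSON object per in-game message, e.g. {"text": "FRANCE -> ENGLAND: ..."}
dialogues = load_dataset("json", data_files="diplomacy_dialogues.jsonl", split="train")
dialogues = dialogues.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=256),
    batched=True, remove_columns=dialogues.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="diplomacy_lm_sketch",
                           num_train_epochs=1, per_device_train_batch_size=8),
    train_dataset=dialogues,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # next-token objective
)
trainer.train()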

The concept of "lying" in AI chatbots is different from how humans perceive it.
2:01:58 - 2:04:23 (02:25)
listen on SpotifyListen on Youtube
AI chatbots
Summary

The concept of "lying" in AI chatbots is different from how humans perceive it. AI bots predict the user's intention based on their messages, and sometimes their predictions can lead to inaccurate assumptions that might be considered lying.

Self-Play
2:04:23 - 2:07:30 (03:06)
Summary

The limitations of self-play in language models are discussed, specifically the difficulty of keeping the model's behavior human-like and of scaling the approach.

AI
2:07:30 - 2:09:20 (01:50)
Summary

Researchers use a self-play process to build a better model of human play by allowing deviations from an anchor policy for actions with sufficiently high expected value. The anchor policy is the neural net trained to imitate human data.
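
The mechanism described above amounts to a KL-regularized objective: the agent is pulled toward the human anchor policy and only shifts probability toward an action when its expected value is large enough to outweigh the penalty for deviating. Below is a minimal numpy sketch of that general idea; the probabilities, values, and lambda_kl weight are made-up numbers, and this is an illustration of the technique rather than the system's actual implementation.

import numpy as np

def regularized_policy(anchor_probs, expected_values, lambda_kl):
    """Maximize E[value] - lambda_kl * KL(pi || anchor).

    The closed-form solution is pi(a) proportional to anchor(a) * exp(value(a) / lambda_kl).
    A large lambda_kl keeps play close to the human anchor; a small lambda_kl lets
    high-expected-value deviations dominate.
    """
    logits = np.log(anchor_probs) + expected_values / lambda_kl
    exp_logits = np.exp(logits - logits.max())   # subtract max for numerical stability
    return exp_logits / exp_logits.sum()

anchor = np.array([0.70, 0.25, 0.05])   # what a typical human would do (hypothetical)
values = np.array([1.0, 1.1, 3.0])      # expected value of each action (hypothetical)

print(regularized_policy(anchor, values, lambda_kl=10.0))  # stays close to the anchor policy
print(regularized_policy(anchor, values, lambda_kl=0.5))   # deviates toward the high-value action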
