
Soupjoe5 OP t1_ixr3v8m wrote


The Meta team’s crucial contribution was therefore to augment reinforcement learning with natural-language processing. Large language models, trained on vast amounts of data to predict deleted words, have an uncanny ability to mimic the patterns of real language and say things that humans might. For Cicero, the team started with a pre-trained model with a baseline understanding of language, and fine-tuned this on dialogues from more than 40,000 past games, to teach it Diplomacy-specific patterns of speech.
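The "predict deleted words" objective can be illustrated with a toy sketch. This is a hypothetical, invented example (a simple count-based predictor, nothing like Meta's actual model): it learns, from a tiny corpus, which word most often appears between a given pair of neighbouring words, and uses that to fill in a deleted word.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for Diplomacy dialogues (invented examples).
corpus = [
    "i will support your move to munich",
    "i will attack your move to munich",
    "i will support your fleet in kiel",
]

# For each pair of neighbouring words, count which middle words occurred.
context_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for i in range(1, len(words) - 1):
        context_counts[(words[i - 1], words[i + 1])][words[i]] += 1

def predict_deleted_word(left, right):
    """Guess the word most often seen between `left` and `right`."""
    counts = context_counts[(left, right)]
    return counts.most_common(1)[0][0] if counts else None
```

Real language models learn far richer patterns over far longer contexts, but the training signal is the same kind: recover a hidden word from the words around it.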

To play the game, Cicero looks at the board, remembers past moves and makes an educated guess as to what everyone else will want to do next. Then it tries to work out what makes sense for its own move, by choosing different goals, simulating what might happen, and also simulating how all the other players will react to that.
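The planning loop described above can be sketched as follows. Everything here is an invented stand-in for Cicero's learned models (the move names, the reaction table, and the payoffs are all hypothetical): for each candidate move, predict how each other player would react, then keep the move with the best expected outcome.

```python
# Stand-in for the learned model of other players: a fixed lookup table
# mapping (opponent, my_move) to that opponent's predicted reaction.
def predicted_reaction(opponent, my_move):
    reactions = {
        ("austria", "attack_trieste"): "defend_trieste",
        ("austria", "hold"): "expand_balkans",
    }
    return reactions.get((opponent, my_move), "hold")

# Invented scoring: attacking a defended province is costly.
def payoff(my_move, reaction):
    table = {
        ("attack_trieste", "defend_trieste"): -1,
        ("hold", "expand_balkans"): 0,
    }
    return table.get((my_move, reaction), 1)

def best_move(candidate_moves, opponents):
    """Pick the move whose simulated reactions give the best total payoff."""
    def expected(move):
        return sum(payoff(move, predicted_reaction(o, move)) for o in opponents)
    return max(candidate_moves, key=expected)
```

The real system searches over vastly more candidates and models each player's behaviour probabilistically, but the shape of the loop is the same: propose, simulate, score, choose.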

Once it has come up with a move, it must work out what words to say to the others. To that end, the language model spits out possible messages, throws away the bad ideas and anything that is actual gobbledygook, and chooses the ones, appropriate to the recipients concerned, that its experience and algorithms suggest will most persuasively further its agenda.
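That generate-filter-rank pipeline can be sketched in a few lines. The candidate messages, the coherence check, and the persuasiveness heuristic below are all invented placeholders for the language model and scoring machinery the article describes:

```python
# Stand-in for the gobbledygook filter: require a minimum of real words.
def is_coherent(message):
    return len(message.split()) >= 3 and message.isascii()

# Invented heuristic: messages that touch the recipient's known interests
# score higher, mimicking "appropriate to the recipients concerned".
def persuasiveness(message, recipient):
    interests = {"france": "burgundy", "england": "channel"}
    return 2 if interests.get(recipient, "") in message else 1

def choose_message(candidates, recipient):
    """Discard incoherent candidates, then pick the most persuasive one."""
    coherent = [m for m in candidates if is_coherent(m)]
    return max(coherent, key=lambda m: persuasiveness(m, recipient),
               default=None)
```

In Cicero the candidates come from the fine-tuned language model and the ranking reflects which message best serves the move it has already planned; the pipeline above only shows the control flow.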

Cicero, then, can negotiate, convince, co-operate and compete. Seasoned Diplomacy players will, though, want to know something else: has it learned how to stab? Stabbing, saying one thing and doing another (especially attacking a current ally), is seen by many as Diplomacy’s defining feature. But though Cicero did “strategically withhold information from players in gameplay”, it did not actually stab any of its opponents. Perhaps it was this final lack of Machiavellian ruthlessness that explains why it finished only in the top 10%, and was not victor ludorum.
