
MetaAI_Official OP t1_izfj3yl wrote

We tried hard in the paper to articulate the important research challenges and how we solved them. At a high level, the big questions were:

  • RL/planning: What even constitutes a good strategy in games with both competition and cooperation? The theory that undergirds prior successes in games no longer applies.
  • NLP: How can we maintain dialogues that remain coherent and grounded over very long interactions?
  • Joint: How do we make the agent speak and act in a “unified” way? That is, how does dialogue inform actions and planning inform dialogue, so that we can use dialogue intentionally to achieve goals?
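
One way to make that last coupling concrete: the planner picks intended actions, and the dialogue model is conditioned on those intents when writing a message. Here's a minimal toy sketch of that idea (hypothetical names, hard-coded stand-ins for the real planner and dialogue model, not our actual code):

```python
from dataclasses import dataclass

@dataclass
class Intent:
    """Planned actions that the upcoming message should support (toy representation)."""
    our_moves: list[str]
    their_moves: list[str]

def plan_intent(game_state: dict, recipient: str) -> Intent:
    # Stand-in for the strategic planner: a real agent would derive this from
    # search/RL over the game state; here we hard-code a toy plan.
    return Intent(our_moves=["A PAR - BUR"], their_moves=["A MUN - RUH"])

def generate_message(recipient: str, intent: Intent) -> str:
    # Stand-in for the dialogue model: a real agent would condition a language
    # model on the game state and the intent; here we just template the intent.
    return (f"To {recipient}: I'm moving {', '.join(intent.our_moves)} this turn. "
            f"Could you help by ordering {', '.join(intent.their_moves)}?")

# Planning informs dialogue: the message is generated *from* the planned intent,
# and that same intent is what the agent ultimately submits as orders.
intent = plan_intent(game_state={}, recipient="GERMANY")
print(generate_message("GERMANY", intent))
```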

One practical challenge we faced was how to measure progress during CICERO’s development. At first we tried comparing different agents by playing them against each other, but we found that good performance against other agents didn’t correlate well with how well an agent played with humans, especially once language was involved! We ended up developing a whole spectrum of evaluation approaches: A/B testing specific components of the dialogue, collaborating with three top Diplomacy players (Andrew Goff, Markus Zijlstra, and Karthik Konath) to play with CICERO and annotate its messages and moves in self-play games, and looking at CICERO’s performance against diverse populations of agents. -AL
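
To make the “diverse populations” piece concrete, here's a toy sketch of that kind of evaluation harness (illustrative only, with fabricated game scores; not our actual evaluation code): the candidate agent is dropped into games where the other seats are filled by agents sampled from a pool, and its average score is tracked.

```python
import random
from statistics import mean

def play_game(agents: list[str]) -> list[float]:
    """Stand-in for a full 7-player Diplomacy game. Returns one score per seat;
    the scores here are fabricated at random purely for illustration."""
    raw = [random.random() for _ in agents]
    total = sum(raw)
    return [r / total for r in raw]

def evaluate_against_population(candidate: str, population: list[str],
                                n_games: int = 200, seats: int = 7) -> float:
    """Average score for `candidate`, with the other seats sampled
    (with replacement) from a diverse opponent population."""
    scores = []
    for _ in range(n_games):
        opponents = random.choices(population, k=seats - 1)
        game_scores = play_game([candidate] + opponents)
        scores.append(game_scores[0])  # candidate always sits in seat 0
    return mean(scores)

# Hypothetical opponent pool names, just to show the shape of the evaluation.
population = ["behavior_clone", "search_only", "previous_candidate", "rule_based"]
print(evaluate_against_population("new_candidate", population))
```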
