Submitted by hughbzhang t3_z1yt45 in MachineLearning

Paper: https://www.science.org/doi/10.1126/science.ade9097

Blog: https://ai.facebook.com/blog/cicero-ai-negotiates-persuades-and-cooperates-with-people/

Github: https://github.com/facebookresearch/diplomacy_cicero

Abstract:

Despite much progress in training AI systems to imitate human language, building agents that use language to communicate intentionally with humans in interactive environments remains a major challenge. We introduce Cicero, the first AI agent to achieve human-level performance in Diplomacy, a strategy game involving both cooperation and competition that emphasizes natural language negotiation and tactical coordination between seven players. Cicero integrates a language model with planning and reinforcement learning algorithms by inferring players' beliefs and intentions from its conversations and generating dialogue in pursuit of its plans. Across 40 games of an anonymous online Diplomacy league, Cicero achieved more than double the average score of the human players and ranked in the top 10% of participants who played more than one game.

Overview of the agent
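Since the overview figure doesn't come through in a text post, here is a very rough, illustrative sketch of the per-turn flow the abstract describes. All names, types, and stubs below are made up for illustration; the actual implementation is in the GitHub repo linked above.

```python
# Illustrative sketch only: hypothetical names, stubbed models.
from dataclasses import dataclass, field

@dataclass
class Intent:
    own_action: str       # the move Cicero currently plans to make
    partner_action: str   # the move it hopes a partner will make

@dataclass
class TurnState:
    board: str
    dialogue: list = field(default_factory=list)

def predict_policies(state):
    # Imitation-learned model: guess what each power is likely to do,
    # conditioned on the board and the conversation so far.
    return {"FRANCE": "A PAR - BUR", "ENGLAND": "F LON - ENG"}

def plan(state, anchor):
    # Planning / RL step: improve on the anchor predictions and pick an
    # action plus an "intent" to ground the dialogue in.
    return Intent(own_action=anchor["FRANCE"], partner_action=anchor["ENGLAND"])

def generate_message(state, intent, partner):
    # Dialogue model conditioned on the intent; filters (not shown) would
    # then discard nonsensical or inconsistent candidate messages.
    return f"{partner}: I'm planning {intent.own_action}; {intent.partner_action} would help us both."

def play_turn(state):
    anchor = predict_policies(state)
    intent = plan(state, anchor)
    state.dialogue.append(generate_message(state, intent, "ENGLAND"))
    return intent.own_action

print(play_turn(TurnState(board="Spring 1901")))
```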

Example dialogues

Disclosure: I am one of the authors of the above paper.

Edit: I just heard from the team that they’re planning an AMA to discuss this work soon, keep an eye out for that on /r/machinelearning.

366

Comments

farmingvillein t1_ixdp4t8 wrote

Very neat! Would love to see a version built with fewer filters (secondary models)--i.e., more grounded in a singular, "base" model // less hand-tweaking--but otherwise very cool. (Although wouldn't surprise me if simply upgrading the model size went a long way here.)

11

Amortize_Me_Daddy t1_ixdyg12 wrote

Very cool work. I saw this on my LinkedIn feed and immediately had to share it with my fiancé, who is a huge fan of Risk and Diplomacy. To me, this seems like a much bigger deal than AlphaGo - can someone give me a sanity check?

I’m also interested in how much thought was put into the persuasiveness of generated messages when making a proposal. It seems like something way out of the scope of RL, but still quite important to optimize. I am just… astounded reading over that convo between France and Turkey. If you have time, would you mind offering some insight into the impressive “salesmanship” of CICERO’s language model?

45

sam__izdat t1_ixefdih wrote

> Example dialogues

ITALY: So, what are you wearing?

24

ReginaldIII t1_ixelgkf wrote

A strange game. The only winning move is not to play. How about a nice game of chess?

E: -7? It was a movie quote guys...

1

LurkAroundLurkAround t1_ixf68yn wrote

AlphaGo was beating the best; this, according to the post, is a top-10% player, which most likely means only just inside the top 10%. The comparison also includes players with more than one game, while Cicero played 40 games, so just by allowing a bunch of two-game players they improve their stats. A fairer comparison would have been to take players with at least 40 games, sample 40 games randomly, compute the score, and then check the performance on that subset.
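Concretely, I mean something like the sketch below (purely illustrative; the actual per-player scores are in the paper's supplementary material, and the data structure here is made up):

```python
# Hypothetical sketch of the comparison I'm proposing.
import random

def avg_score_over_sample(games_by_player, player, n=40, seed=0):
    """Average score over a random sample of n of the player's games."""
    rng = random.Random(seed)
    games = games_by_player[player]   # list of that player's per-game scores
    sample = rng.sample(games, n)     # requires len(games) >= n
    return sum(sample) / n

def fair_comparison(games_by_player, min_games=40):
    """Rank only the players who have at least min_games on record."""
    eligible = [p for p, g in games_by_player.items() if len(g) >= min_games]
    return sorted(
        ((avg_score_over_sample(games_by_player, p, n=min_games), p) for p in eligible),
        reverse=True,
    )
```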

Not to take away anything from the team, but given how the results are framed, my instinct is to believe that this is a bit oversold.

19

gwern t1_ixfdodb wrote

There's no comparison to prior full-press Diplomacy agents, but if I'm reading the prior-work cites right, this is because basically none of them work - not only do they not beat humans, they apparently don't even consistently improve over themselves playing the game as if it were no-press Diplomacy (i.e. not using dialogue at all). That gives an idea of how big a jump this is for full-press Diplomacy.

Author Adam Lerer on speed of progress:

> In 2019 Noam Brown and I decided to tackle Diplomacy because it was the hardest game for AI we could think of and went beyond moving pieces on a board to cooperating with people through language. We thought human-level play was a decade away.

32

Acceptable-Cress-374 t1_ixg6ngd wrote

Listened to a podcast with Andrej Karpathy recently, and his intuition for the future of LLMs is that we'll see more collaboration and stacking of models, sort of a "council of GPTs" kind of approach, where you have models trained on particular tasks working together towards the goal.

Whatever the future holds, I'm betting we'll see constant improvements over the next few years, before we see a new revolutionary one-model take.

11

farmingvillein t1_ixgd88a wrote

Yeah, understood, but that wasn't really what was going on here (unless you take a really expansive definition).

They were basically doing a ton of hand-calibration of a very large # of models, to achieve the desired end-goal performance--if you read the supplementary materials, you'll see that they did a lot of very fiddly work to select model output thresholds, build training data, etc.
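To give a flavor of the kind of calibration I mean, something like this (entirely hypothetical names and procedure on my part, not the paper's actual code):

```python
# Hypothetical sketch: pick a cutoff for one filter classifier's "badness"
# score on held-out labeled messages, blocking anything at or above it.

def pick_threshold(scores_and_labels, max_false_block_rate=0.05):
    """Lowest (most aggressive) threshold that wrongly blocks at most a
    given fraction of the messages labeled "good"."""
    good_scores = [s for s, label in scores_and_labels if label == "good"]
    for t in sorted({s for s, _ in scores_and_labels}):
        wrongly_blocked = sum(1 for s in good_scores if s >= t)
        if wrongly_blocked / max(len(good_scores), 1) <= max_false_block_rate:
            return t
    return None
```

Now imagine doing that, plus building the labeled data, for every one of the filters, and you get a sense of how much hand-tuning is baked in.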

On the one hand, I don't want to sound overly critical of a pretty cool end-product.

On the other, it really looks a lot more like a "product", in the same way that any gaming AI would be, than a singular (or close to it) AI system which is learning to play the game.

8

graphicteadatasci t1_ixh2mk4 wrote

But they specifically created a model for playing Diplomacy - not a process for building board-game-playing models. With the right architecture and processes, they could probably do away with most of that hand-calibration stuff, but the goal here was to create a model that does one thing.

1

icosaplex t1_ixijpuw wrote

I'm one of the paper authors:

You can see a full anonymized table of scores and ranks near the end of the Supplementary Material file linked for download at the end of the Science article. No player other than Cicero played anywhere close to 40 games, so such a procedure wouldn't be possible. Each game takes hours and requires scheduling 6 players to be simultaneously available, so understandably many players, including many good players, only played a handful of games each. If you restricted to, say, players with >= 5 games, Cicero would be 2/19.

We don't make a claim of being superhuman as AlphaGo did - we believe Cicero in this setting is at the level of a strong human player but not superhuman. We worked with top Diplomacy experts who have given us this feedback.

One thing to keep in mind is that Diplomacy has variance: there is practical luck in which players choose to ally with you or someone else, or whether you guess right or wrong in things like coin-flip tactical situations. So, similar to e.g. poker, even a middling player may occasionally win big in the short run against top-level players to a degree that would not hold up in the long run. This means including players with too few games can sometimes have the exact opposite bias and make a strong result seem worse by comparison. In that quoted stat, we chose a threshold of > 1 game as a compromise between mitigating the most misleading tail of that bias and still including as many players as possible, rather than picking a higher threshold and arbitrarily cutting out large chunks of the player population from the comparison.
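As a toy illustration of that variance point (the numbers below are completely made up and not from our data):

```python
# Toy simulation: per-game scores are very noisy, so among many players with
# only a couple of games each, someone will likely post a huge average by luck,
# while a genuinely strong player's 40-game average regresses toward its mean.
import random

rng = random.Random(0)

def noisy_scores(true_skill, n_games):
    return [max(0.0, rng.gauss(true_skill, 30)) for _ in range(n_games)]

strong_40_game_avg = sum(noisy_scores(25, 40)) / 40
casual_2_game_avgs = [sum(noisy_scores(12, 2)) / 2 for _ in range(50)]

print(f"strong player's 40-game average: {strong_40_game_avg:.1f}")
print(f"best 2-game average among 50 casual players: {max(casual_2_game_avgs):.1f}")
```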

But of course, none of that ultimately matters since you can still check out the full list yourself.

If you're interested in a bit more context on the player pool: the setting was a casual but competitive online blitz Diplomacy league advertised at various times on some of the main online Diplomacy community sites. Many newer players signed up and played, but so did experienced players, and as an organized league I'd expect the overall average level of play to be a little higher than in, e.g., generic online games.

And thank you and others for raising such questions - it's been fun and interesting to see discussions like this.

20

velcher t1_ixj98ee wrote

Great results! Some feedback:

  • I'm somewhat unsatisfied with the amount of human engineering / annotation pipelines that went into the agent, most importantly the "intention" mechanisms, which seem to be a key part of making the dialogue -> planning part tractable.
  • This annoyance somewhat extends to the "message filtering mechanisms" used to prevent nonsensical, incoherent messages, as this seems like more of a hack. Really, the agent should learn to converse from the objective of being an optimal player (amongst other humans): if it starts speaking gibberish, other human players can tell it is an AI, which would most likely be a bad outcome for the agent (unless the humans are blue-pilled).
  • From what I gather, it seems like it is only trained on the "truthful" subset of the dialogue data, which means the agent cannot lie. Deceit seems pretty important for winning Diplomacy.
  • The sections on planning are not easy to understand concretely, specifically "Dialogue-conditional planning" and "Self-play reinforcement learning for improved value estimation". The authors seem to paraphrase the math and logic in words and omit equations to keep things high level, but this just makes everything more vague. Luckily, the supplementary material seems to have the details (my rough reading of the planning objective is sketched after this list).
  • Thanks for publishing the code. This is very important for the research community. I hope FAIR continues to do this.
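For what it's worth, my rough reading of the dialogue-conditional planning section is that the planner is regularized toward the dialogue-conditioned imitation policy, trading expected value off against KL distance from it. A minimal sketch of that kind of objective, under my interpretation rather than the paper's exact formulation:

```python
# Sketch of a KL-regularized policy: argmax_pi E_pi[Q] - lam * KL(pi || anchor)
# has the closed form pi(a) proportional to anchor(a) * exp(Q(a) / lam).
import math

def kl_regularized_policy(q_values, anchor, lam):
    weights = {a: anchor[a] * math.exp(q_values[a] / lam) for a in anchor}
    z = sum(weights.values())
    return {a: w / z for a, w in weights.items()}

# lam -> infinity recovers the imitation (anchor) policy; lam -> 0 just takes argmax Q.
print(kl_regularized_policy({"hold": 0.0, "attack": 1.0},
                            {"hold": 0.7, "attack": 0.3},
                            lam=0.5))
```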

Also, the PDF from science.org is terrible. I can't even highlight lines with my Mac's Preview app. Please fix that if you get a chance!

4

TheAsianIsGamin t1_ixplwod wrote

It seems like the vast majority of CICERO's pitches are "here's an optimal play for you; you should do it not only because it's good for you but also because it's good for us." In other words, pointing players towards rationality. Of course, high-level players in any social game are far more likely than their less skilled counterparts to want to make the rational play, so it's likely that there's some selection bias influencing how effective that salesmanship is. However, even high-level players are governed by the emotions that break down game theory!

Here's an example case: I'm curious to see how CICERO responds in situations where it talks to Human A about a plan that requires Human B, but A doesn't trust B. How does CICERO respond to that? It may very well be that it doesn't get into those spots because it thinks about who's likeliest to align with whom and in what way. In this sense, it's playing to its strengths and not attempting plays it can't execute, which is an impressive strategic feat. But of course, I'm interested in seeing it try things it can't do - in this case, try a different mode of persuasion.

2