icosaplex t1_ixijpuw wrote
Reply to comment by LurkAroundLurkAround in [R] Human-level play in the game of Diplomacy by combining language models with strategic reasoning — Meta AI by hughbzhang
I'm one of of the paper authors:
You can see a full anonymized table of scores and ranks near the end of the Supplementary Material file linked for download at the end of the Science article. No player other than Cicero played anywhere close to 40 games, so such a procedure wouldn't be possible. Each game takes hours and requires scheduling 6 players to be simultaneously available, so understandably many players, including many good players, only played a handful of games each. If you restricted to, say, players with >= 5 games, Cicero would be 2/19.
We don't make a claim of being superhuman as AlphaGo did - we believe Cicero in this setting is at the level of a strong human player but not superhuman. We worked with top Diplomacy experts who have given us this feedback.
One thing to keep in mind is that Diplomacy has variance: there is practical luck in which players choose to ally with you or someone else, or whether you guess right or wrong in things like coin-flip tactical situations. So similar to, e.g. poker, even a middling player may occasionally win big in the short-run against top-level players to a degree that would not hold up in the long run. This means including players with too few games can sometimes have the exact opposite bias and make a strong result seem worse by comparison. In that quoted stat, we chose a threshold of > 1 game as a compromise between mitigating the most misleading tail of that bias, while still including as many players as possible rather than picking a higher threshold and arbitrarily cutting out large chunks of the player population from the comparison.
But of course, none of that ultimately matters since you can still check out the full list yourself.
If you're interested in a bit more context on the player pool: the setting was a casual but competitive online blitz Diplomacy league advertised at various times in some of the main online Diplomacy community sites. Many newer players signed up and played, but also experienced players, and as an organized league I'd expect the overall average level of play to be a little higher than, e.g. generic online games.
And thank you and others for raising such questions - it's been fun and interesting to see discussions like this.
Viewing a single comment thread. View all comments