the_timezone_bot t1_izbd6xi wrote on December 7, 2022 at 9:28 PM

#880,543

10am PT happens when this comment is 20 hours and 31 minutes old.

You can find the live countdown here: https://countle.com/tSxJzFHsC

I'm a bot, if you want to send feedback, please comment below or send a PM.

MTGTraner t1_izbexsh wrote on December 7, 2022 at 9:40 PM

#880,592

Verified, thank you /u/cryfi for coordinating this with the Meta team!

hophophop1233 t1_izbfgfx wrote on December 7, 2022 at 9:43 PM

#880,615

What do you see as applications outside of this game? Were there any novel techniques or something else we should be aware of? Is anything a paradigm shift or could be impactful for sectors moving forward? What are the requirements to run the agent/model? Will pre-trained models be available? Will this be open sourced?

[deleted] t1_izbjbxe wrote on December 7, 2022 at 10:09 PM

#880,746

What is the motivation for developing ‘Human-like play’? It doesn’t seem obvious to me how imperceptibility is useful in the wider applications of your methods.

Rybolos t1_izblt3v wrote on December 7, 2022 at 10:26 PM

#880,844

How do you see the future of such systems and the ethical limitations that need to be addressed? Would licenses like RAIL need an update?

pimmen89 t1_izbm876 wrote on December 7, 2022 at 10:29 PM

#880,864

Were there any local optimums that CICERO got stuck in, especially during development?

pm_me_your_pay_slips t1_izbngrf wrote on December 7, 2022 at 10:37 PM

#880,919

Does backstabbing emerge? Is it possible to win without backstabbing?

Liorogamer t1_izbnj3k wrote on December 7, 2022 at 10:38 PM

#880,922

Loved this paper! What was the most difficult engineering challenge y’all encountered while working on the CICERO system?

bpe9 t1_izbnux1 wrote on December 7, 2022 at 10:40 PM

#880,941

Particularly for Noam Brown but also open more generally: You have a background in financial markets and algorithmic trading. How do you see RL impacting this field in the coming years and what do you think will be the catalyst for wider spread acceptance in the conventional finance/econ space for RL based approaches

ClayStep t1_izbsdur wrote on December 7, 2022 at 11:12 PM

#881,171

I was at your neurips talk. I note that the language model (conditioned on world state) had the ability to suggest moves to a human player, which the human player found to be good moves.

Could the same model be used to suggest moves for the agent? What are the limitations?

[deleted] t1_izbsh11 wrote on December 7, 2022 at 11:12 PM

#881,177

[removed]

nraw t1_izbt1cq wrote on December 7, 2022 at 11:16 PM

#881,199

Thanks for the AMA! I'm curious about understanding how do you present the value added of your team? Like, what are you bringing to Meta by doing this research?

TissueReligion t1_izbt2ja wrote on December 7, 2022 at 11:16 PM

#881,202

Noam, as a 5th year phd student still unclear on when he's going to finish, I would be curious to hear about the story of how you spent 8 years in grad school (I mean it in a positive way!).

Thanks.

xutw21 t1_izbthbz wrote on December 7, 2022 at 11:19 PM

#881,222

The research paper mentioned it briefly, but I'd like to know what the major challenges were during CICERO's development and how you overcame them individually to achieve human-level performance in the game Diplomacy?

pyepyepie t1_izbzd9r wrote on December 8, 2022 at 12:02 AM

#881,472

I feel like the agent was implemented incredibly well, however, the grounding and "information selection" of the language model was not "clean" since it used classifiers to filter messages. Since the Diplomacy team is extremely competent, I wonder if you had put efforts regarding grounding better (in a general context) and if it's in a future plan, as I feel like it's very important for the community (arguably one of the most important problems in NLP).

edit: I know that the language model was conditioned on in-game orders, etc., but I wonder if you intend to work on novel algorithms for it in the future.

sofacan1 t1_izbzjns wrote on December 8, 2022 at 12:03 AM

#881,477

Love the work and the team! What do you look for in candidates (ie education, experience, etc) are you hiring 🙃?

thatguydr t1_izbzkix wrote on December 8, 2022 at 12:03 AM

#881,478

Have you stopped being friends with the algorithm? Not clear how you're measuring human performance at Diplomacy, but I presume "never trusting it fully ever again" is part of your cost function?

Beor_The_Old t1_izc1lpo wrote on December 8, 2022 at 12:19 AM

#881,563

Replying to hophophop1233 (#880,615)

Bridging rl and planning and connecting both to NLP has been an area of interest for a while. You could adjust large language models to have more of a personality and understanding of the concept of a state. This is at least a good demonstration that those three concepts can be connected well in a HCI setting

[deleted] t1_izc271a wrote on December 8, 2022 at 12:23 AM

#881,586

Replying to sofacan1 (#881,477)

[deleted]

JustOneAvailableName t1_izc2l2j wrote on December 8, 2022 at 12:26 AM

#881,608

Replying to Beor_The_Old (#881,563)

> understanding of the concept of a state.

I kinda think we're already there

NeverStopWondering t1_izc4egr wrote on December 8, 2022 at 12:39 AM

#881,689

Has the model played against copies of itself (post-training, I mean), and if so, did any interesting or odd emergent strategies form?

addition t1_izc6uv9 wrote on December 8, 2022 at 12:58 AM

#881,802

What are your thoughts on the morality of helping a company like Meta teach an AI how to manipulate humans?

JimmyTheCrossEyedDog t1_izc6ym1 wrote on December 8, 2022 at 12:59 AM

#881,808

This was one of the most impressive AI advancements I've seen in recent memory, so congrats and kudos on such great work.

As I see it, one simplifying factor that Diplomacy (like any game) has is the discrete set of potential actions to be taken. When it comes to extending an AI like CICERO, to other sorts of problems, do you see the possibility of such problems having a non-disctetizable action space as a major hurdle, and are there particular difficulties associated with that and potential mitigations for them?

Roger_M8 t1_izc9fcj wrote on December 8, 2022 at 1:17 AM

#881,936

Where did the idea for the paper/research come from and where do you usually start ?

Effective-Dig8734 t1_izcbrje wrote on December 8, 2022 at 1:34 AM

#882,050

My question would be what is the next step of this research?

dborowiec10 t1_izch61l wrote on December 8, 2022 at 2:15 AM

#882,303

How many and what kind of computational resources were involved in training CICERO? How long did the training take? If you have access to such information, could you elaborate in which region of the world the computation took place and what the energy/fuel mix was that powered the machines?

Given this excerpt from the github repo: "One can also instead pass launcher.local.use_local=true to run them on locally, e.g. on an individual 8-GPU-or-more GPU machine but training may be very slow", and "launcher.slurm.num_gpus=256", it seems as the resources were quite substantial.
It would be good to get some carbon accountability on this.

pyepyepie t1_izci4w8 wrote on December 8, 2022 at 2:22 AM

#882,349

Replying to dborowiec10 (#882,303)

Not from the Meta team but you might want to take a look in the SM and search for "GPU"/"GPUs", they actually did a very nice job describing it (does not answer you question RE region but I thought it might be helpful, e.g. number of GPUs).

dborowiec10 t1_izckl91 wrote on December 8, 2022 at 2:41 AM

#882,458

Replying to pyepyepie (#882,349)

Thanks, that's a good point of reference. Seems like Nvidia V100s (volta)?
Would be interesting to see the total compute time involved.

bourbakai t1_izckyoy wrote on December 8, 2022 at 2:44 AM

#882,477

What do you envision as the eventual end-game or long-term goals for systems like CICERO that are capable of achieving human-level performance in complex strategy games like Diplomacy?

TheFibo1123 t1_izclt7o wrote on December 8, 2022 at 2:51 AM

#882,525

Can you discuss the significance of CICERO's ability to engage in natural language dialog in relation to its planning abilities? How do you see this ability potentially benefiting the development of other planning systems and AI technologies in the future?

[deleted] t1_izcqk57 wrote on December 8, 2022 at 3:29 AM

#882,766

[removed]

OSeady t1_izcsujq wrote on December 8, 2022 at 3:47 AM

#882,894

Replying to JustOneAvailableName (#881,608)

Is this for real?

comictech t1_izct8i0 wrote on December 8, 2022 at 3:51 AM

#882,918

How does one get an internship at META with HCI/learning science background? In addition to gaming, does META plan going into education?

This_Objective7808 t1_izcvmmu wrote on December 8, 2022 at 4:11 AM

#883,019

I'm curious how the number of messages it sends compares to the human players. 5200 seems like a lot for 2 games. It may be that this is similar to the problem with the SAT essay where just writing a longer essay got you a higher score independent of quality. By being agreeable with all the other players, it may have been able to outlast it's competitors.

Either way, this is a great achievement for nlp. I'm excited for how nlp+rl will be used in the coming years.

Sinity t1_izcwah8 wrote on December 8, 2022 at 4:17 AM

#883,056

Replying to OSeady (#882,894)

Yes. Through it's not actually simulating the machine - it's just superficially good at pretending.

cat "asdf" > file.txt

Works well. file.txt is visible if you ls. Then you maybe do cat "qwerasdf" > somedata.dat... and on ls it, again, shows up. But maybe ChatGPT forgot about file.txt and it doesn't show up anymore.

TBF, humans wouldn't necessarily even outperform it on "pretending to run a computer" (not just an algorithm, but actual OS and such).

I think scale would make it way better at keeping track of things well.

levi97zzz t1_izcwd1g wrote on December 8, 2022 at 4:18 AM

#883,061

Does cicero have a theory of mind? I’m doing research on implementing theory of mind module into NLP chatbot, and to my knowledge, theory of mind emerge in Cicero without explicitly implementing any ToM module - is that true or am I missing something obvious?

tylersuard t1_izcwv7g wrote on December 8, 2022 at 4:22 AM

#883,089

I have no questions. Thank you guys for doing what you do. Meta is one of the world's leading AI research companies. So many cool breakthroughs, not to mention making PyTorch.

[deleted] t1_izcyuz0 wrote on December 8, 2022 at 4:40 AM

#883,189

Replying to OSeady (#882,894)

[deleted]

Swolnerman t1_izczafd wrote on December 8, 2022 at 4:43 AM

#883,218

To what extent is it necessary to get a graduate/post-graduate degree to work on the cutting edge of ML such as this? I’ve been involved in ML for a few years now during my undergraduate degree and have been debating whether or not I want to do a graduate degree or go directly into industry and work my way up

blose1 t1_izczrn8 wrote on December 8, 2022 at 4:48 AM

#883,247

Replying to OSeady (#882,894)

I asked ChatGPT about your comment:

>someone asked on reddit "is this for real?" about article with title "Building A > Virtual Machine inside ChatGPT", how would you respond to him ?

A: It's difficult to say without more context. It's possible that the article in question is discussing the use of a large language model, such as GPT-3, to create a virtual machine within a chat environment. However, without being able to view the article and verify its contents, it's impossible to say for sure whether the claim is true or not. In general, it's always a good idea to approach claims with a healthy dose of skepticism and to verify information before accepting it as fact.

>article is indeed about creating a virtual machine inside large language model and simulating running different commands while this model returns accurate responses, specifically using you - ChatGPT, this person is indeed skeptical, knowing now that additional context, how would you respond to that person?

A: It's certainly possible to use a large language model, such as ChatGPT, to simulate running commands within a virtual machine. Language models are able to generate text based on a given input, so they can be used to generate responses to a wide range of prompts. In the case of a virtual machine, the prompts might be commands that a user could enter, and the responses would be the output of those commands. However, it's important to note that these responses are generated by the model and are not the result of the model actually executing the commands. In other words, while the model may be able to generate responses that are similar to the output of a real virtual machine, it is not actually running the commands and therefore is not a true virtual machine.

OSeady t1_izd05fm wrote on December 8, 2022 at 4:51 AM

#883,263

Replying to blose1 (#883,247)

Incredible. Thank you.

[deleted] t1_izd0cyk wrote on December 8, 2022 at 4:53 AM

#883,273

[deleted]

ditlevrisdahl t1_izd4qrp wrote on December 8, 2022 at 5:37 AM

#883,452

What techniques did you use to evaluate that your model was actually learning the game?

I can imagine that the first million of episodes the model just produced ramble. So did you just cross you fingers and hoped for some results later? Or did you see steady increase in performance?

[deleted] t1_izd6h98 wrote on December 8, 2022 at 5:56 AM

#883,515

Replying to [deleted] (#883,273)

[deleted]

Centurion902 t1_izd6r4m wrote on December 8, 2022 at 5:59 AM

#883,530

Replying to [deleted] (#880,746)

Inhuman play would be flagged as untrustworthy and make it difficult for the AI to make alliances in game, thus leading to weaker play overall.

Thorusss t1_izda0mm wrote on December 8, 2022 at 6:37 AM

#883,688

Players have felt that Cicero is way more forgiving (cooperating after a recent betrayal) than human players, when it serves it purpose for the next turn. Is that your observation as well?

Does Cicero have full memory of the whole game and chat, and can e.g. remember a betrayal from many turns ago?

I also understand that it reevaluates all plans each turn. Does that basically mean it does not have/need an internal long term strategy beyond it current optimization of the long term results of the next move?

Thorusss t1_izda1r8 wrote on December 8, 2022 at 6:37 AM

#883,690

remindme! 2 days

RemindMeBot t1_izda3vi wrote on December 8, 2022 at 6:38 AM

#883,693

Replying to Thorusss (#883,690)

I will be messaging you in 2 days on 2022-12-10 06:37:46 UTC to remind you of this link

2 OTHERS CLICKED THIS LINK to send a PM to also be reminded and to reduce spam.

^(Parent commenter can ) ^(delete this message to hide from others.)

^(Info)	^(Custom)	^(Your Reminders)	^(Feedback)

[deleted] t1_izdcgz7 wrote on December 8, 2022 at 7:08 AM

#883,796

Replying to [deleted] (#883,515)

[deleted]

jthat92 t1_izddm7x wrote on December 8, 2022 at 7:23 AM

#883,838

How would I go about getting a position as a Reasearch Engineer without a PhD at a company like Meta? I assume contacts are needed and a referral? I would assume for a Research Scientist Position a PhD would be a must have?

Takithereal t1_izdeqqw wrote on December 8, 2022 at 7:38 AM

#883,877

Was there any behavior of the AI that surprised the world champion? Is there something we could learn in terms of strategy ?

[deleted] t1_izdivg0 wrote on December 8, 2022 at 8:36 AM

#884,002

[deleted]

2Punx2Furious t1_izdj15q wrote on December 8, 2022 at 8:38 AM

#884,007

I have a few questions, feel free to answer any.

Is your end-goal AGI?
Are you working on the alignment problem?
Opinions on transformers and LLMs?
What are your predictions for the field in the next year, next 5 years, and next 10 years?

TheBaxes t1_izdj7ue wrote on December 8, 2022 at 8:40 AM

#884,012

Any tips for working on RL? I'm currently a bachelors working as ML engineer doing stuff related to computer vision and generative models in a consulting company, but would love to work on stuff related to RL or even games.

Would you guys recommend just going for a PhD and try to get a job in some place like Meta AI labs? I don't know how easy it could be to pivot from computer vision to RL without an advanced degree.

[deleted] t1_izdjyj3 wrote on December 8, 2022 at 8:51 AM

#884,030

Replying to Centurion902 (#883,530)

Thanks, this answered my question. I guess the point is to be imperceptible to other humans, not necessarily to an algorithm, which was my confusion. It also makes this result more impressive.

If other humans detect that a player is an AI bot, it may diminish their ability to form alliances through the general lack of trust people have towards AI. As you said.

This work would help towards building human like agents, for which there are lots of motivations for developing.

[deleted] t1_izdkb13 wrote on December 8, 2022 at 8:56 AM

#884,045

Replying to Rybolos (#880,844)

[deleted]

teagrower t1_izdnpk9 wrote on December 8, 2022 at 9:46 AM

#884,149

How do you compare your approach with that of the original Watson demo?
If you're open for collaboration with other institution and companies in the field, who do we contact?

doU2Boo t1_izdoryd wrote on December 8, 2022 at 10:01 AM

#884,184

Does the Agent detect betrayal? If yes, how long did it take to train and what was the Aha moment that led to the revelation??

kayjaykay87 t1_izdptuq wrote on December 8, 2022 at 10:16 AM

#884,210

What kind of environment is Meta for AI research? Given the recent “relatively” tough times at Meta is AI research seen as something that can be cut back on, or is AI research an established tangible benefit to Meta as a for-profit company?

How did you pitch the project to upper management and how difficult was it? What sort of budget did you have?

What do you think of the DARPA-led Diplomacy project that is starting to pick up now? It seems like aren’t going for natural language etc, and are involving multiple independent teams, are you expecting to see any significant developments from them? What‘s the next big challenge for Meta AI Research?

And congrats etc.

[deleted] t1_izdqjo0 wrote on December 8, 2022 at 10:26 AM

#884,225

Replying to bpe9 (#880,941)

[deleted]

Butanium_ t1_izdqwny wrote on December 8, 2022 at 10:31 AM

#884,232

Hi ! How do you plan to improve Cicero enough for longer games ? Also do you think group chats are useful in the game? If yes will your next iteration be trained to use them?

[deleted] t1_izdzvyg wrote on December 8, 2022 at 12:25 PM

#884,594

[removed]

mouldygoldie t1_ize2wup wrote on December 8, 2022 at 12:56 PM

#884,737

Hi Noam,

Just want to say you spoke to my university a couple of weeks ago about your Poker agent which could play in multiplayer (6) Texas Holdem and it was really really great! I'm looking forward to reading more about this and seeing what comes out of MetaAI soon; very cool!

weightloss_coach t1_ize4yh6 wrote on December 8, 2022 at 1:16 PM

#884,883

How soon you think we would be able to learn symbolic rules of a game from text description (say Wikipedia entry) and an infer a symbolic reasoner instead of hard coding it?

toposynthesis t1_ize6hrr wrote on December 8, 2022 at 1:30 PM

#884,991

Imagine you are a young person with good programming knowledge but little machine learning knowledge, how would you start learning to build cool stuff with machine learning?

-ZeroRelevance- t1_ize7jxl wrote on December 8, 2022 at 1:39 PM

#885,047

What games do you plan to tackle with this model next? My first guess would be a game like Mafia or Among Us, since they have some similar principles to diplomacy but with even more focus on trust and deception. I’m interested in hearing your own thoughts though.

[deleted] t1_izeb4c7 wrote on December 8, 2022 at 2:08 PM

#885,237

[removed]

draconicmoniker t1_izedob5 wrote on December 8, 2022 at 2:28 PM

#885,389

Is there a way to log, locate or visualize the contents or knowledge of the neural network at the moment when deception is happening? As the first system to reliably demonstrate AI that can deceive, it would be very informative to build more tools around how to search for and detect it in general.

Smallpaul t1_izehl87 wrote on December 8, 2022 at 2:57 PM

#885,617

Replying to [deleted] (#883,796)

I haven’t played the game but I assume that the role of deception is to allow one to simultaneously join two mutually exclusive alliances at once?

polymorphicprism t1_izesr36 wrote on December 8, 2022 at 4:13 PM

#886,239

Replying to pm_me_your_pay_slips (#880,919)

See this article about a 3x world champion.

> I asked Goff about any major falsehoods or betrayals that helped him in his victories. He paused to think, then said in his soft-spoken way: “Well, there may have been a few deceptive omissions on my part but, no, I didn’t tell a single outright lie the entire tournament.”

rikiiyer t1_izew1lv wrote on December 8, 2022 at 4:35 PM

#886,449

I listened to Noam’s conversation with Lex Friedman the other day and he made the point that the model had to learn human like tendencies in order to work with humans to win at Diplomacy. Do you think it would be possible to use these learned features to somehow teach other models how to act more human-like?

ReginaldIII t1_izf02ey wrote on December 8, 2022 at 5:01 PM

#886,681

> "A strange game. The only winning move is not to play. How about a nice game of chess?"

loranditsum t1_izf4uk6 wrote on December 8, 2022 at 5:31 PM

#886,963

Who in the team has the biggest imposter syndrome

lII1lIIl1IIll1Il11l t1_izf55fa wrote on December 8, 2022 at 5:33 PM

#886,986

How often is Mark Zuckerburg around your group? What's he like?

AnArgumentativeFool t1_izf55tb wrote on December 8, 2022 at 5:33 PM

#886,987

As a diplomacy player who plays a lot online, is there any chance that you plan on testing how the AI would do in a more standard setting (for example 24 hours per turn, instead of the 5 minute games)?

The ability to win in the games it has is an incredible achievement but I am still under the impression there is a long way to go before the machines achieve super human ability in the formats of the game that require much more human connection and communication against the best players. Thanks

MetaAI_Official OP t1_izf9mic wrote on December 8, 2022 at 6:02 PM

#887,192

Replying to hophophop1233 (#880,615)

While CICERO is only capable of playing Diplomacy, the underlying technology is relevant to many real-world applications. We think others will be able to build on this research in a way that might lead things like better AI personal assistants or NPCs in the metaverse.I think the way we integrated strategic reasoning with NLP was novel and has implications for future research.

We've open-sourced all the models and code. We're also making the training data available to researchers who apply through our RFP. Running the full CICERO agent, including the strategic reasoning component, is quite expensive. The raw models by themselves are more manageable though. -NB

MetaAI_Official OP t1_izf9vhi wrote on December 8, 2022 at 6:04 PM

#887,205

Replying to nraw (#881,199)

AI sits at the very heart of work across Meta. We are part of Meta AI's Fundamental AI Research team -- known as FAIR. Exploratory research, open science, and cross-collaboration are foundational to FAIR efforts. Researchers like us have the freedom to pursue pure open science type work and collaborate with the industry and academia.

Research teams also work closely with product teams across Meta. This gives engineers an early view into where the latest in AI is heading and gives researchers an up-close look at how AI is working at scale. This internal collaboration has helped us build a faster research to production pipeline too. -AL

MetaAI_Official OP t1_izfa9bo wrote on December 8, 2022 at 6:06 PM

#887,219

Replying to pimmen89 (#880,864)

I'm not entirely sure if this answers what you were asking, but on the strategic planning side of CICERO, in some sense the fundamental challenge of Diplomacy is that it has a large number of local optima, with no inherent notion of which optimum is better than any other. Because you need to sometimes cooperate with others to do well, the way you need to play depends heavily on the conventions and expectations of other players, and a strategy that is near-optimal in one population of players can be disastrous in another population of players. This is precisely what we observed in earlier work on No-press Diplomacy (Paper). Central to many of our strategic planning techniques in Cicero is the idea of regularization towards human-like behavioral policies, to ensure CICERO's play remains roughly compatible with human play, rather than falling into any of the countless other equilibria that don't. -DW

RapidRewards t1_izfa9pu wrote on December 8, 2022 at 6:06 PM

#887,220

Is there explainability in its strategic reasoning? Have you tried a human in the loop for selecting from top moves?

MetaAI_Official OP t1_izfam7u wrote on December 8, 2022 at 6:09 PM

#887,236

Replying to MetaAI_Official (#887,219)

There were also some places where it looked like it was heading down strategic blind alleys but it kept getting strong results - so for me it also showed that humans can also get stuck in local optimums, especially when groups and their collective "meta-strategies" get involved. -AG

MetaAI_Official OP t1_izfawoe wrote on December 8, 2022 at 6:10 PM

#887,252

Replying to ClayStep (#881,171)

Actually the language model was capable of suggesting good moves to a human player *because* the planning side of CICERO had determined these to be good moves for that player and supplied those moves in an *intent* that it conditioned the language model to talk about. CICERO uses the same planning engine to find moves for itself and to find mutually beneficial moves to suggest to other players. Within the planning side, as described in our paper, we *do* use a finetuned language model to propose possible actions for both Cicero and the other players - this model is trained to predict actions directly, rather than dialogue. This gives a good starting point, but contains many bad moves as well, this is why we run a planning/search algorithm on top. -DW

MetaAI_Official OP t1_izfbfrz wrote on December 8, 2022 at 6:14 PM

#887,283

Replying to pm_me_your_pay_slips (#880,919)

Backstabbing tends to get devalued by CICERO. It has long been my thinking that backstabbing is a poor option in the game and I always feel like I fail when I have to do it, and CICERO seems to agree with me. It gets clearly better results when it is honest and collaborates with allies over the long term. If you forced it to play a pure tactical style game in an environment with communication it would perform poorly, and I think there's a marker there for human players who want to get better as well as some interesting AI ethics ideas that can be explored in future. -AG

MetaAI_Official OP t1_izfc0o8 wrote on December 8, 2022 at 6:18 PM

#887,308

Replying to addition (#881,802)

Meta has no plans to turn CICERO into a product and that was never the goal. This is purely AI research that we have open sourced for the wider research community. I think there are a lot of valuable lessons that the research community can learn from this project. -NB

MetaAI_Official OP t1_izfc2zh wrote on December 8, 2022 at 6:18 PM

#887,315

Replying to MetaAI_Official (#887,308)

I love that Meta open-sourced this. I think that's an important point. I really saw this is a way Meta is giving back to the AI community and the scientific ocmmunity in general and that's one of the reasons I agreed to join this project. I think it is far better for advances like this to come from open academic research than from top secret programs so it is a major ethical tick for Meta that they invest in research like this. -AG

edwardzachary785 t1_izfc3h5 wrote on December 8, 2022 at 6:18 PM

#887,316

Hi Meta Team, I’m curious to know about your process for setting up this tournament to test out Cicero against real players. What sources did you use to find players, and how did you vet players to assure that Cicero was competing against a mix of various skill levels?

MetaAI_Official OP t1_izfcd0s wrote on December 8, 2022 at 6:20 PM

#887,329

Replying to TissueReligion (#881,202)

I started grad school in 2012 and technically defended in 2020, but I actually left the PhD in 2018 and finished up my dissertation while working over the next two years. My grad school research was unusually focused for a PhD student. All my research, starting with my first paper, was focused on answering the question of how to develop an AI that could beat top humans in no-limit poker. After we succeeded in that in 2017, my research shifted more toward generality and scalability.

My original plan was to defend in summer 2019, do an industry research stint for a year, and then start a faculty position in 2020. (1-year deferrals are common in academia these days.) So I applied to universities and industry labs in fall 2018. FAIR gave me an offer and also said that I could start working immediately, even though I told them that I'd be doing faculty interviews for most of spring 2019. That seemed like a strictly better option than staying in grad school and making near-minimum wage, so after considering a few other options I chose to join FAIR immediately.

I ended up liking it so much that I turned down my faculty offers and stayed at FAIR. Once I knew I wasn't going to faculty, there wasn't as much urgency to finishing my PhD. I wanted to include one more major project in my thesis, ReBeL, so I held off on defending until that was done. -NB

MetaAI_Official OP t1_izfcvy5 wrote on December 8, 2022 at 6:23 PM

#887,360

Replying to JimmyTheCrossEyedDog (#881,808)

We disentangle the complexity of the action space from the complexity of the planning algorithm by using a policy proposal network. For each game state we sample a few actions from the network - sets of unit-order pairs - and then do planning only among these actions. Now, in case of continuous actions we will have modify the policy proposal network, but that was already explored for other games with continuous action space such as StarCraft. - AB

PolarCow89 t1_izfd4ja wrote on December 8, 2022 at 6:25 PM

#887,374

Does CICERO ever employ any kingmaking strategies? i.e., if it realizes loss is certain, will it ever shift goals to attempt to make a different power win/lose?

MetaAI_Official OP t1_izfe82v wrote on December 8, 2022 at 6:32 PM

#887,435

Replying to [deleted] (#880,746)

The title of the paper doesn't refer to CICERO being "human-like" necessarily (though it does behave in a fairly human-like way). Instead it refers to the agent achieving a score that's on the level of strong human players.

But also, CICERO is not just trying to be human-like: it’s also trying to model how *other* humans are likely to behave, which is necessary for cooperating with them. In one of our earlier papers we show that even in a dialogue-free version of Diplomacy, an AI that’s trained purely with RL without accounting for human behavior fares quite poorly when playing with humans (Paper). The wider applications we see for this work are all about building smart agents that can cooperate with humans (self-driving cars, AI assistants, …) and for all these systems it’s important to understand how people think and match their expectations (which often means responding in a human-like way ourselves, though not necessarily).

When language is involved, understanding human conventions is even more important. For example, saying “Want to support me into HOL from BEL? Then I’ll be able to help you into PIC in the fall” is likely more effective than the message “Support BEL-HOL” even if both express the same intent. -AL

Specialist-Regret241 t1_izfecx7 wrote on December 8, 2022 at 6:33 PM

#887,446

Well done on Cicero - I played against it three times in August and the only odd thing about it was that didn't engage in the post-game discussion.

Question - how do you think Cicero would fare with more time for discussion? I don't tend to play games with turns that are less than 2 days, and blitz only has 5 minute turns. Or is that something you can't easily test now that the active population of blitz players knows about Cicero? I for one will no longer assume I'm playing against a human when I use webdip in the future.

MetaAI_Official OP t1_izfeehk wrote on December 8, 2022 at 6:33 PM

#887,450

Replying to Liorogamer (#880,922)

One challenge was being able to hold 6 simultaneous conversations at a human speed in the fast-moving "blitz" Diplomacy format, since CICERO has to do a lot of planning and NLP work for each message it sends (see Fig 1 in our paper). We ended up splitting CICERO into "sub-agents" that handle conversations with each other player. CICERO actually ran on 56 GPUs in parallel for our human games (although it can also run on a single GPU in slower time formats). -AL

suchenzang t1_izfeg76 wrote on December 8, 2022 at 6:33 PM

#887,454

How do you quantify the "strategic reasoning" capabilities of the dialogue component in CICERO?

In other words, if you were to finetune an LLM on existing / old gameplay conversations, followed by conditioning on dialogue from a new game via prompts (aka have separate LM from a no-press model) - would such a setup still be able to have a high win-rate simply from the strength of the no-press model?

Specialist-Regret241 t1_izfemmo wrote on December 8, 2022 at 6:35 PM

#887,462

Replying to edwardzachary785 (#887,316)

Generally crap player here who did better than Cicero twice and lost once.

I don't know what they would say but it looked to me like the matches in which they entered Cicero had a pretty wide range of skill, although I don't think there was anyone who diplomacy players consider to be at the top of the game. Hard to say as online diplomacy is so fragmented; Cicero was in a very new very niche variant.

Edited to add that I forgot about some of the players who were involved in the playtesting and design side. They're pretty good.

MetaAI_Official OP t1_izfet1n wrote on December 8, 2022 at 6:36 PM

#887,472

Replying to MetaAI_Official (#887,450)

One of our models trained for several days, and at certain times of the day (but not every day) training speeds would drop dramatically and certain machines became unstable. After a lot of investigation, it turned out that the datacenter cooling system was malfunctioning, and around mid-day on particularly hot days, GPU failure rates would skyrocket. For the rest of the model training run, we had a weather forecast bookmarked to look out for especially hot days! -JG

MetaAI_Official OP t1_izff2pu wrote on December 8, 2022 at 6:37 PM

#887,487

Replying to [deleted] (#883,273)

I really strongly disagree that lying is a positive in Diplomacy. The best players do it as little as possible - it is a game about building trust in an environment where trust is hard to build. I think Diplomacy has a reputation for being about lying because new players think just because they can do it, they must. I am nearly certain that a "CICERO II" wouldn't lie more. -AG

Specialist-Regret241 t1_izff83e wrote on December 8, 2022 at 6:38 PM

#887,499

Replying to Smallpaul (#885,617)

Yeah that's one way of putting it...but it's best to just keep your options open, and deniability at its highest, while maneuvering an ally into a position where you can take advantage of them.

Alliances have no binding impact on the game.

MetaAI_Official OP t1_izffhli wrote on December 8, 2022 at 6:40 PM

#887,516

Replying to thatguydr (#881,478)

Nah CICERO is still invited to the house games -NB

MetaAI_Official OP t1_izffqx4 wrote on December 8, 2022 at 6:42 PM

#887,532

Replying to Rybolos (#880,844)

As we look at the incredible potential of what AI can unlock in the physical and virtual worlds we need to balance that optimism with an appreciation for the risks. These risks can come in many forms whether through unintended uses of new technologies or through bad actors looking to exploit areas of vulnerability. Being thoughtful about research release (through, e.g., special licenses, as you suggest), is one way to help this research move forward while limiting potential negative use cases. There are also many other research areas which I think are promising for bolstering positive use cases and limiting negative ones; to name just a few, improving control over language model outputs, investing in modeling for rapid adaptability and flexibility, discriminating between human and model-generated text, etc. -ED

Specialist-Regret241 t1_izffugi wrote on December 8, 2022 at 6:42 PM

#887,535

Replying to Butanium_ (#884,232)

ooh group chats would be fascinating

MetaAI_Official OP t1_izffy2d wrote on December 8, 2022 at 6:43 PM

#887,541

Replying to xutw21 (#881,222)

From a non-technical point of view, the fact that the human Diplomacy players we worked with (Karthik and Markus) were really excellent players so the model kept being evaluated against the best, rather than accounting for human players sometimes being average instead. Accounting for all levels of play was challenging! -AG

MetaAI_Official OP t1_izfh1t6 wrote on December 8, 2022 at 6:50 PM

#887,603

Replying to NeverStopWondering (#881,689)

We tested the model using self-play frequently before we ever put it in front of humans (outside of our team). One interesting learning was that mistakes that the model makes in self-play games aren't reflective of the mistakes it makes when playing against humans. From a language perspective, in self-play, the model is more prone to "spirals" of degenerate text (as one bad message begets the next, and the model continues to mimic its past language). Moreover, humans reacted differently to mistakes the model made — in human play, a human might question/interrogate the agent after receiving a bad message, while another model is unlikely to do so. This really underscored the importance of playing against humans during development for research progress. -ED

MetaAI_Official OP t1_izfheer wrote on December 8, 2022 at 6:52 PM

#887,617

Replying to NeverStopWondering (#881,689)

From a strategic perspective, it attempts similar things but the results are a little different - which is understandable as it reacts differently. It tends to build more unorthodox alliances just because it doesn't know they're unorthodox. It actually made the self-play games quite fun to watch, although if the point is to compete against humans it is kind of tangential to the key challenges. -AG

MetaAI_Official OP t1_izfhgd0 wrote on December 8, 2022 at 6:53 PM

#887,620

Replying to MetaAI_Official (#887,487)

As Andrew has said, Diplomacy is less about lying and more about trust-building than beginners typically think. Of course, there are times when some amount of lying may be the best strategy. One reason that CICERO did not use deception effectively - and why we abandoned it - is that it wasn't very good at reasoning about the long-term cost of lying, i.e. knowing exactly how much a particular lie would hurt its ability to cooperate with the other player in the future. We're not really interested in building lying AIs, but being able to understand the long-term consequences of one's actions on other people's behavior is an interesting research direction! -AL

MetaAI_Official OP t1_izfhq3k wrote on December 8, 2022 at 6:54 PM

#887,637

Replying to Effective-Dig8734 (#882,050)

The next step is taking the lessons we've learned from CICERO and extending them more broadly to other research domains. We're also hoping that others are able to build on our open-sourced work and will continue to use Diplomacy as a benchmark for research. -NB

MetaAI_Official OP t1_izfhy6x wrote on December 8, 2022 at 6:56 PM

#887,643

Replying to This_Objective7808 (#883,019)

I loved this problem! The average human player sends way too few messages compared to the best human players, so the challenge was how far to push this before it became.... weird. So it wasn't just infinite messaging either. I'll let others answer how that was technically achieved, but this was an underrated challenge to achieving great play. What a great question! -AG

MetaAI_Official OP t1_izfi4f8 wrote on December 8, 2022 at 6:57 PM

#887,654

Replying to MetaAI_Official (#887,643)

CICERO sent/received an average of 292 messages per game (the 5277 is the number of messages it sent over the course of 40 games). This figure was comparable to its human counterparts. As Andrew points out, this was quite an interesting technical problem to tackle — there are real risks to sending too many messages (annoying your allies, + the additional risk of degenerate text spirals), but missing opportunities to collaborate by not sending enough messages can also be devastating. -ED

pimmen89 t1_izfia6x wrote on December 8, 2022 at 6:58 PM

#887,666

Replying to MetaAI_Official (#887,219)

This was a very nice and enlightening answer! Thank you so much! 🙂

Could you give an example of a local optimum that was funny to watch?

MetaAI_Official OP t1_izfj3yl wrote on December 8, 2022 at 7:03 PM

#887,706

Replying to xutw21 (#881,222)

We tried hard in the paper to articulate the important research challenges and how we solved them. At a high level, the big questions were:

RL/planning: What even constitutes a good strategy in games with both competition and cooperation? The theory that undergirds prior successes in games no longer applies
NLP: How can we maintain dialogues that remain coherent and grounded over very long interactions
Joint: How do we make the agent speak and act in a “unified” way? I.e. how does dialogue inform actions and planning inform dialogue so we can use dialogue intentionally to achieve goals?

One practical challenge we faced was how to measure progress during CICERO’s development. At first we tried comparing different agents by playing them against each other, but we found that good performance against other agents didn’t correlate well with how well it would play with humans, especially when language is involved! We ended up developing a whole spectrum of evaluation approaches, including A/B testing specific components of the dialogue, collaborating with three top Diplomacy players (Andrew Goff, Markus Zijlstra, and Karthik Konath) to play with CICERO and annotate its messages and moves in self-play games, and looking at the performance of CICERO against diverse populations of agents. -AL

MetaAI_Official OP t1_izfjgik wrote on December 8, 2022 at 7:06 PM

#887,722

Replying to pyepyepie (#881,472)

Figuring out how to get strong control over the language model by grounding in "intents"/plans was one of the major challenges of this work. Fig. 4 in the paper shows we achieved relatively strong control in this sense: prior to any filters, ~93% of messages generated by CICERO were consistent with intents and ~87% were consistent with the game state. As you note, however, the model is not perfect, and we relied on a suite of classifiers to help filter additional mistakes. Many of the mistakes CICERO made were relative to information that was *not* directly represented in its input (and thus required additional reasoning steps), e.g., reasoning further-into-the-future states or counterfactual past states, discussing plans for third parties, etc. We could have considered grounding CICERO in a richer representation of "intents" (e.g., including plans for third parties) or of the game state (e.g., explicitly representing past states), but in practice we found that (i) richer intents would be harder to annotate/select and often take the language model out of distribution and (ii) we had to balance the trade off between richer game state representation with the dialogue history representation. Exploring ways to get stronger control/improve the reasoning capabilities of language models is an interesting future direction. -ED

MetaAI_Official OP t1_izfk9ug wrote on December 8, 2022 at 7:11 PM

#887,755

Replying to Roger_M8 (#881,936)

In 2019 we had just finished up Pluribus and were discussing what to pursue next. We saw the incredible breakthroughs happening across the field, like GPT-2, AlphaStar, and OpenAI Five, and knew that we needed to be ambitious with our next goal because the field was advancing quickly. We were discussing what would be the hardest game to make an AI for and landed on Diplomacy due to its integration of natural language and strategy. We thought it could take 10 years to fully address, but we were okay with that because historically that kind of research timeframe had been the norm. Obviously things worked out better than we expected though.

Our long-term goal was always the full natural language game of Diplomacy but we tried to break the project down into smaller milestones that we could tackle along the way. That led to our papers on human-level no-press Diplomacy, no-press Diplomacy from scratch, better modeling of humans in no-press Diplomacy, and expert-level no-press Diplomacy. -NB

MetaAI_Official OP t1_izfkmc4 wrote on December 8, 2022 at 7:13 PM

#887,771

Replying to TheFibo1123 (#882,525)

Re: Dialogue-related challenges: Moving from the "no press" setting (without negotiation) to the "full press" setting presented a host of challenges at the intersection of natural language processing and strategic reasoning. From a language perspective, playing Diplomacy requires engaging in lengthy and complex conversations with six different parties simultaneously. Messages the agent sends needed to be grounded in both the game state as well as the long, dialogue histories. In order to actually win the game, the agent must not only mimic human-like conversation, but it must also use language as an *intentional tool* to engage in negotiations and achieve goals. On the flip side, it also requires *understanding* these complex conversations in order to plan and take appropriate actions. Consider: if the agents actions did not reflect its conversations/agreements, players may not want to cooperate with it, and at the same time, it must take into account that other players might not be honest when coordinating/negotiating to plans.

Re: AI technologies in the future: advancements in this space have many potential applications and will hopefully improve human-AI communication in general to get closer to the way people communicate with each other. -ED

Roger_M8 t1_izfl3w0 wrote on December 8, 2022 at 7:17 PM

#887,796

Replying to MetaAI_Official (#887,755)

Thank you for your answer and congrats on the amazing work!

MetaAI_Official OP t1_izfl6g3 wrote on December 8, 2022 at 7:17 PM

#887,802

Replying to levi97zzz (#883,061)

CICERO reasons about the beliefs, goals, and intentions of the other players. Whether that counts as "theory of mind" depends on the definition. This reasoning is partly implicit through the output of the policy network based on the conversations and sequence of actions, and part of it is explicit through the strategic reasoning algorithm. -NB

MetaAI_Official OP t1_izfldg5 wrote on December 8, 2022 at 7:18 PM

#887,813

Replying to ditlevrisdahl (#883,452)

Early on, we primarily evaluated the model using self-play, having team members play against it, and by building small test sets to evaluate specific behaviors. In the last year, we started evaluating the model by putting it in live games against humans (with another human in the loop to review its outgoing messages and intervene if necessary). We quickly learned that the mistakes the model makes in self-play weren't necessarily reflective of its behaviors in human play. Playing against humans became *super* important for developing our research agenda! -ED

MetaAI_Official OP t1_izflfk2 wrote on December 8, 2022 at 7:19 PM

#887,815

Replying to Takithereal (#883,877)

I think the speed that it went from playing no communication games to full communication games was the biggest surprise - not just the natural language but the adaptation of strategy and tactics. I expected it to really struggle to climb out of what it had learned from that style of game, but it did so pretty quickly, which is probably down to the technical expertise of the team. I guess beyond that, the AI plays some approaches that upset the inherited wisdom of the diplomacy playing group. I'm totally revisiting some opening lines for example. In terms of what we can learn - the strategic ideas the emerge seem to be very much aligned with high level human players. Patience, collaboration, improving position rather than brute force tactical tricks... at that level of abstraction it plays very similarly to a good human player. -AG

MetaAI_Official OP t1_izfljno wrote on December 8, 2022 at 7:19 PM

#887,820

Replying to RapidRewards (#887,220)

It takes significant effort, but yes, on the strategic planning side it is often possible to work out why CICERO came up with particular moves or intents. We often did this during development when debugging. You can look at the moves considered by the search for it and its opponents and see what values those achieved in the iterations within the search, and see how the equilibrium evolved in response to those values, you can look at the initial policy prior probabilities, and so on. Not entirely unlike walking through a debug log of how a chess engine explored a tree of possible moves and why it came up with the value it did. In fact, generally with systems that do explicit planning rather than simply running a giant opaque model end-to-end, it's usually possible to reverse-engineer "why" the system is doing something, although it may take a lot of time and effort per position. We haven't tried a human in the loop for choosing moves though. -DW

MetaAI_Official OP t1_izfll3c wrote on December 8, 2022 at 7:20 PM

#887,823

Replying to Swolnerman (#883,218)

As someone without a PhD, I will say I definitely don't think it's necessary to have a graduate degree to work at the cutting edge of ML. Our team contains people with a mix of educational backgrounds working on all aspects of the projects, and the majority of the team do not have PhDs. I don't think there's an optimal choice for everyone, it probably depends on how you learn best and what type of problem you want to work on, but there's certainly a lot of great research being done by people without PhDs within industry! -AL

Aesthetic_tissue_box t1_izflpne wrote on December 8, 2022 at 7:21 PM

#887,838

I feel like human diplomacy is conducive to some emotionally driven plays (especially when a player knows they are being eliminated) which are rarely optimal and more about satisfying some agenda. For example a particularly egregious backstab might result in a player focusing down their betrayer at the expense of their own survival and success.

How does Cicero deal with these kind of situations? is it capable of understanding that vendettas might be pursued over the optimum play?

MetaAI_Official OP t1_izflw2v wrote on December 8, 2022 at 7:22 PM

#887,848

Replying to TheBaxes (#884,012)

There's quite a few open-source Reinforcement Learning challenges that you can explore with modest amounts of compute in order to build some experience training RL models, for example the Nethack Learning Environment, Atari, Minigrid, etc. For me personally, I had only worked in NLP / dialogue for years but got into RL by implementing Random Network Distillation models for NetHack. It's a fun area that definitely has its own unique challenges vs other domains. -AM

MetaAI_Official OP t1_izfm4rw wrote on December 8, 2022 at 7:23 PM

#887,858

Replying to loranditsum (#886,963)

I don’t know about “biggest” :p but as someone without a graduate degree working in AI research, I’ve definitely felt imposter syndrome at times. One of the amazing things about working with large teams of research experts is that people bring extremely deep and diverse knowledge. Just on our team there are experts in NLP, reinforcement learning, game theory, systems engineering, and Diplomacy itself. When people are specialized in this way, the total knowledge on the team is much more than the knowledge of any individual, which is excellent for the team but was daunting for me at first! -JG

MetaAI_Official OP t1_izfm7lf wrote on December 8, 2022 at 7:24 PM

#887,864

Replying to MetaAI_Official (#887,820)

We did also get good human players to review the games and look for really good or bad moves, but that was very early in the development process - CICERO generated good moves and it would be counter-productive to stop it making what it thinks is the best moves. For example, at the tournament I was at in Bangkok a few weeks ago I thought "what would CICERO do?" and then I did a different set of moves - but what CICERO would have done was right! -AG

MetaAI_Official OP t1_izfmiav wrote on December 8, 2022 at 7:26 PM

#887,881

Replying to Specialist-Regret241 (#887,446)

As noted in an answer to a previous question: we were originally targeting 24hr-turn games, but ended up pivoting to 5min-turn games due to the inability to gather a sufficient number of samples in the 24hr-turn format (as playing a single game can sometimes take months)! Playing 24hr-turn games would indeed pose additional challenges from a language generation perspective — while human players tend to send a similar number of messages in each format, messages in 24hr turns tend to be significantly longer (and likely more complex). Moreover, human players would have more time to interrogate mistakes from the bot, which could potentially lead to the agent making further mistakes. -ED

CauliflowerNo4558 t1_izfmkmf wrote on December 8, 2022 at 7:26 PM

#887,885

Obviously artificial is the easy one here. Please give a universally objective definition of intelligent in the form of an aphorism that is verifiable with empirical scientific based evidence. If you are incapable of doing this what gives you the right to use the word intelligent in describing your product?

MetaAI_Official OP t1_izfmvc5 wrote on December 8, 2022 at 7:28 PM

#887,899

Replying to mouldygoldie (#884,737)

Thanks! Glad you enjoyed it! -NB

MetaAI_Official OP t1_izfmvrj wrote on December 8, 2022 at 7:28 PM

#887,901

Replying to Butanium_ (#884,232)

Actually our agent can play longer games, and much of our earlier testing (where we had to manually approve all outgoing messages) was on 24 hour games instead of the 5 minute games that we report on in the paper. The agent is overall a bit more effective in shorter time controls but the agent was in fact scoring quite well in longer time formats as well. However, these games take weeks to complete, and ultimately we decided that it would take too long to play enough games for statistical significance, hence the focus on shorter games. -JG

Liorogamer t1_izfn24a wrote on December 8, 2022 at 7:29 PM

#887,904

Replying to MetaAI_Official (#887,472)

Love this story!! 😂 You have great investigation skills

MetaAI_Official OP t1_izfn5kb wrote on December 8, 2022 at 7:30 PM

#887,908

Replying to Thorusss (#883,688)

Re: memory of the whole game/chat — in terms of the dialogue, due to memory constraints, both our dialogue models and dialogue-conditional action models see a fixed context window (typically, only a few turns/phases worth of dialogue, depending on how many messages were sent in a given turn).

Re: betrayal/forgiveness — Many humans fall into the trap of trying to make another player lose out of ""revenge"", even at the cost of making bad strategic decisions relative to their own gameplay. CICERO is designed to take actions that are best for itself. - ED

MetaAI_Official OP t1_izfn6je wrote on December 8, 2022 at 7:30 PM

#887,911

Replying to bpe9 (#880,941)

I've talked to a few folks about whether this kind of research is applicable to financial markets and the short answer I've gotten is "not directly". I think there are many more promising directions to take this research, like personal assistants and modeling drivers on roads. -NB

MetaAI_Official OP t1_izfn939 wrote on December 8, 2022 at 7:31 PM

#887,914

Replying to MetaAI_Official (#887,908)

Emily is spot on with the revenge point. It is a very understandable human emotion but it doesn't help you win games of Diplomacy. CICERO doesn't get tilted - another thing it shares with strong human players. -AG

MetaAI_Official OP t1_izfnivp wrote on December 8, 2022 at 7:32 PM

#887,928

Replying to doU2Boo (#884,184)

Our final agent does not explicitly try to detect deception. We do have models that predict the actions that people will play based on the board state and message history, and these models may implicitly detect betrayal by predicting actions that don't correspond with the message history. CICERO does have a model that tries to detect whether its *own* messages don't correspond to its intended action, and it will filter out the most egregious cases of that. -JG

MetaAI_Official OP t1_izfnjj7 wrote on December 8, 2022 at 7:33 PM

#887,929

Replying to ClayStep (#881,171)

A related question is "can CICERO take suggestions from other players?" to which the answer is "Yes!". CICERO uses its models to generate a list of "plausible moves" that it reasons over, but if someone suggests an unexpected move to CICERO, it will evaluate that move in its planning and play it if it's a good idea. -AL

MetaAI_Official OP t1_izfnoh3 wrote on December 8, 2022 at 7:34 PM

#887,935

Replying to pm_me_your_pay_slips (#880,919)

CICERO's dialogue model is trained to generate messages that honestly correspond to the intents (actions for itself and for its dialogue partner) that are inputs to the model, and CICERO always inputs the action it actually intends to take. That said, that doesn't mean CICERO will never attack any particular player. If it chooses to do so, it might strategically withhold details of its plans from that player. -NB

Aggravating-Act-1092 t1_izfnwu9 wrote on December 8, 2022 at 7:35 PM

#887,953

How can I play against it?

MetaAI_Official OP t1_izfnzva wrote on December 8, 2022 at 7:36 PM

#887,955

Replying to weightloss_coach (#884,883)

Great question! Back when we initiated Diplomacy, I hypothesized that "an agent that can read the rules of any game and play it at an intermediate level" would be the next challenge problem. There's been so much progress on the language modeling side that I think a system like this is within reach within the next 2-3 years if substantial effort was devoted to it. We're starting to see similar task-generality in large language models on real-world tasks, although constructing a symbolic representation for planning out of a text description is still an open research question! -AL

MetaAI_Official OP t1_izfo0f7 wrote on December 8, 2022 at 7:36 PM

#887,956

Replying to Takithereal (#883,877)

The speed at which we managed to progress from no communication to full natural language Diplomacy also surprised the research team. When we started, the idea of an AI agent that could master no-press Diplomacy seemed like a multi-year effort, and the idea of an AI agent that could play full-scale Diplomacy in natural language seemed like science fiction. We thought it might take 10 years to reach this point. -NB

MetaAI_Official OP t1_izfo503 wrote on December 8, 2022 at 7:37 PM

#887,964

Replying to draconicmoniker (#885,389)

While many players do lie in the game, the best players do so very infrequently because it destroys the trust they’ve built with other players. Our agent generates plans for itself as well as for other players that could benefit them, and it tries to have discussions based on those plans. It doesn’t always follow through with what it previously discussed with a player because it may change its mind about what moves to make, but it does not intentionally lie in an effort to mislead opponents. We're excited about the opportunities for studying problems like this that Diplomacy as an environment could provide for researchers interested in exploring this question; in fact, some researchers have already studied human deception in Diplomacy: https://vene.ro/betrayal/niculae15betrayal.pdf and https://www.cs.cornell.edu/~cristian/Deception_in_conversations_files/deception-conversational-dataset.pdf. -AM

MetaAI_Official OP t1_izfod3v wrote on December 8, 2022 at 7:38 PM

#887,971

Replying to AnArgumentativeFool (#886,987)

We were originally targeting 24hr-turn games, but ended up pivoting to 5min-turn games due to the inability to gather a sufficient number of samples in the 24hr-turn format (as playing a single game can sometimes take months)! Playing 24hr-turn games would indeed pose additional challenges from a language generation perspective — while human players tend to send a similar number of messages in each format, messages in 24hr turns tend to be signficantly longer (and likely more complex). Moreover, human players would have more time to interrogate mistakes from the bot, which could potentially lead to the agent making further mistakes. -ED

MetaAI_Official OP t1_izfoej8 wrote on December 8, 2022 at 7:38 PM

#887,975

Replying to rikiiyer (#886,449)

The learned features are specific to the game of the Diplomacy because the data we used is specific to the game of Diplomacy, but the ideas can be transferred to other domains. Rather than just learning Diplomacy by playing against itself, the AI used a model trained on human games both to guide exploration during training (sampling moves from this model during self-play) as well as during planning (consider what actions humans are likely to take). It's not always obvious exactly how to apply this, but we think there's exciting opportunities for research in this space! -AM

MetaAI_Official OP t1_izfolik wrote on December 8, 2022 at 7:39 PM

#887,986

Replying to loranditsum (#886,963)

[AG] Me. The whole team is just next level and every day I was working with them I was sponging up ideas and knowledge. It's just so great being in a room with people who are so good at what they do. While I'm obviously OK at Diplomacy, the AI aspects and how the team attacked problems just blew my mind.

MetaAI_Official OP t1_izfoxo4 wrote on December 8, 2022 at 7:42 PM

#888,014

Replying to toposynthesis (#884,991)

For me personally, I had no practical machine learning experience prior to 2017, although I did have experience in engineering, and with statistics and working with data. I often had personal programming projects going which I worked on in the weekends and evenings. But anyways, among these projects I picked an intro project that I thought would be fun (human move prediction with deep neural nets in computer Go), started looking up tutorials, academic papers, ML libraries and APIs, and that was the start of it. Pick something you're interested in, and dive in! -DW

MetaAI_Official OP t1_izfp3hc wrote on December 8, 2022 at 7:43 PM

#888,022

Replying to Aggravating-Act-1092 (#887,953)

Since this is a research effort, we don't have plans to host CICERO for public availability. However, we have open-sourced both the model files and code, which means you could host CICERO yourself on a private instance of webDiplomacy.net (also open sourced here). More details can be found here. -CF

MetaAI_Official OP t1_izfp4ei wrote on December 8, 2022 at 7:43 PM

#888,028

Replying to -ZeroRelevance- (#885,047)

I think you could take a similar approach to Mafia or Among Us and do well. In fact, Mafia would be easier because it's still a two-team zero-sum game. We chose Diplomacy specifically because we thought it would be the hardest game to make an AI for and the most "real-world" game due to its natural language component. Now that we've achieved human-level performance in it, we're hoping to move beyond recreational games toward more real-world domains. -NB

MetaAI_Official OP t1_izfpaa6 wrote on December 8, 2022 at 7:44 PM

#888,049

Replying to edwardzachary785 (#887,316)

We joined a league designed by members of the active online Diplomacy community. The league included new players as well as more experienced players who have performed well in other Diplomacy tournaments. -AM

MetaAI_Official OP t1_izfpeey wrote on December 8, 2022 at 7:45 PM

#888,057

Replying to suchenzang (#887,454)

Controlling the dialogue model via intents/plans was critical to this research. Interfacing with the strategic reasoning engine in this way relieved the language model of most of the responsibility of learning strategy and even which moves are legal. As shown in Fig. 4 in the paper, using an LM without this conditioning results in messages that are (1) inconsistent with the agent's plans, (2) inconsistent with the game state, and (3) lower quality overall. We did not conduct human experiments with an LM like this or a dialogue-free agent, as such behavior is likely to be frustrating to people (who would be unlikely then to cooperate with the agent) and quickly detected as an AI. -ED

MetaAI_Official OP t1_izfpggs wrote on December 8, 2022 at 7:45 PM

#888,059

Replying to PolarCow89 (#887,374)

CICERO always tries to maximize its own score. However, there is a regularizer that penalizes it for deviating from a human-like policy. When all actions have the same expected value (e.g., when it's guaranteed to lose no matter what) then it will just try to play in a human-like way, which may involve retaliating against those that attacked it. -NB

MetaAI_Official OP t1_izfpjl9 wrote on December 8, 2022 at 7:46 PM

#888,061

Replying to Aesthetic_tissue_box (#887,838)

One of the key challenges of Diplomacy is modeling how people might respond to your actions. We found that approaches used in prior game AI breakthroughs like Go and poker that relied purely on self-play were not able to anticipate "human" behaviors like retaliation. For that reason, a big contribution of our research is developing a way to incorporate human data into self-play, which allows us to find strong policies that also understand how people approach the game. -NB

alach11 t1_izfpn3d wrote on December 8, 2022 at 7:46 PM

#888,066

Replying to polymorphicprism (#886,239)

With Diplomacy tournaments isn’t there also a bit of iterative game theory? If a top player develops a reputation for outright deception, that can hurt them in future games when competitors trust them less.

MetaAI_Official OP t1_izfpw9x wrote on December 8, 2022 at 7:48 PM

#888,079

Replying to toposynthesis (#884,991)

JG: There's never been a better time to get into machine learning, with so many amazing open source projects being released, amazing blog and youtube tutorials, and communities of people trying to learn together. Whether you're interested in audio, image generation, game AI, or anything else, I'd recommend you clone a popular open source repo, play around with it for a while, and then see if you can make a small modification!

MetaAI_Official OP t1_izfq18c wrote on December 8, 2022 at 7:49 PM

#888,084

Replying to MetaAI_Official (#888,061)

As someone who isn't an AI specialist, this research was a fascinating read. Even for people not in the field this problem is important and if you get the chance it is worth reading! -AG

MetaAI_Official OP t1_izfq2c8 wrote on December 8, 2022 at 7:49 PM

#888,087

Replying to MetaAI_Official (#887,964)

[Goff] Two thoughts on this:
Seeing under the hood like this was fascinating and seeing how the model responded to the messages human players sent was great. That is more about detecting when people lie than the other way around though.

On the actual question you asked Alex is spot on that CICERO only ever ""lied"" by accident - you could see when it sent messages it meant them, then it genuinely changed it's plan later.

MetaAI_Official OP t1_izfq7yw wrote on December 8, 2022 at 7:50 PM

#888,095

Replying to Specialist-Regret241 (#887,446)

Regarding the post-game kibbitzing, we discussed this a few times, but every solution felt like we'd be faking it. For example, we could have put a human in the loop here but.... why? In the end we picked the most honest approach we could when dealing with the community, which was an ethical consideration that underpinned the whole project I think. -AG

MetaAI_Official OP t1_izfq8y9 wrote on December 8, 2022 at 7:50 PM

#888,097

Replying to MetaAI_Official (#887,971)

[Goff] There's also some interesting "anti-weirdness" steps that the team worked on that would need to be put in place - an AI that responds to messages within five minutes 24 hours a day would not feel right at all. I think the most intense timeframe is probably 15 to 30 minutes, as then you will need longer, more complex communications but also the rapid tactical back-and-forth - that pivot would be a cool challenge.

MetaAI_Official OP t1_izfqers wrote on December 8, 2022 at 7:51 PM

#888,108

Replying to MetaAI_Official (#888,049)

Markus was instrumental in organising this - he's deeply connected to the online diplomacy community and his expertise and care for the people involved was pretty critical here. Both from a logistics point of view of getting people to play lots of games but also from making sure there was a good balance. I think from personal feedback I've received it was also a really fun event for the people who participated, so hats off to Markus for all his work on that. -AG

Thorusss t1_izfrw0p wrote on December 8, 2022 at 8:01 PM

#888,181

Replying to MetaAI_Official (#887,487)

Some answer said that the chat history is not preserved beyond a certain length. Does Cicero track past cooperation/betrayal from other players somewhere else?

Thorusss t1_izfsiut wrote on December 8, 2022 at 8:05 PM

#888,225

Replying to MetaAI_Official (#887,620)

>We're not really interested in building lying AIs

Why? Child psychology sees lies as an important development step in the theory of mind - the insight that knowledge is not universal.

In real world applications, AI might encounter lies. Do you think these systems can be deal with that as good, when they are not themselves capable of it? E.g. for planning, you have the model the other side, how do you model lying successfully, when you cannot lie?

[deleted] t1_izfsric wrote on December 8, 2022 at 8:06 PM

#888,241

Replying to alach11 (#888,066)

[deleted]

Takithereal t1_izftv6e wrote on December 8, 2022 at 8:13 PM

#888,308

Replying to MetaAI_Official (#887,815)

Amazing , thank you! In dota 2's open ai really changed how mid position is played nowadays. Really insightful

Thorusss t1_izfuj6u wrote on December 8, 2022 at 8:18 PM

#888,339

Replying to MetaAI_Official (#887,219)

>Central to many of our strategic planning techniques in Cicero is the idea of regularization towards human-like behavioral policies, to ensure CICERO's play remains roughly compatible with human play

That implies there could be more optimal strategies even with alliances with human players? Is there interest in exploring this, and evolving the strategies beyond what humans have found so far, as it has happened with chess and go? See where and Cicero2 could move the Metagame to?

ClayStep t1_izfv6fi wrote on December 8, 2022 at 8:22 PM

#888,376

Replying to MetaAI_Official (#887,252)

Ah this was my misunderstanding then - I did not realize the language model was conditioned on intent (it makes perfect sense that it is). Thanks for the clarification!

ba7rii t1_izfvg5x wrote on December 8, 2022 at 8:23 PM

#888,385

Thanks for the AMA! I’m interested in the infra side of things so, what does your ML infrastructure look like? What are the infra related tools do you use throughout the ML lifecycle? MLFlow? Any other tools?

Error40404 t1_izfzh4l wrote on December 8, 2022 at 8:49 PM

#888,582

Did you develop your own machine learning algorithm (like linear regression, decision tree etc.) or have you accomplished this by just utilizing the aforementioned existing algorithms?

shanereid1 t1_izg2dq6 wrote on December 8, 2022 at 9:08 PM

#888,737

What do you guys think is the most difficult game to solve using RL?

-ZeroRelevance- t1_izg9ott wrote on December 8, 2022 at 9:57 PM

#889,111

Replying to MetaAI_Official (#888,028)

I see, thanks for the reply

pyepyepie t1_izgms77 wrote on December 8, 2022 at 11:29 PM

#889,631

Replying to MetaAI_Official (#887,722)

Interesting. I was completely surprised by the results (I honestly thought Diplomacy will take 10 years) - it's a great demo of how to utilize large language models without messing up :) Congrats.

[deleted] t1_iziv8nz wrote on December 9, 2022 at 1:01 PM

#892,669

Replying to [deleted] (#888,241)

[deleted]

Swolnerman t1_izmyk1f wrote on December 10, 2022 at 8:12 AM

#898,678

Replying to MetaAI_Official (#887,823)

That’s really encouraging to hear! Thank you for the response!

xtdram t1_izn5gnu wrote on December 10, 2022 at 9:50 AM

#898,871

are you planning to release nllb version 2? or do you have other project that is superior to current nllb?

LockUp111 t1_j07f8co wrote on December 14, 2022 at 4:47 PM

#929,336

If you had to learn machine learning all over again, what would your roadmap look like?

[deleted] t1_j0y7lgv wrote on December 20, 2022 at 7:57 AM

#980,093

[removed]

[deleted] t1_j19rwnk wrote on December 22, 2022 at 6:40 PM

#1,011,300

[deleted]

PropertyRemote2332 t1_j1mdhmg wrote on December 25, 2022 at 4:14 PM

#1,056,357

Any chance y’all will release a single player game? I can’t find anyone nerdy enough to play this game with me.

link0007 t1_j1n8yer wrote on December 25, 2022 at 8:22 PM

#1,060,188

Replying to MetaAI_Official (#887,329)

What's it like doing a PhD during these years of incredibly rapid AI development? I would imagine it must be hard keeping up with the pace of change, or even just feeling secure in your work not being obsolete/outdated before it's even published!

link0007 t1_j1n9d0t wrote on December 25, 2022 at 8:25 PM

#1,060,226

Does CICERO reflect on its own actions or intentions? Or would you say it has the capacity for self-reflection?

Comments