ClayStep t1_izbsdur wrote

I was at your NeurIPS talk. I note that the language model (conditioned on world state) was able to suggest moves to a human player, and that the human player found them to be good moves.

Could the same model be used to suggest moves for the agent? What are the limitations?

16

MetaAI_Official OP t1_izfawoe wrote

Actually, the language model was capable of suggesting good moves to a human player *because* the planning side of CICERO had determined these to be good moves for that player and supplied them in an *intent* that it conditioned the language model to talk about. CICERO uses the same planning engine to find moves for itself and to find mutually beneficial moves to suggest to other players. Within the planning side, as described in our paper, we *do* use a finetuned language model to propose possible actions for both CICERO and the other players - this model is trained to predict actions directly, rather than dialogue. This gives a good starting point, but it contains many bad moves as well, which is why we run a planning/search algorithm on top. -DW
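
To make the division of labor concrete, here is a minimal sketch of that propose-then-plan loop. All names here (`propose`, `value`, `choose_intent`, `build_intents`) are hypothetical stand-ins for illustration, not CICERO's actual API:

```python
from typing import Callable, Sequence

Action = str  # placeholder type for a Diplomacy move


def choose_intent(
    state: object,
    player: str,
    propose: Callable[[object, str, int], Sequence[Action]],  # finetuned action LM
    value: Callable[[object, str, Action], float],            # planning/search score
    n: int = 30,
) -> Action:
    """Pick a high-value action for `player` to anchor dialogue on."""
    # 1. The finetuned LM proposes plausible actions directly (not dialogue).
    #    This prior is a good starting point but includes many bad moves.
    candidates = propose(state, player, n)
    # 2. A planning/search step scores each candidate and keeps the best,
    #    filtering out the bad proposals the LM lets through.
    return max(candidates, key=lambda a: value(state, player, a))


def build_intents(state, me, partner, propose, value):
    # The same planning engine finds moves for the agent itself and
    # mutually beneficial moves to suggest to the other player; the
    # dialogue model is then *conditioned* on these intents, so any move
    # it "suggests" in chat is one planning already judged to be good.
    return {me: choose_intent(state, me, propose, value),
            partner: choose_intent(state, partner, propose, value)}
```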

9

ClayStep t1_izfv6fi wrote

Ah, this was my misunderstanding then - I did not realize the language model was conditioned on intent (it makes perfect sense that it is). Thanks for the clarification!

3

MetaAI_Official OP t1_izfnjj7 wrote

A related question is "can CICERO take suggestions from other players?" to which the answer is "Yes!". CICERO uses its models to generate a list of "plausible moves" that it reasons over, but if someone suggests an unexpected move to CICERO, it will evaluate that move in its planning and play it if it's a good idea. -AL
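
A rough sketch of how an external suggestion could slot into that loop, reusing the hypothetical `propose`/`value` interfaces from the sketch above (again, illustrative names only, not the real implementation):

```python
def pick_move(state, player, propose, value, suggestion=None, n=30):
    """Reason over model-proposed moves plus any move another player suggested."""
    candidates = set(propose(state, player, n))
    if suggestion is not None:
        # An unexpected suggestion simply joins the candidate set...
        candidates.add(suggestion)
    # ...and is evaluated alongside everything else: the agent plays it
    # only if planning judges it to be the best option available.
    return max(candidates, key=lambda a: value(state, player, a))
```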

4