
suchenzang t1_izfeg76 wrote

How do you quantify the "strategic reasoning" capabilities of the dialogue component in CICERO?

In other words, suppose you fine-tuned an LLM on existing/old gameplay conversations and then conditioned it on dialogue from a new game via prompts (i.e., an LM kept separate from a no-press model). Would such a setup still achieve a high win rate purely on the strength of the no-press model?

1

MetaAI_Official OP t1_izfpeey wrote

Controlling the dialogue model via intents/plans was critical to this research. Interfacing with the strategic reasoning engine in this way relieved the language model of most of the responsibility for learning strategy, or even which moves are legal. As shown in Fig. 4 in the paper, using an LM without this conditioning produces messages that are (1) inconsistent with the agent's plans, (2) inconsistent with the game state, and (3) lower quality overall. We did not conduct human experiments with an LM like this (or with a dialogue-free agent), as such behavior would likely frustrate people, who would then be unlikely to cooperate with the agent, and such an agent would quickly be detected as an AI. -ED
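
To make the intent-conditioning idea concrete, here is a minimal sketch of how a dialogue model can be conditioned on a plan from a strategic engine by serializing the intent into the prompt. This is illustrative only: the serialization format, field names, and the use of `gpt2` as a stand-in model are assumptions, not CICERO's actual implementation.

```python
# Minimal sketch of intent-conditioned dialogue generation, in the spirit of
# a controllable dialogue model. The prompt format and "gpt2" stand-in model
# are illustrative assumptions, not the actual CICERO system.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def generate_message(game_state: str, intent: str, dialogue_history: str) -> str:
    """Condition the LM on the planner's intent so the generated message
    stays consistent with the agent's actual plan, not just the chat so far."""
    prompt = (
        f"STATE: {game_state}\n"
        f"INTENT: {intent}\n"          # plan produced by the strategic engine
        f"HISTORY: {dialogue_history}\n"
        f"MESSAGE:"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_new_tokens=40,
        do_sample=True,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Decode only the newly generated tokens, not the prompt.
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

# Hypothetical example: the planner decides FRANCE should support ENGLAND
# into Belgium, and the dialogue model is asked to communicate that intent.
print(generate_message(
    game_state="Spring 1901; FRANCE: A PAR, A MAR, F BRE",
    intent="FRANCE: A PAR S ENG A LON - BEL",
    dialogue_history="ENGLAND: Would you back my move into Belgium?",
))
```

The key design point is that the intent is an input the planner controls at inference time, rather than something the LM must infer: the same fine-tuned model can be steered to argue for whatever plan the strategic engine currently holds.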

2