How do you quantify the "strategic reasoning" capabilities of the dialogue component in CICERO?
In other words, if you were to finetune an LLM on existing / old gameplay conversations, followed by conditioning on dialogue from a new game via prompts (aka have separate LM from a no-press model) - would such a setup still be able to have a high win-rate simply from the strength of the no-press model?
suchenzang t1_izfeg76 wrote
Reply to [D] We're the Meta AI research team behind CICERO, the first AI agent to achieve human-level performance in the game Diplomacy. We’ll be answering your questions on December 8th starting at 10am PT. Ask us anything! by MetaAI_Official
How do you quantify the "strategic reasoning" capabilities of the dialogue component in CICERO?
In other words, if you were to finetune an LLM on existing / old gameplay conversations, followed by conditioning on dialogue from a new game via prompts (aka have separate LM from a no-press model) - would such a setup still be able to have a high win-rate simply from the strength of the no-press model?