suchenzang

How do you quantify the "strategic reasoning" capabilities of the dialogue component in CICERO?

In other words, suppose you fine-tuned an LLM on existing gameplay conversations and then conditioned it on dialogue from a new game via prompts (i.e., kept the LM separate from a no-press model). Would such a setup still achieve a high win rate purely from the strength of the no-press model?