pyepyepie t1_izbzd9r wrote
I feel like the agent was implemented incredibly well; however, the grounding and "information selection" of the language model was not "clean," since it relied on classifiers to filter messages. Since the Diplomacy team is extremely competent, I wonder whether you put effort into better grounding (in a general context) and whether it's part of your future plans, as I feel it's very important for the community (arguably one of the most important problems in NLP).
edit: I know that the language model was conditioned on in-game orders, etc., but I wonder whether you intend to work on novel algorithms for this in the future.
MetaAI_Official OP t1_izfjgik wrote
Figuring out how to get strong control over the language model by grounding it in "intents"/plans was one of the major challenges of this work. Fig. 4 in the paper shows we achieved relatively strong control in this sense: prior to any filters, ~93% of messages generated by CICERO were consistent with intents and ~87% were consistent with the game state. As you note, however, the model is not perfect, and we relied on a suite of classifiers to help filter additional mistakes.

Many of the mistakes CICERO made involved information that was *not* directly represented in its input (and thus required additional reasoning steps), e.g., reasoning about further-future states or counterfactual past states, discussing plans for third parties, etc. We could have considered grounding CICERO in a richer representation of "intents" (e.g., including plans for third parties) or of the game state (e.g., explicitly representing past states), but in practice we found that (i) richer intents would be harder to annotate/select and often take the language model out of distribution, and (ii) we had to balance the trade-off between a richer game state representation and the dialogue history representation. Exploring ways to get stronger control over, and improve the reasoning capabilities of, language models is an interesting future direction. -ED
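For readers curious what "a suite of classifiers to help filter additional mistakes" looks like in practice, here is a minimal, hypothetical sketch of the general pattern: candidate messages from a language model are passed through consistency checks and discarded if any check flags them. All names here (the classifier stubs, the toy rules) are illustrative assumptions, not CICERO's actual API or models.

```python
# Hypothetical sketch of classifier-based message filtering.
# In CICERO the filters are learned classifiers; here they are
# toy rule-based stand-ins purely to show the pipeline shape.
from typing import Callable, List

def filter_messages(
    candidates: List[str],
    filters: List[Callable[[str], bool]],
) -> List[str]:
    """Keep only candidate messages that pass every filter."""
    return [m for m in candidates if all(f(m) for f in filters)]

# Toy stand-in: pretend the agent's intent is to support, not attack, Paris.
def consistent_with_intents(msg: str) -> bool:
    return "attack Paris" not in msg

# Toy stand-in: pretend the board has no fleet in Moscow.
def consistent_with_game_state(msg: str) -> bool:
    return "my fleet in Moscow" not in msg

candidates = [
    "I'll support your move into Burgundy.",
    "Let's attack Paris together!",
    "I'll cover you with my fleet in Moscow.",
]
kept = filter_messages(
    candidates, [consistent_with_intents, consistent_with_game_state]
)
print(kept)  # only the first candidate survives both checks
```

The design point is simply that generation and verification are decoupled: the language model proposes freely, and independent classifiers veto messages that contradict the plan or the board.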
pyepyepie t1_izgms77 wrote
Interesting. I was completely surprised by the results (I honestly thought Diplomacy would take 10 years) - it's a great demo of how to utilize large language models without messing up :) Congrats.