
velcher t1_ixj98ee wrote

Great results! Some feedback:

  • I'm somewhat unsatisfied with the amount of human engineering / annotation pipelines that went into the agent, most notably the "intention" mechanisms, which seem to be a key part of making the dialogue -> planning step tractable.
  • This annoyance extends somewhat to the "message filtering mechanisms" used to prevent nonsensical, incoherent messages, which feel like more of a hack (see the sketch after this list for roughly what I mean). Really, the agent should learn to converse from the objective of being an optimal player (amongst other humans). If it starts speaking gibberish, other human players can tell it is an AI, which would most likely be a bad outcome for the agent (unless the humans are blue-pilled).
  • From what I gather, the agent is only trained on the "truthful" subset of the dialogue data, which means it cannot lie. Deceit seems pretty important for winning Diplomacy.
  • The sections on planning are not easy to understand concretely, specifically "Dialogue-conditional planning" and "Self-play reinforcement learning for improved value estimation". The authors seem to paraphrase the math and logic in words and omit equations to keep things high-level, but this just makes everything vaguer. Luckily, the supplement has the details (I've sketched the core planning objective after this list).
  • Thanks for publishing the code. This is very important for the research community. I hope FAIR continues to do this.
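
To be concrete about the filtering point, here is the kind of pipeline I mean. This is a hypothetical sketch for illustration only, not Cicero's actual code; `generate`, `filters`, and `max_attempts` are names I made up:

```python
# Hypothetical illustration of a message-filtering pipeline (not the paper's implementation):
# sample candidate messages from the dialogue model, run each through a bank of
# hand-built rejection filters, and resample until one passes. This is exactly the
# kind of hand-engineering I'd hope a sufficiently good objective would make unnecessary.
from typing import Callable, List, Optional

def filtered_message(
    generate: Callable[[], str],           # dialogue model sampling a candidate message
    filters: List[Callable[[str], bool]],  # each returns True if the candidate should be rejected
    max_attempts: int = 10,
) -> Optional[str]:
    for _ in range(max_attempts):
        candidate = generate()
        if not any(reject(candidate) for reject in filters):
            return candidate
    return None  # fall back to sending nothing if every candidate gets rejected
```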
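And on the planning point, this is roughly the objective I had to dig out of the supplement. It's my paraphrase of KL-regularized (piKL-style) planning, so the notation is mine, not the paper's:

$$
\pi_i \;=\; \arg\max_{\pi}\; \mathbb{E}_{a_i \sim \pi}\big[u_i(a_i, \pi_{-i})\big] \;-\; \lambda\, D_{\mathrm{KL}}\big(\pi \,\|\, \tau_i\big),
$$

where $\tau_i$ is the dialogue-conditioned imitation (anchor) policy, $u_i$ is the estimated value of an action against the other players' predicted policies, and $\lambda$ controls how far the planner may deviate from human-like play. The maximizer has the softmax form $\pi_i(a_i) \propto \tau_i(a_i)\exp\!\big(u_i(a_i, \pi_{-i})/\lambda\big)$. A couple of displayed equations like this in the main text would have gone a long way.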

Also, the PDF from science.org is terrible. I can't even highlight lines with the Preview app on my Mac. Please fix that if you get a chance!
