MetaAI_Official OP t1_izfldg5 wrote on December 8, 2022 at 7:18 PM

Reply to comment by ditlevrisdahl in [D] We're the Meta AI research team behind CICERO, the first AI agent to achieve human-level performance in the game Diplomacy. We’ll be answering your questions on December 8th starting at 10am PT. Ask us anything! by MetaAI_Official

Early on, we primarily evaluated the model through self-play, by having team members play against it, and by building small test sets to evaluate specific behaviors. In the last year, we started evaluating the model by putting it in live games against humans (with another human in the loop to review its outgoing messages and intervene if necessary). We quickly learned that the mistakes the model makes in self-play weren't necessarily reflective of its behavior in human play. Playing against humans became *super* important for developing our research agenda! -ED