
MetaAI_Official OP t1_izfljno wrote

It takes significant effort, but yes, on the strategic planning side it is often possible to work out why CICERO came up with particular moves or intents. We often did this during development when debugging. You can look at the moves the search considered for CICERO and its opponents, see what values those moves achieved in each iteration of the search, and see how the equilibrium evolved in response to those values. You can also look at the initial policy prior probabilities, and so on. It's not entirely unlike walking through a debug log of how a chess engine explored a tree of possible moves and why it arrived at the value it did. In fact, with systems that do explicit planning, rather than simply running one giant opaque model end-to-end, it's usually possible to reverse-engineer "why" the system is doing something, although it may take a lot of time and effort per position. We haven't tried having a human in the loop to choose moves, though. -DW
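To make the "debug log" idea concrete, here is a toy sketch. It is not CICERO's search. It assumes a two-player zero-sum game with a small payoff matrix and uses plain regret matching (a standard equilibrium-finding method), with each player's policy prior used as the fallback strategy when no action has positive regret. The names `run_search`, `SearchTrace`, and `print_trace` are hypothetical. For every iteration it records the value of each candidate move and the strategy played, so you can step through how the equilibrium evolved in response to those values.

```python
from dataclasses import dataclass, field


@dataclass
class SearchTrace:
    # Per-iteration record for player 0: value of each action, and the strategy played.
    values: list = field(default_factory=list)
    strategies: list = field(default_factory=list)


def regret_matching(regrets, prior):
    """Play actions in proportion to positive cumulative regret; fall back to the prior."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0.0:
        return list(prior)  # no positive regret yet: use the (hypothetical) policy prior
    return [p / total for p in positive]


def run_search(payoff, prior0, prior1, iterations=1000):
    """Toy zero-sum search. payoff[i][j] is player 0's value when p0 plays i and p1 plays j."""
    n0, n1 = len(prior0), len(prior1)
    regrets0, regrets1 = [0.0] * n0, [0.0] * n1
    sum0, sum1 = [0.0] * n0, [0.0] * n1
    trace = SearchTrace()
    for _ in range(iterations):
        s0 = regret_matching(regrets0, prior0)
        s1 = regret_matching(regrets1, prior1)
        # Value of each action against the opponent's current strategy.
        v0 = [sum(payoff[i][j] * s1[j] for j in range(n1)) for i in range(n0)]
        v1 = [sum(-payoff[i][j] * s0[i] for i in range(n0)) for j in range(n1)]
        ev0 = sum(s0[i] * v0[i] for i in range(n0))
        ev1 = sum(s1[j] * v1[j] for j in range(n1))
        # Accumulate regret: how much better each action would have done than the mix.
        for i in range(n0):
            regrets0[i] += v0[i] - ev0
            sum0[i] += s0[i]
        for j in range(n1):
            regrets1[j] += v1[j] - ev1
            sum1[j] += s1[j]
        trace.values.append(v0)
        trace.strategies.append(s0)
    # The average strategy (not the last iterate) approximates the equilibrium.
    avg0 = [x / iterations for x in sum0]
    avg1 = [x / iterations for x in sum1]
    return avg0, avg1, trace


def print_trace(trace, action_names, first=5):
    """Walk through the first few iterations like a debug log."""
    for t in range(min(first, len(trace.values))):
        vals = ", ".join(f"{n}={v:+.2f}" for n, v in zip(action_names, trace.values[t]))
        strat = ", ".join(f"{n}={p:.2f}" for n, p in zip(action_names, trace.strategies[t]))
        print(f"iter {t}: values [{vals}]  strategy [{strat}]")


if __name__ == "__main__":
    # Matching-pennies-style payoffs; the action names are purely illustrative.
    payoff = [[1.0, -1.0], [-1.0, 1.0]]
    avg0, avg1, trace = run_search(payoff, [0.9, 0.1], [0.5, 0.5], iterations=10000)
    print_trace(trace, ["hold", "move"])
    print("average strategy for player 0:", avg0)
```

Even though player 0's prior strongly favors the first action, the trace shows the opponent exploiting that in early iterations and player 0's strategy shifting in response, with the average drifting toward an even mix. Reading that record is a small-scale version of the reverse-engineering described above.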


MetaAI_Official OP t1_izfm7lf wrote

We did also get good human players to review the games and look for really good or bad moves, but that was very early in the development process. CICERO generated good moves, and it would be counter-productive to stop it from making what it thinks are the best moves. For example, at a tournament I played in Bangkok a few weeks ago, I thought "what would CICERO do?" and then made a different set of moves, but what CICERO would have done was right! -AG
