
icosaplex t1_iuqm4ye wrote

Primary author of KataGo here:

Wanted to say that I think this is overall good/interesting research. I have both some criticisms and some points of support to offer:

One criticism is the way 64-visit KataGo is characterized as simply "near-superhuman". 64-visit KataGo might be near-superhuman when in-distribution, which these positions very much are not. There's no reason to expect it to be as good when out-of-distribution. Indeed, if 64 visits is just about the bare minimum to be superhuman in-distribution, then one would generally expect to need more visits to perform well even a little out of distribution, much less massively out of distribution as in the examples in this paper.
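
To make that concrete, here is a toy sketch of how an AlphaZero-style PUCT search distributes a fixed visit budget at a single node when the policy prior is confidently wrong, as it can be out of distribution. The selection formula is the standard AlphaZero-style one, but everything else (the priors, the values, the constant) is invented for illustration; this is not KataGo's actual search code, and real implementations differ in details like first-play urgency:

```python
import math

def puct_visit_allocation(priors, values, total_visits, c_puct=1.5):
    """Distribute a fixed visit budget over moves at a single node using
    the AlphaZero-style PUCT rule. Each playout of move i just returns
    values[i], a stand-in for what the net reports for that line."""
    n = [0] * len(priors)      # visit counts per move
    w = [0.0] * len(priors)    # accumulated values per move
    for _ in range(total_visits):
        total_n = sum(n)
        def score(i):
            # Unvisited moves get Q = 0 here; real code uses FPU-style tweaks.
            q = w[i] / n[i] if n[i] > 0 else 0.0
            u = c_puct * priors[i] * math.sqrt(total_n + 1) / (1 + n[i])
            return q + u
        best = max(range(len(priors)), key=score)
        n[best] += 1
        w[best] += values[best]
    return n

# Out-of-distribution failure mode (made-up numbers): the net is
# confidently wrong. Move 0 looks great to it (prior 0.95, value +0.5)
# but is actually refuted; move 3 is the real refutation, but the net
# gives it a tiny prior and a poor static evaluation.
priors = [0.95, 0.02, 0.02, 0.01]
values = [0.50, 0.00, 0.00, -0.20]

for budget in (64, 1024, 16384):
    print(budget, puct_visit_allocation(budget=budget, priors=priors, values=values, total_visits=budget) if False else puct_visit_allocation(priors, values, budget))
```

At a 64-visit budget, every single playout goes to the prior's favorite and the refutation gets zero visits; even at thousands of visits it gets only a trickle, and in a real game each of those visits would then still need a deep sub-search of its own to overturn the net's misevaluation.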

In support of the general phenomenon observed by this paper, I'd like to offer something that I think is known on the ground to people in the Go community who have followed computer Go, but that I suspect is somehow still broadly unknown in the academic community: there are also "naturally-arising" situations where "superhuman" AlphaZero-style bots clearly and systematically perform at highly sub-human levels. Because those situations are out of distribution, they are, in effect, naturally-arising out-of-distribution examples.

Perhaps the most well-known of these is the "Mi Yuting's flying dagger" joseki. This is an opening pattern known for its high complexity, where best play produces a very high density of rare shapes and unusual moves, with an unusually large amount of branching and choice. A lot of AlphaZero replications (Leela Zero, ELF, and likely others such as MiniGo) produced bots that greatly misevaluated many lines of the flying dagger pattern, due to not exploring sufficiently many of those lines in self-play (out of distribution!), and were thus exploitable by a sufficiently experienced human player who had learned those lines.
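
A rough back-of-the-envelope sketch of why self-play coverage falls short; the branching and depth numbers below are invented for illustration, not measured properties of the joseki or of any training run:

```python
# All numbers here are illustrative guesses, not measurements.
branching, depth = 8, 20                 # plausible branches per move, joseki length in moves
lines = branching ** depth               # distinct lines: 8^20 ~ 1.2e18
selfplay_games = 10 ** 9                 # a generous guess at total self-play games
print(f"lines: {lines:.2e}  games: {selfplay_games:.2e}  "
      f"fraction coverable: {selfplay_games / lines:.1e}")
```

Even if every self-play game explored one brand-new line, it could touch only a vanishing fraction of the tree. And self-play doesn't sample uniformly anyway: it concentrates on lines the current net already likes, which is exactly how rare-but-critical variations get starved.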

(KataGo is robust to the flying dagger joseki only because of manual intervention to specifically add training on a human-curated set of variations for it; otherwise it would probably still be vulnerable to some lines to this day.)
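
The shape of that kind of intervention is roughly the following, as a minimal Python sketch; KataGo's actual training pipeline and configuration differ, and all names and numbers here are illustrative:

```python
import random

# The idea: seed a small fraction of self-play games from curated
# positions so the networks see those lines in training at all.
# The 5% figure and all names below are made up for illustration.
EMPTY_BOARD = "empty 19x19 board"            # stand-in for a real position object
CURATED = ["flying-dagger variation 1",
           "flying-dagger variation 2"]      # stand-ins for curated SGF positions

def sample_start_position(curated_prob=0.05):
    """Pick the starting position for one self-play game."""
    if random.random() < curated_prob:
        return random.choice(CURATED)
    return EMPTY_BOARD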

There are some other, lesser examples in other patterns too. Plus, it is actually a pretty common occurrence in high-level pro games (once every couple of games?) that KataGo or other bots, even when given tens of thousands of playouts, fail to see a major tactic that the human pros evaluated correctly. The fact that top Go AIs are still commonly outperformed by humans in individual positions, even if not on average across a game, is I suspect also under-appreciated. I hypothesize that at least a little part of this comes from human players playing in ways that differ enough from how the bot would play, or sometimes from both sides making mistakes that lead back to an objectively even position but end up with the humans reaching kinds of positions that AI self-play would never have reached.

This hypothesis, if true, might also help explain a seeming paradox: over on r/baduk and in Go community Discords, it's a common refrain for a less-experienced player to post a question about why an AI is suggesting this or that move, only for the answer to be "you should distrust the AI here: you used too few visits, and the AI's evaluations are genuinely misleading/wrong", even though as few as 64 or 100 visits is supposedly pro-level or near-superhuman.
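
One small, purely statistical piece of this is easy to see with a toy sketch. This is not KataGo's value head, and it treats playout results as independent noisy samples, which real tree search does not; the true value and noise level are invented:

```python
import random
import statistics

random.seed(0)
TRUE_VALUE, NOISE = 0.10, 0.8   # made-up: true eval and per-playout noise

def estimate(visits):
    """Average `visits` noisy playout results, like a toy value estimate."""
    return sum(TRUE_VALUE + random.gauss(0, NOISE) for _ in range(visits)) / visits

for visits in (64, 1024, 16384):
    spread = statistics.stdev(estimate(visits) for _ in range(200))
    print(f"{visits:6d} visits: estimate spread ~ {spread:.3f}")
```

The spread shrinks roughly as 1/sqrt(visits), so under these toy assumptions a 64-visit evaluation can be off by several winrate points from sampling noise alone, on top of any systematic misevaluation of the kind described above.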

I think the key takeaway here is that AlphaZero in general does *not* give you superhuman performance on a game. It gives you superhuman performance on the in-distribution subset of game states that "resemble" those explored by self-play, and in games with exponential state spaces, that subset may not cover all the important parts of the space well (and no current common method of exploration or of adding noise seems sufficient to get it to cover the space well).
