Submitted by xutw21 t3_yjryrd in MachineLearning
Paper: https://arxiv.org/abs/2211.00241
Project Page: goattack.alignmentfund.org
>We attack the state-of-the-art Go-playing AI system, KataGo, by training an adversarial policy that plays against a frozen KataGo victim. Our attack achieves a >99% win-rate against KataGo without search, and a >50% win-rate when KataGo uses enough search to be near-superhuman. To the best of our knowledge, this is the first successful end-to-end attack against a Go AI playing at the level of a top human professional. Notably, the adversary does not win by learning to play Go better than KataGo -- in fact, the adversary is easily beaten by human amateurs. Instead, the adversary wins by tricking KataGo into ending the game prematurely at a point that is favorable to the adversary. Our results demonstrate that even professional-level AI systems may harbor surprising failure modes. See the project page linked above for example games.
ThatSpysASpy t1_iupkljr wrote
The demonstrations shown in the paper are pretty unconvincing. In ordinary go scoring, dead stones are removed from the board at the end of the game, so the territory which supposedly isn't KataGo's would in fact be counted as its territory.
They say they use Tromp-Taylor rules, which require dead stones to be actually captured rather than removed by agreement, but I would assume KataGo was trained with more standard human Go rules. (Or at least they added some regularizer to make it pass once the value was high enough; otherwise humans playing against it would get really annoyed.)
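For anyone unfamiliar with the rules difference being discussed, here is a rough sketch of Tromp-Taylor area scoring (an illustration only, not KataGo's or the paper's actual code): a stone counts for its owner until it is physically captured, and an empty region counts for a player only if it borders that player's color alone, so "dead" stones that were never captured still score and can block territory.

```python
# Minimal sketch of Tromp-Taylor area scoring (illustration only).
# A stone is never removed as "dead": it scores for its owner until captured,
# and an empty region scores only if it reaches a single color.
from collections import deque

EMPTY, BLACK, WHITE = ".", "B", "W"

def tromp_taylor_score(board):
    """board: list of equal-length strings using '.', 'B', 'W'."""
    rows, cols = len(board), len(board[0])
    score = {BLACK: 0, WHITE: 0}
    seen = set()

    def neighbors(r, c):
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                yield nr, nc

    for r in range(rows):
        for c in range(cols):
            cell = board[r][c]
            if cell in (BLACK, WHITE):
                score[cell] += 1          # every stone on the board counts
            elif (r, c) not in seen:
                # flood-fill the empty region and record which colors it touches
                region, borders, queue = [], set(), deque([(r, c)])
                seen.add((r, c))
                while queue:
                    cr, cc = queue.popleft()
                    region.append((cr, cc))
                    for nr, nc in neighbors(cr, cc):
                        ncell = board[nr][nc]
                        if ncell == EMPTY and (nr, nc) not in seen:
                            seen.add((nr, nc))
                            queue.append((nr, nc))
                        elif ncell in (BLACK, WHITE):
                            borders.add(ncell)
                if len(borders) == 1:      # region reaches only one color
                    score[borders.pop()] += len(region)

    return score

# The uncaptured White stone in Black's upper-left area is not removed as
# "dead": it scores a point for White, and the empty points around it touch
# both colors, so they count for no one. Ordinary scoring would remove that
# stone and give the whole corner to Black.
demo = [
    ".W.B.",
    "...B.",
    "BBBB.",
    ".....",
    ".....",
]
print(tromp_taylor_score(demo))
```

So under these rules the attack's "premature end" positions really can score in the adversary's favor, even though a human would call the stranded stones dead.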