KellinPelrine t1_iutis26 wrote

That makes sense. I think this gives a lot of evidence, then, that there's something more going on than just an exploit against the rules. It looks like it can't evaluate pass-aliveness properly, even though that seems to be part of the training. I saw cases in the games (even in the "professional level" version) where just two moves in a row are enough to capture something and change the human-judgment status of a group, and these aren't particularly unusual local situations either; they could definitely come up in a real game. I would be curious whether it ever passes "early" in a way that changes the score (even if not the outcome) in its self-play games after training, or whether its estimated value is off from what it should be. Perhaps for some reason it learns to play on the edge, so to speak, by throwing away parts of its territory when it doesn't need them to win, and that leads to the lack of robustness here, where it throws away territory it really does need.

1

KellinPelrine t1_iusv4mq wrote

I see - it's definitely meaningful that you're using a KataGo fork with no scoring changes. I think I did not fully understand pass-alive; I took it in a more human sense, that there is no single move that can capture or break it. But if I understand you now, the requirement is that there is no sequence of moves of any length, with one side passing every turn and the other playing on, that can destroy the territory? If that is the definition, though, it seems black also has no territory in the example you linked: with unlimited moves and black passing every time, white can capture every black stone in the upper right (and the rest of the board). So it would seem to me that neither side formally has anything on the board, in which case white (KataGo) should win by komi?

1

KellinPelrine t1_iusbido wrote

To my understanding, the modification quoted there is exactly what is being exploited - KataGo was trained in a setting that does not require capturing stones within pass-alive territory, but here it's being tested in a setting that does require that. And that's 100% of the exploit: KataGo doesn't capture the stones in its own pass-alive territory, the attack makes sure to leave some stones in every one of those territories, and so in the training setting KataGo would win easily, but in the test setting none of its territories end up counting.
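To make concrete why the leftover stones matter: under strict Tromp-Taylor area scoring, an empty region only counts for a colour if every stone it touches is that colour. Here is a rough sketch of that counting rule (my own illustrative Python, not KataGo's code; the board encoding is made up):

```python
# Illustrative sketch of strict Tromp-Taylor area scoring: a point counts for
# a colour if it is that colour, or if it is empty and reaches only stones of
# that colour. Board encoding (".", "X", "O") is made up for this example.

from collections import deque

EMPTY, BLACK, WHITE = ".", "X", "O"

def tromp_taylor_score(board, komi=7.5):
    """board: list of equal-length strings; returns (black_area, white_area + komi)."""
    n, m = len(board), len(board[0])
    black = white = 0
    seen = set()
    for r in range(n):
        for c in range(m):
            p = board[r][c]
            if p == BLACK:
                black += 1
            elif p == WHITE:
                white += 1
            elif (r, c) not in seen:
                # Flood-fill this empty region and record which colours it touches.
                region, touches, queue = 0, set(), deque([(r, c)])
                seen.add((r, c))
                while queue:
                    y, x = queue.popleft()
                    region += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < n and 0 <= nx < m:
                            if board[ny][nx] == EMPTY and (ny, nx) not in seen:
                                seen.add((ny, nx))
                                queue.append((ny, nx))
                            elif board[ny][nx] != EMPTY:
                                touches.add(board[ny][nx])
                # The region scores only if it touches exactly one colour.
                if touches == {BLACK}:
                    black += region
                elif touches == {WHITE}:
                    white += region
    return black, white + komi
```

So a single uncaptured opponent stone sitting inside a territory makes the neighboring empty points touch both colours and count for no one, which is how territories that look clearly won can end up not counting at all in the test setting.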

I think it's interesting work that could be valuable for automating the discovery of adversarial perturbations of a task (particularly scenarios one might think a model is designed for but that are actually out of scope and cause severe failures, which is a pretty serious real-world problem). But it is most definitely not a small perturbation of inputs within the training distribution.

1

KellinPelrine t1_iuq21ix wrote

Do you know that it actually supports the "full" sense of the rules being applied in this paper? The parameters defined at the link you gave seem to specify a way to get a result equivalent to Tromp-Taylor, IF all captures of dead stones are played out. But those parameters alone do not imply that it really knows to play out those captures - or, critically, that it even has any possible way to know. With those parameters alone, it could have been trained 100% of the time on a more human version of the rules where dead stones don't have to be captured.
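To spell out the distinction I mean (purely hypothetical pseudocode on my part - none of these names come from KataGo):

```python
# Hypothetical sketch: the same final board can be scored two ways, and the
# rules parameters alone don't say which one the training actually reflected.

def score_relaxed(board, remove_dead, area_score):
    # "Human-ish" training-style setting: some judge removes dead stones
    # before counting, so uncaptured dead stones inside a territory are harmless.
    return area_score(remove_dead(board))

def score_strict(board, area_score):
    # Strict test-time setting: the board is scored exactly as it stands, so
    # opponent stones you never captured still count for them and void the
    # surrounding territory.
    return area_score(board)
```

If self-play games were effectively scored like `score_relaxed` while the paper evaluates with `score_strict`, the rules parameters by themselves can't tell us which behavior the network actually learned.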

As another commenter asked, I think it depends on the exact process used to construct and train KataGo. Because a human seems able to mimic this "adversarial attack" fairly easily, I suspect it is not an attack on the model so much as on the documentation and the interpretation/implementation of the Tromp-Taylor rules.

7

KellinPelrine t1_iqtjzfs wrote

Just using the date of publication or last modification does not avoid the issue I described. In my brief reading I couldn't find a link or reference for your data beyond it coming from Kaggle somehow (I might have missed a more exact reference), but your sample is definitely not random: as you describe, it has exactly 2000 real and 2000 fake examples, while a representative random sample would not be balanced. If the 2000 fake articles have 2016 publication dates and the 2000 real ones have 2017 dates, you haven't found a new optimal detection method, nor shown that every article published in 2016 was fake; you've found an artifact of the dataset. That's still an important finding, especially if other people are using that data and might be drawing wrong conclusions from it, but it's not a new misinformation detection method.

Of course, it's probably not such an extreme case as that (although something nearly that extreme has occurred in some widely used datasets, as explained in the paper I linked). But here's a more subtle thought experiment: suppose fake articles were collected randomly from a fact-checking website (a not uncommon practice). Further, maybe that fact-checking website expanded its staff near the 2016 US election, say in October, because there was a lot of interest in and public need for misinformation detection at that time. More staff -> more articles checked -> more fake news detected -> a random sample of fake news from the website will contain more examples from October (when there was more staff) than September. So in that data the month is predictive, but that will not generalize to other data.

A machine learning paper, whatever the audience, requires some guarantee of generalization. Since the metadata features used in your paper are known to be problematic in some datasets, and the paper only reports results on one dataset, in my opinion it cannot give confidence in generalization without some explanation of why those features should generalize.
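For what it's worth, a check along these lines would make that kind of collection artifact visible immediately (a sketch assuming the data is in a CSV with hypothetical `label` and `date` columns - I don't know the actual Kaggle schema):

```python
import pandas as pd

# Hypothetical file and column names ("fake_news.csv", "label", "date") -
# substitute whatever the actual dataset uses.
df = pd.read_csv("fake_news.csv", parse_dates=["date"])

# If real vs. fake articles come from different time periods, it shows up
# immediately in a label-by-month cross-tabulation.
by_month = pd.crosstab(df["date"].dt.to_period("M"), df["label"])
print(by_month)

# A strongly skewed table (e.g. fake articles clustered in 2016, real ones in
# 2017) means date features will look highly "predictive" without generalizing.
```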

1

KellinPelrine t1_iqolwmz wrote

I think it's noteworthy that the month turns out to be the most informative feature, but that may be more a reflection of the data and its collection process than a strong feature for real-world detection. For example, there have been datasets collected in the past where the real and fake examples were gathered at different times, which makes the month or other date information artificially predictive. See https://arxiv.org/pdf/2104.06952.pdf, especially section 3.4.2.

So I'd encourage you to consider why month would be predictive (and the same for any other metadata), in order to make sure it's not an artifact of the dataset.
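A quick sanity check along those lines (again a sketch with hypothetical file and column names): fit a classifier on the month alone and see how far above chance it gets.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Hypothetical names - substitute the real dataset path and columns.
df = pd.read_csv("fake_news.csv", parse_dates=["date"])
X = df["date"].dt.month.to_frame("month")
y = df["label"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = DecisionTreeClassifier(max_depth=3).fit(X_tr, y_tr)

# With a balanced dataset, chance is 50%; if month alone scores far above
# that, the dates are probably encoding the collection process.
print("month-only accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```

An even stronger test is a temporal split (train on one period, test on another) to see whether the remaining features still work when the date distribution shifts.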

4