john_the_jedi t1_j646qh1 wrote on January 27, 2023 at 4:21 PM

Reply to [D] Couldn't devs of major GPTs have added an invisible but detectable watermark in the models? by scarynut

Hey everyone, I'm the first author of this preprint, "A Watermark For Large Language Models": https://arxiv.org/abs/2301.10226
I thought I'd jump in with a few relevant comments about some questions in this thread, especially relating to our approach.

Our watermark is mathematically constructed to minimize false positives (accusing human text of being machine-generated), even if it costs us a few detections of actual machine-generated text. At any sufficient length of text, say 100-200 words, the chance of a false positive is near zero. This is obviously the type of error we'd all like to avoid as much as possible.
We are not anti-LLM in any general way; these are amazing tools for everyone to use! Rather, we think it's much better to have a new tool, watermarks, embedded in these models sooner rather than later. A world in which we have limited (currently zero, really) ways of distinguishing AI-generated from human-generated content is likely to have consequences that are difficult to wrestle with. We're concerned about bot farms and about accidentally retraining "GPT-10" on tons of old GPT-3 outputs.
On removing the watermark: we don't claim it is not removable. We have just constructed the watermark procedure so that removal is difficult and comes at a cost to the quality of the output. The fact that many people suggest they'll just use another LM to paraphrase the output, or paraphrase it themselves, gets at a philosophical point we couldn't spend much time on in the paper (though we do run some attack experiments that try to remove the watermark). À la the Ship of Theseus, if you rewrite the text enough to remove the watermark, well, it's no longer the original text anyway, even though it feels conceptually similar. Rewriting and rephrasing a paragraph from a textbook in your own words and then putting it in a term paper has always been a way to try to pass off the thoughts and ideas of others as your own. This fact of the world is unchanged.
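
For anyone who wants the false-positive point above in concrete terms, here is a minimal sketch of a one-sided z-test on the count of "green" tokens, which is the general style of detection test the preprint describes. It is not the paper's code. The hash-based `is_green` lookup, the function names, the green fraction `gamma = 0.5`, and the threshold of 4.0 are all illustrative assumptions.

```python
import hashlib
import math


def is_green(prev_token: str, token: str, gamma: float = 0.5) -> bool:
    """Hypothetical green-list lookup: hash the (previous, current) token pair
    to a number in [0, 1) and call the token green if it falls below gamma."""
    digest = hashlib.sha256(f"{prev_token}\x00{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma


def z_from_counts(green: int, scored: int, gamma: float = 0.5) -> float:
    """z-statistic for seeing `green` green tokens out of `scored`, under the
    null hypothesis that each token is green with probability gamma."""
    if scored < 1 or not 0.0 < gamma < 1.0:
        raise ValueError("need scored >= 1 and 0 < gamma < 1")
    return (green - gamma * scored) / math.sqrt(scored * gamma * (1.0 - gamma))


def green_z_score(tokens: list[str], gamma: float = 0.5) -> float:
    """Score a token sequence; the first token has no predecessor, so skip it."""
    pairs = list(zip(tokens, tokens[1:]))
    green = sum(is_green(prev, tok, gamma) for prev, tok in pairs)
    return z_from_counts(green, len(pairs), gamma)


def false_positive_rate(z_threshold: float) -> float:
    """One-sided normal tail probability: the chance that unwatermarked text
    exceeds the threshold under the normal approximation."""
    return 0.5 * math.erfc(z_threshold / math.sqrt(2.0))
```

Under these assumptions, flagging text only when the z-score exceeds 4.0 gives `false_positive_rate(4.0)` of roughly 3e-5. Text that is 150 green out of 200 scored tokens lands at a z-score of about 7.07, far past that threshold, which is why even a short passage can be scored with very little risk of accusing a human writer.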

john_the_jedi t1_iux99v3 wrote on November 3, 2022 at 6:02 PM

Reply to [P] How to reverse engineer a neural network to get inputs from the outputs by ojiber

I would peruse the work on "model inversion". Inverting a model is not free, and the reconstructed inputs are noisy, but for certain classes of models and learning problems this is very doable.

This might get you started: https://www.youtube.com/watch?v=_g-oXYMhz4M
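
To make "model inversion" a bit more concrete, here is a minimal sketch of one common flavor: treat the input as the free variable and run gradient descent until the model's output matches the observed output. The tiny 3-input, 2-output sigmoid "model", its weights, the step count, and the learning rate are all made up for illustration, and this is not taken from the linked video.

```python
import math

# Hypothetical toy "model": 3 inputs -> 2 sigmoid outputs, with made-up weights.
W = [[1.0, -2.0, 0.5], [0.3, 0.8, -1.0]]
B = [0.1, -0.2]


def forward(x):
    """Apply the toy model: sigmoid(W @ x + B), written out in plain Python."""
    return [
        1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
        for row, b in zip(W, B)
    ]


def loss(x, target):
    """Squared error between the model's output for x and the observed output."""
    return sum((f - t) ** 2 for f, t in zip(forward(x), target))


def invert(target, steps=5000, lr=0.5):
    """Gradient descent on the input (not the weights), starting from zeros."""
    x = [0.0] * len(W[0])
    for _ in range(steps):
        out = forward(x)
        # d(loss)/d(pre-activation) = 2 * (f - t) * f * (1 - f) for each output
        delta = [2.0 * (f - t) * f * (1.0 - f) for f, t in zip(out, target)]
        # Chain rule back to the input: grad_j = sum_i delta_i * W[i][j]
        grad = [sum(d * row[j] for d, row in zip(delta, W)) for j in range(len(x))]
        x = [xi - lr * g for xi, g in zip(x, grad)]
    return x


x_true = [0.5, -0.3, 0.8]      # the "secret" input
target = forward(x_true)       # what an attacker gets to observe
x_rec = invert(target)         # an input that reproduces the same output
```

In this toy setup the recovered `x_rec` reproduces the observed output almost exactly but is not equal to `x_true`, because three inputs are squeezed into two outputs and many inputs map to the same result. That is the "noisy reconstruction" caveat in miniature; real inversion attacks usually add priors or regularizers to pick a plausible input among the candidates.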