john_the_jedi

john_the_jedi t1_j646qh1 wrote

Hey everyone, I'm the first author of the preprint "A Watermark For Large Language Models": https://arxiv.org/abs/2301.10226
I thought I'd jump in with a few comments on some of the questions in this thread, especially those relating to our approach.

  1. Our watermark is mathematically constructed to minimize false positives (accusing human text of being machine generated), even if that costs us a few detections of actual machine-generated text. For any sufficiently long text, say 100-200 words, the chance of a false positive is near zero. This is obviously the type of error we'd all like to avoid as much as possible.
  2. We are not anti-LLM in any general way; these are amazing tools for everyone to use! Rather, we think it's much better to have a new tool, watermarks, embedded in these models sooner rather than later. A world in which we have limited (currently zero, really) ways of distinguishing AI-generated from human-generated content is likely to have consequences that are difficult to wrestle with. We're concerned about bot farms and about accidentally retraining "GPT-10" on tons of old GPT-3 outputs.
  3. On removing the watermark: we don't claim it is not removable. We have just constructed the watermark procedure so that removal is difficult and comes at a cost to the quality of the output. The fact that many people suggest they'll just use another LM to paraphrase the output, or paraphrase it themselves, gets at a philosophical point we couldn't spend much time on in the paper (though we do run some attack experiments that try to remove the watermark). À la the Ship of Theseus, if you rewrite the text enough to remove the watermark, well, it's no longer the original text anyway, even though it feels conceptually similar. Rewriting and rephrasing a paragraph from a textbook in your own words and then putting it in a term paper has always been a way to try to pass off the thoughts and ideas of others as your own. This fact of the world is unchanged.
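
To make point 1 a bit more concrete, here is a minimal sketch of the kind of test the detector runs, assuming the one-proportion z-test over "green list" tokens described in the paper. The function names, the `gamma = 0.25` green-list fraction, and the example counts are illustrative assumptions, not code from our release.

```python
import math


def watermark_z_score(green_count: int, total_tokens: int, gamma: float = 0.25) -> float:
    """z-statistic for seeing `green_count` green tokens out of `total_tokens`.

    Under the null hypothesis (text written without knowledge of the green
    list), each token is green with probability `gamma` (assumed value).
    """
    expected = gamma * total_tokens
    std = math.sqrt(total_tokens * gamma * (1.0 - gamma))
    return (green_count - expected) / std


def false_positive_rate(z_threshold: float) -> float:
    """One-sided upper-tail probability of a standard normal at `z_threshold`."""
    return 0.5 * math.erfc(z_threshold / math.sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical example: 200 tokens, half of which landed on the green list.
    z = watermark_z_score(green_count=100, total_tokens=200)
    print(f"z = {z:.2f}")
    print(f"false positive rate at z > 4: {false_positive_rate(4.0):.1e}")
```

The point of the sketch is that the detection threshold is set on the false positive side: flagging only when z is large (for example, above 4) keeps the chance of accusing human text tiny, at the price of missing some weakly watermarked machine text.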