Submitted by CaptainDifferent3116 t3_10fclbg in MachineLearning
I'm looking to detect spam generated by generative models (especially GPT). But all the ones I tried fail miserably ...
Agreed, GPT is designed to be indistinguishable from real text, so you're fighting a losing battle (unless you have millions to train a rival network). The only opening may be in the way GPT interprets language; we are all aware that it does not understand what it says. If you can find a way to target a pre-trained network at that, you may have a chance, but in theory, by the nature of GPT, you will lose.
[removed]
The best performing one so far would be : https://huggingface.co/roberta-base-openai-detector
From the model's card:
> Direct Use
> The model is a classifier that can be used to detect text generated by GPT-2 models. However, it is strongly suggested not to use it as a ChatGPT detector for the purposes of making grave allegations of academic misconduct against undergraduates and others, as this model might give inaccurate results in the case of ChatGPT-generated input.
Just for lols, I tested this with text-davinci-003.
Prompt: Write a post about liking fruity icecream, in the style of a reddit post. Make some minor grammar mistakes while writing the post.
Response: hey everyone, so I'm totally loving this fruity icecream lately! I got it from a store near my house and its been so good. Its really sweet but not overly sickeningly sweet, like some other ice cream. Plus the fact that its fruity makes it even better in my opinion. Does anyone else like fruity icecream too? It'd be awesome to hear some of your thoughts on it!
The above detector:

> Computation time on Intel Xeon 3rd Gen Scalable CPU: 0.090 s
>
> Real 0.984
Using a black box model for this kind of stuff looks like a nice way to get sued
I'll try to share a small article on how I'm testing, with the dataset's details.
Nothing works consistently, especially if an attacker tests their own outputs against the open source detectors, or makes manual tweaks to the outputs.
https://huggingface.co/spaces/openai/openai-detector
https://huggingface.co/spaces/Hello-SimpleAI/chatgpt-detector-single
Tried these already? I have not so I can't speak to their quality
The first one doesn't seem to work (at least the live test)
The second one is garbage...
Please be aware of this one as well:
>Edward Tian's app at GPTZero.me
Also cannot vouch for this, just trying to be a bit helpful :)
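For context, GPTZero's actual method isn't published in detail; reportedly it scores text on perplexity and "burstiness" (how much sentence length and complexity vary, with human prose tending to vary more). A toy, stdlib-only sketch of the burstiness half is below (the perplexity half needs an actual language model); the function name and threshold idea are made up for illustration, not GPTZero's real scoring:

```python
import re
import statistics

def burstiness(text: str) -> float:
    """Coefficient of variation of sentence lengths.

    Human prose reportedly varies sentence length more than model output,
    so a higher score leans "human". This is a rough heuristic only.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0  # not enough sentences to measure variation
    return statistics.stdev(lengths) / statistics.mean(lengths)

# Example: three sentences of 1, 3, and 5 words -> stdev 2, mean 3
score = burstiness("One. Two words here. This sentence has five words.")
```

As the thread shows, a single scalar like this is trivially gamed by prompting the model to vary its sentence lengths.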
I tested this with text-davinci-003.
Prompt: Write a post about liking fruity icecream, in the style of a reddit post. Make some minor grammar mistakes while writing the post.
> hey everyone, so I'm totally loving this fruity icecream lately! I got it from a store near my house and its been so good. Its really sweet but not overly sickeningly sweet, like some other ice cream. Plus the fact that its fruity makes it even better in my opinion. Does anyone else like fruity icecream too? It'd be awesome to hear some of your thoughts on it!
This site gave me this:
> Your text is likely human generated!
>Make some minor grammar mistakes while writing the post.
Huh. So you told it to do something it wouldn’t ordinarily do.
This seems akin to a salesman who took a sledgehammer to a product and then argued that it breaks in the field (true story). When you leave that instruction off, does the paragraph get caught? Or did you muck about until you found something that was sure to be judged human generated?
That was my first try. I went with the gut feeling that whatever training data they used for their model would assume bland prompts. I made mine different, and got 97% human generated on the first try. Someone else mentioned other things you could do, like messing around with temperature. Those work as well.
It’s important to remember that these models are statistically robust. So while you may get a false positive or false negative, it does not reflect on the robustness of the model.
Where are the benchmarks and analyses that you're basing this statement on?
[removed]
If you could, you could just use it to make GPT better through a GAN architecture, and then you couldn't anymore.
Wondering if you can build a GAN on top of GPT
GPT itself
I tried that but didn't work very well
Take a look at Machine Generated Text: A Comprehensive Survey of Threat Models and Detection Methods by Crothers, Japkowicz, and Viktor (open access preprint on the arXiv, from October 2022)
The only people who have a prayer of doing this are OpenAI themselves. It is likely they can insert a hard-to-detect watermark into sufficiently generic text output, over sufficiently many words, without distorting the meaning or quality appreciably.
However, there is almost no way this can survive subsequent rewrites, like "rewrite the previous paragraph with three new random words that don't change the meaning" or "change all the nouns/verbs into synonyms that preserve the meaning of the paragraph".
I strongly suspect (and might one day try my hand at the math) that there can be no such system that works in general against this sort of attack.
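To make the watermark idea concrete: schemes along the lines of Kirchenbauer et al.'s "A Watermark for Large Language Models" have the generator softly prefer a pseudo-random "green" subset of the vocabulary at each step, and the detector z-test the green-token count. The word-level toy below is my own illustration of the detection side, not OpenAI's (unknown) scheme; note how the synonym-swap attack described above changes the words and thus destroys the signal:

```python
import hashlib
import math

GREEN_FRACTION = 0.5  # fraction of the vocabulary marked "green" at each step

def is_green(prev_word: str, word: str) -> bool:
    """Pseudo-randomly assign `word` to the green list, seeded by the
    previous word (a stand-in for seeding on the previous token)."""
    digest = hashlib.sha256(f"{prev_word}|{word}".encode()).digest()
    return digest[0] < 256 * GREEN_FRACTION

def watermark_z_score(text: str) -> float:
    """z-score of the observed green count against the unwatermarked
    expectation. Watermarked text (generated while boosting green words)
    would score several standard deviations high; plain text hovers near 0."""
    words = text.lower().split()
    if len(words) < 2:
        return 0.0
    n = len(words) - 1  # number of (prev, current) pairs scored
    greens = sum(is_green(a, b) for a, b in zip(words, words[1:]))
    mean = GREEN_FRACTION * n
    std = math.sqrt(n * GREEN_FRACTION * (1 - GREEN_FRACTION))
    return (greens - mean) / std
```

With only a handful of words, the z-score can't exceed a few standard deviations either way, which is why such schemes need "sufficiently many words" to be reliable.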
Also, did someone build a recent dataset with ChatGPT examples for this?
I came across this one last week which the author says is a fine-tuned BERT model: https://originality.ai/
They don't offer a free trial. Who the hell does that! I won't pay $20 just to see the performance.
Oops - didn't realise that. Apologies
Yeah, there's a detector on the Hugging Face hub. It's not always correct, and it's almost always either 99.99% sure or 0.01% sure, with nothing in between. But usually it works.
It may be possible against specific models if you know which ones you're up against. It's the same as trying to recognize authors from their text.
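On the authorship-attribution analogy: a classic stylometry baseline compares character n-gram frequency profiles between a questioned text and known samples from each candidate author (or model). A minimal stdlib sketch, keeping in mind real stylometry uses far richer features than this:

```python
from collections import Counter
import math

def char_ngrams(text: str, n: int = 3) -> Counter:
    """Frequency profile of overlapping character n-grams."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram profiles; 1.0 means identical
    distributions, 0.0 means no shared n-grams."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Attribute a questioned text to whichever reference profile it's closest to.
reference = char_ngrams("known sample of writing from one candidate author")
questioned = char_ngrams("questioned sample of writing")
score = cosine_similarity(reference, questioned)
```

The catch, as the thread's GAN comments suggest, is that a model explicitly trained to imitate a style erases exactly these statistical fingerprints.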
I'm sorry, I don't know of any model that can detect GPT-generated text.
If you're looking for a model to detect GPT-generated text, you're out of luck.
ThrillHouseofMirth t1_j4x7o9e wrote
I don't think that there's any way to do so at this point, and eventually someone will prove it. "Original" language is virtually always a recombination of previous language of sufficient complexity and uniqueness.
A possible solution to this is for AI language model providers to offer APIs that allow people to check content against an archive of the text they generated.
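A minimal sketch of what such a provider-side archive could look like, assuming exact-match lookup over lightly normalized text; the class and method names here are made up for illustration, and note this is trivially defeated by paraphrasing:

```python
import hashlib
import re

class GenerationRegistry:
    """Toy provider-side archive: store a fingerprint of every generated
    output, and let anyone query whether a given text was served verbatim."""

    def __init__(self) -> None:
        self._hashes: set[str] = set()

    @staticmethod
    def _fingerprint(text: str) -> str:
        # Normalize case and whitespace so trivial reflows still match.
        normalized = re.sub(r"\s+", " ", text.strip().lower())
        return hashlib.sha256(normalized.encode()).hexdigest()

    def record(self, text: str) -> None:
        """Called by the provider for each output it serves."""
        self._hashes.add(self._fingerprint(text))

    def was_generated(self, text: str) -> bool:
        """Public lookup: was this exact text (modulo whitespace) generated?"""
        return self._fingerprint(text) in self._hashes
```

A real deployment would need fuzzy matching (e.g. shingling over n-grams) to catch near-copies, and even then a synonym-swap rewrite slips through, which circles back to the watermark-robustness objections earlier in the thread.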
Any solution needs to be monitoring and telemetry based; the days of algorithmic checking are definitively over.