
natepriv22 t1_j5ntbw0 wrote

No, that's not how any of these AIs work. You don't understand how machine learning works. Please stop spreading misinformation and do some research first.

If the AI is plagiarizing, then so are you in writing your comment, since you sure as heck didn't learn to write out of the blue.

The model never contains the original text; can you imagine how huge it would be if it did? Nobody would be able to run it, and nobody would have enough money to access it. A language model like GPT-3 stores learned weights, not documents: it generates text by repeatedly predicting the most likely next token given everything written so far. (Image generators work differently, turning noise into images through a denoising process, but they don't store their training images either.)
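Here's a minimal sketch of that generation loop, using the Hugging Face transformers library and GPT-2 (a small, openly available relative of GPT-3). The prompt and the 20-token length are just illustrative:

```python
# Minimal sketch of autoregressive generation: the model's weights encode
# statistical patterns learned from training data; no training document is
# stored or looked up at generation time.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "The stock market is"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Generate one token at a time: each step predicts a probability
# distribution over the whole vocabulary and samples from it.
for _ in range(20):
    with torch.no_grad():
        logits = model(input_ids).logits
    next_token_probs = torch.softmax(logits[0, -1], dim=-1)
    next_token = torch.multinomial(next_token_probs, num_samples=1)
    input_ids = torch.cat([input_ids, next_token.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))
```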

So it's effectively impossible for it to commit plagiarism by lookup, because it doesn't contain the original text. For it to plagiarize accidentally, it would have to regenerate the exact same output with no copy of the original to draw on, only learned statistical patterns for turning a prompt into comprehensible text.

To put it another way, that would be like you writing a paragraph that is word for word a copy of someone else's paragraph, without ever having any memory of that paragraph, only a general sense of how to string words into comprehensible text. The chances are vanishingly small.
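To put rough, purely illustrative numbers on that intuition: suppose that at each step the model happened to assign probability p to the exact token the original paragraph used (the values of p and n below are assumptions, not measurements). The chance of an exact n-token match shrinks exponentially:

```python
# Back-of-envelope sketch of the "accidental verbatim copy" argument.
# p is a hypothetical per-token probability of matching the original;
# the chance of reproducing n tokens in a row verbatim is p**n.
for p in (0.9, 0.7, 0.5):
    for n in (20, 50, 100):
        print(f"p={p}, n={n:>3} tokens: P(exact match) = {p ** n:.2e}")
```

Even with a very generous 0.9 per token, a 100-token paragraph comes out around 3 in 100,000; at 0.5 per token it's roughly 1 in 10^30.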

Furthermore, almost none of these models have access to the internet, certainly not ChatGPT or GPT-3. OpenAI explicitly states that the training data cutoff is 2021, so they haven't even been trained on newer articles.

The most likely explanation, therefore, is that CNET employees were really lazy or naive and literally copied and pasted the other articles' text into ChatGPT or GPT-3, then wrote simple prompts like "reword this for me". That's the true issue. I suspect that's what happened because I've tried rewording text with ChatGPT a few times, and sometimes it just can't remix the text without sounding too similar to the original. That only happens when I feed it the text word for word and use a very lazy prompt. With a more thoughtful prompt, it summarizes the text and avoids copying it, just like a human would if asked to summarize a text.

So that's what's going on, nothing more. Knowing Reddit, even with this explanation people probably won't believe me and won't do their own research. If you want to prove me wrong, here's a challenge: have it generate an article about anything you like, then copy and paste chunks of that article into Google search and see how many exact matches come up.
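If you'd rather automate that check, here's a rough local stand-in for the Google test: it looks for long word sequences that a generated article shares verbatim with a suspected source. The placeholder texts and the 8-word window are illustrative choices:

```python
# Rough plagiarism spot-check: find 8-word sequences that appear verbatim
# in both a generated article and a reference article.
def ngrams(text: str, n: int = 8) -> set[str]:
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

generated = "paste the model-generated article here"
reference = "paste the suspected source article here"

shared = ngrams(generated) & ngrams(reference)
print(f"{len(shared)} shared 8-word sequences")
for phrase in sorted(shared):
    print(" -", phrase)
```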


Shiningc t1_j5qibie wrote

That doesn’t contradict his claim that “AI is just scraping existing writing”. Human intelligence doesn’t work the same way. At some point, humans simply know that something “makes sense” or “looks good”, even when it’s completely new, and that’s something the current “AI” cannot do.


natepriv22 t1_j5qmutp wrote

It does though...

It's not scraping writing; it's learning the nuances, the rules, and the probabilities of language, much the way a human does.

The equivalent example would be a teacher telling you to "write a compare and contrast paragraph about topic X". The process of drawing on existing understanding, knowledge, and experience is, at a general level, very similar to what current LLM AIs do. There's a reason they're called neural networks: who and what do you think they're modeled after?
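For a feel of what "learning the probabilities of text" means, here's a toy bigram model: it counts which word follows which in a tiny made-up corpus, then generates new text by sampling from those counts. Real LLMs use neural networks over vastly more context, but the principle of modeling probabilities rather than storing documents is the same:

```python
import random
from collections import defaultdict

# Tiny made-up corpus; a real model trains on billions of words.
corpus = "the cat sat on the mat and the dog sat on the rug".split()

# Learn: count which words follow each word.
following = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev].append(nxt)

# Generate: repeatedly sample the next word from the learned counts.
word, output = "the", ["the"]
for _ in range(10):
    options = following.get(word)
    if not options:
        break  # dead end: this word never had a successor in the corpus
    word = random.choice(options)
    output.append(word)
print(" ".join(output))
```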


Shiningc t1_j5qp1vn wrote

Writing “compare and contrast” paragraphs has an extremely limited scope, and that’s not general intelligence.

An AI doesn’t know that something “makes sense” or “looks good”, because those are subjective experiences that we have yet to understand. And what “makes sense” to us is a subjective experience with no guarantee that it objectively makes sense. What made sense to us 100 years ago may be complete nonsense today or tomorrow.

If 1,000 humans play around with 1,000 random text generators, they can eventually figure out what is “gibberish” and what might “make sense” or “sound good”.
