
iingot OP t1_j5ljnds wrote

"The prominent tech news site CNET's attempt to pass off AI-written work keeps getting worse. First, the site was caught quietly publishing the machine learning-generated stories in the first place. Then the AI-generated content was found to be riddled with factual errors. Now, CNET's AI also appears to have been a serial plagiarist — of actual humans' work.

The site initially addressed widespread backlash to the bot-written articles by assuring readers that a human editor was carefully fact-checking them all prior to publication.

Afterward, though, Futurism found that a substantial number of errors had been slipping into the AI's published work. CNET, a titan of tech journalism that sold for $1.8 billion back in 2008, responded by issuing a formidable correction and slapping a warning on all the bot's prior work, alerting readers that the posts' content was under factual review. Days later, its parent company Red Ventures announced in a series of internal meetings that it was temporarily pausing the AI-generated articles at CNET and various other properties including Bankrate, at least until the storm of negative press died down."


Fake_William_Shatner t1_j5lx4tg wrote

>at least until the storm of negative press died down."

Aah, you can just smell the integrity.

I can understand skipping some of the filler articles but -- incorrect data for a tech publisher?


DragoonXNucleon t1_j5mgnnt wrote

There is no money for journalism until social media companies pay for stealing content. This article here is being stolen, with its entire content copied into this thread.

How can we expect them to pay journalists if we also force them to give their content for free? Reddit should have to pay per view when it clips content.


NotThatZachRoberts t1_j5n2dri wrote

No one wants to pay subscriptions, no one wants to see ads, no one wants to subscribe to an email list even. I don't understand how people think good journalism happens.


UniversalMomentum t1_j5oq5r9 wrote

I don't mind ads, but the kind of ads and their placement matter. These sites abused their advertising privileges and hurt their own brand reputations in the process. It was a foolish move! The regulations and content-quality standards of TV should have made it to the internet much faster, but many outlets only half-invested, and their websites reflected the unprofessional and even dangerous state that kind of bad decision-making produces. They got what they deserved on that one. Media has to earn its reputation, not get special treatment.


Fake_William_Shatner t1_j5o84pp wrote

When a mommy and a daddy journalist like each other a lot and one of them drinks heavily.


Alexander1899 t1_j5odxlf wrote

Same thing with YouTube, and pretty much anything online.


UniversalMomentum t1_j5or3if wrote

Well... isn't that the same as TV has been for decades as the dominant media source? I don't think many people signed up for cable to watch TV news they were already getting with their antenna for free, with ads. Cable just bundled news in, so this business model where you either pay for no ads or get free content with ads has been around a long time now, since radio and TV broadcasting began. In that case the nature of the broadcasts made subscriptions too hard to pull off, because all you needed was a receiver, but the business model still seems to have worked just fine for a long time. It might not produce the most integrity, but subscription-only news means the majority of your citizens never sign up and get no news, which in theory should be bad, but given the state of polarization might oddly work out better.


twbrn t1_j5ok4w0 wrote

> How can we expect them to pay journalists if we also force them to give their content for free? Reddit should have to pay per view when it clips content.

Thing is though, it goes both ways. A lot of tech sites these days simply skim Reddit, Twitter, and other similar sites to produce "content."

Speaking as a former technology journalist, a lot of this goes back to the erosion of online advertising values that has been accelerating for 15 years. Websites put more ads on, and more obnoxious ads so that people are forced to see them. More ads means less revenue per ad, which means sites put on more ads, which means less revenue... All this adds up to them trying to balance the equation on the side of getting more clicks, and getting content FASTER than anyone else. Not necessarily better, just faster.

Where you used to have websites that did extensive, in-depth testing of devices, now you have somebody slapping together a handful of photos and a reworded press release and calling it a "review" of a new device. It doesn't matter that it's not good; it didn't take long to produce, and they don't care about the quality of their content. Likewise the spike in clickbait even among formerly respectable publications; if a site can get you to click on an article about "The shocking new feature included in all Samsung phones" or the like, it doesn't matter that it was a nothingburger or that it took them five minutes to put together. You already clicked and gave them their ad impressions. Or skimming some user quotes off Reddit and Twitter and giving it some snappy name like "Users trash latest Google service over massive flaws."

The problem comes down to this: there's no easy way to fix it. I suppose you could try to build a select crowd that's willing to pay for quality journalism, à la Patreon, but Google provides a massive engine to anyone who wants to throw their stuff out there for free. It's like a small, quality restaurant trying to compete with McDonald's. They might attract a following, but McDonald's is still going to represent 99% of the volume.


DragoonXNucleon t1_j5onjvc wrote

There is an easy fix. It's revenue sharing.

If you display the totality of content, you either pay for it or take it down.

Back in the print days, if I just plagiarized your article and printed it in my own magazine you could sue me. Why has this changed? Well, Google writes the laws now.

In video media there are fair-use rules: you can only use so many seconds of a video before it becomes theft. Reddit, Facebook, and Google are all selling other people's content. Imagine if website operators could set a DNS record indicating a revenue-share pay-per-view price: once a piece of content receives more than X views, that price kicks in and the republisher would be liable.
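As a sketch of how such a signal might work, suppose a site published its terms in a TXT record (the `_revshare` record name, the field format, and the prices here are all invented for illustration; nothing like this is an actual standard):

```python
# Hypothetical: a site publishes pay-per-view rev-share terms in a DNS TXT
# record, e.g.  _revshare.example.com -> "v=rs1; free_views=10000; ppv_usd=0.002"
# An aggregator would resolve that record and compute what it owes.

def parse_revshare_record(txt: str) -> dict:
    """Parse a 'k=v; k=v' style TXT record into a dict of string fields."""
    fields = {}
    for part in txt.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key.strip()] = value.strip()
    return fields

def amount_owed(txt: str, views: int) -> float:
    """Views beyond the free threshold are billed at the per-view price."""
    rec = parse_revshare_record(txt)
    free = int(rec.get("free_views", 0))
    price = float(rec.get("ppv_usd", 0))
    billable = max(0, views - free)
    return billable * price

record = "v=rs1; free_views=10000; ppv_usd=0.002"
print(amount_owed(record, 250_000))  # 240,000 billable views -> 480.0
```

The hard parts this sketch skips over are exactly the ones the thread argues about: who counts the views, and who enforces payment.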

Until we make laws to protect journalism, we won't have it.


twbrn t1_j5pawtn wrote

In principle, it sounds good. The problem is that laws are made generally by people who have no idea about how technology works. And even when they do, they don't want to. We're still struggling with laws for something as basic as network neutrality.

There's also a lot of wiggle room that would make it difficult for a one-size-fits-all law to cover and, more importantly, enforce. You'd be looking at needing some kind of agency that actually made sure the rules were followed and settled disputes.

Maybe it could be done on a good faith basis, the way that groups like the Writers Guild of America arbitrate cases among members. If you could get Google and a few other big players on board, you might have a groundswell. But there's a lot of incentive for big tech companies not to want to stop the free circulation of content when the only people they're really hurting are writers and readers, not themselves.


M_Mich t1_j5oqyr0 wrote

it’s why i canceled my apple news subscription. all the stories outside of cnn and bbc were websites creating “news” about reddit or twitter posts that had high controversy. and a number of articles pointed to reddit, while the reddit post linked back to the article


JonOrangeElise t1_j5r00h6 wrote

Well, Google claims to be making an attempt to penalize content-farm journalism by rewarding E-A-T (expertise, authoritativeness, trustworthiness) signals and penalizing sites that toss shit together or (presumably) resort to AI. But the Google bot is capricious and inconsistent, and too often legitimate news sources get the short end of the algo update too. Curious: what career did you segue to after tech journalism?


twbrn t1_j5rmyxh wrote

I'm glad that Google is at least trying. The problem with Google though is that they entrust making choices about content to an algorithm, and eventually people find ways to beat it. Like when they started measuring the time people spent on a site as an indicator of content quality, and sites started throwing a ton of fluff at you before they got to the actual information to prolong your visit. (If you've ever wondered why some sites/articles have a recap of the entire history of Samsung before some bit of news about the newest phone, or a long rambling personal story before a recipe for biscuit dough... that's a big reason why.) If there's a way to exploit the rules, people will find it. So I guess you could say I'm on the skeptical side to any kind of automated solution; machine learning only goes so far against human cleverness.

> Curious: what career did you segue to after tech journalism?

To be perfectly honest, I started taking entry level factory jobs to make ends meet. I'm currently looking for another of those, because I don't expect any of the copywriting jobs I've applied for to come through for me, nor any of the remote customer service stuff. So that, and hoping that my next novel meets some success.


MechanicalBengal t1_j5ojsi2 wrote

Ok… Explain how youtubers like Coffeezilla, who do real journalism, exist.


Fake_William_Shatner t1_j5o82l7 wrote

>How can we expect them to pay journalists if we also force them to give their content for free?


But then, who will do investigative reporting? Fools! That's who.


Dohnakun t1_j5oobpr wrote

> This article here is being stolen, with its entire content copied into this thread.

Fair use, how the internet works, free sample for advertising, yadda yadda.

> if we also force them to give their content for free

No one forces them. Newspapers shaped the ad-ridden internet we have today. They knew their traditional business model didn't work here.


iwasbatman t1_j5lv1y6 wrote

If humans need to fact-check they might as well write the articles themselves.


UniversalMomentum t1_j5osd68 wrote

There might be benefit in having humans write and AI fact-check, or AI write and humans fact-check... once AI is more than just a toy/sensation, at least.


dentastic t1_j5oqk7j wrote

Riddle me this, though: is it ever possible for an AI to not plagiarize? Everything they write has to come from something they've seen in their training data.

I suppose the same could be said for humans, but I don't know what counts in that regard.


OneWithMath t1_j5p4rlc wrote

>Everything they write has to come from something they've seen in their database.

The models learn ideas and concepts; they don't just copy text. Now, using an idea without giving credit is plagiarism, which makes it problematic that they can't cite sources for generated text. But they don't simply reproduce prior sentences under normal conditions.


gerkletoss t1_j5m492s wrote

How was the plagiarism measured though? And how does it compare to human-written articles?
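For what it's worth, one common way plagiarism is measured (not necessarily the method used on CNET's articles) is word n-gram overlap: the fraction of short word sequences in the suspect text that also appear in a candidate source. A minimal sketch, assuming simple whitespace tokenization:

```python
def ngrams(text: str, n: int = 5) -> set:
    """Return the set of lowercase word n-grams in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_ratio(suspect: str, source: str, n: int = 5) -> float:
    """Fraction of the suspect's n-grams that also occur in the source."""
    suspect_grams = ngrams(suspect, n)
    if not suspect_grams:
        return 0.0
    return len(suspect_grams & ngrams(source, n)) / len(suspect_grams)

source = "a savings account typically compounds interest on a daily basis"
suspect = "your savings account typically compounds interest on a daily schedule"
print(round(overlap_ratio(suspect, source), 2))  # -> 0.67
```

Real detectors add normalization, hashing for scale, and tuned thresholds; a ratio near 1.0 suggests near-verbatim copying, while a low ratio does not rule out paraphrased plagiarism, which is part of why comparing AI and human rates is hard.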


firem1ndr t1_j5lmpuf wrote

yeah, that’s basically how all these “ai” work - what nobody’s considering with all this ai writing stuff is that they’re scraping from existing writing. if everything becomes ai writing, then it’s just a cascade of machine plagiarism; at some point in the process, someone has to actually acquire knowledge and expertise and form opinions to write out an argument


greenappletree t1_j5lqou1 wrote

This is going to be an interesting problem - just today I heard that ChatGPT, when asked to code something, was basically just scraping from GitHub. At what point does an AI infringe on copyright, and who is responsible? Developers are just going to shrug and say the AI is a black box.


ciarenni t1_j5m0jn5 wrote

> I heard that ChatGPT, when asked to code something, was basically just scraping from GitHub. At what point does an AI infringe on copyright, and who is responsible?

Microsoft has already done this. Here's the short version.

A few years back, Microsoft bought GitHub. Repositories on GitHub have a license, specified by the author, stating how they can be used. These licenses range from "lol, do whatever, I don't care, but don't expect any support from me", to something akin to standard copyright.

Microsoft also makes Visual Studio and Visual Studio Code, programs for writing code with lots of niceties that help people develop more efficiently and easily than writing code in notepad.exe. For these editors, GitHub offers a feature called Copilot, which basically reads the half-built code you have and uses machine learning to offer suggestions.

Now then, as an exercise for the reader, knowing that Microsoft owns GitHub and builds these editors, where do you think they got the data to train that ML model? If you guessed "from the code on GitHub", you'd be right! And bonus points if you followed up with "but wait, surely they only used code whose license allowed it?" Hint: no. It's literally plagiarism.


Nebuli2 t1_j5pwvqt wrote

Yep, so they just let you know that they pass off any responsibility for infringing on licenses to you, the user.


Mgrecord t1_j5luylm wrote

I believe there’s already a lawsuit against DALL-E and its use of copyrighted artwork.


LAwLzaWU1A t1_j5nns9q wrote

Also worth pointing out that it's being done by an organization that represents companies like Disney.

My guess is that massive companies like Disney are very interested in setting a precedent that if their pictures are used for training, they deserve payment. They will have their own datasets to train their AI on anyway, so they will still be able to use the technology.

These types of lawsuits will only serve to harm individuals and small companies, while giving massive companies a big advantage.


natepriv22 t1_j5ntkrw wrote

No, they just want to use these models for their own profit, while making fan art generation or creation illegal.

They know they can't stop their pictures from being used for learning, because they're publicly available. There's legal precedent for this.

What they care about is that you can generate an Iron Man-style picture and post it online without licensing the character.

What's ironic is that this lawsuit will fail anyway, even with corporate backing; as I just mentioned, the models can't generate exact copies of pictures, only "style-like" pictures.


Mgrecord t1_j5o513j wrote

But isn’t the “style” or “essence” what’s actually copyrighted? I’m not sure fair use will cover this.


natepriv22 t1_j5o72qn wrote

No, the final output is what's copyrighted. It's impossible to copyright a style because it's too abstract.

Example: Disney copyrights drawings of Mickey Mouse. Mickey Mouse is a character that resembles a mouse, walks upright, has little mouse ears, a boopy nose, red pants, and yellow shoes.

This is a character that Disney came up with and which is unique. If someone were to draw something according to these exact specifications, they would very likely end up with a drawing closely or almost completely resembling Mickey Mouse. By trying to redistribute something so obviously similar, you are in danger of breaching someone's copyright.

On the other hand, a style could be cartoons, or, at the simplest level possible, drawing only with circles.

While you may have been the first to use a style, you have no copyright claim over it. It's a very abstract thing, further removed from the artist. The style is a medium used to produce a creation; it's more like a tool than the ultimate product. If you and Disney both started drawing with circles, you would ultimately arrive at very different products, no matter how similar the goal may be (draw a mouse using only circles).

In other words, styles are almost mathematical arrangements of colors, strokes, dots, etc. You use this formula to produce, say, a character. That character is unique; very likely only you could have come up with it. The style, by contrast, is very likely to be discovered by other people. Trying to copyright a style would be like trying to copyright a math formula.

TL;DR: sorry for the messy writing, but I was trying to put all my thoughts together. For these reasons, AI can never truly plagiarize or infringe copyright on its own. Styles are non-copyrightable, and style is almost exclusively what matters to the AI: arranging math to try to satisfy your desired output. Unless it has a reference point, it will pretty much never arrive at the same conclusion you did.

Extra: imagine a world where style were copyrighted instead of just the product or output. It would be the destruction of creativity and art. Imagine if Disney had managed to copyright the cartoon or 3D-cartoon style. They would be the only ones able to create cartoons and 3D cartoons in the industry, gatekeeping and locking everyone else out under threat of lawsuits.

Now that would be a true dystopia...


natepriv22 t1_j5o7jzl wrote

Just to add:

If I made it really confusing by being all over the place:

Style = like math, discovered

Art product or output = like an idea, invented

Creativity combines the use of a style, to produce a product or output that expresses something. Without the product or output, what can a style express?

Imagine trying to explain Van Gogh's style without his product or output. It would be very mathematical and scientific: turbulent lines + bright colors + lowered clarity.


M_Mich t1_j5ork5f wrote

and if expressionism could be copyrighted, it wouldn’t have become a style of art. it would have been limited to the first artist to do it, and everyone else would have been sued.


Mgrecord t1_j5o8cl6 wrote

Thanks for the thoughtful explanation. It will be interesting to see how this plays out. The technology is going to move much faster than the lawsuits!


Fafniiiir t1_j5wym5z wrote

Human beings are not AI; I don't think the two can just be compared.
A human being influenced by another artist is not the same as an AI, and a human can't copy another artist as accurately, broadly, and quickly as an AI can.

Even if you practice Van Gogh's work your entire life, your work will never actually look like his; there will always be noticeable differences.
A lot of artists do try to directly copy other artists' styles, and it's always very apparent, like a worse copycat.

The problem with AI, which is unique to it compared to humans, is that it can be fed an artist's work and spit out finished illustrations in that style in seconds.
What is the point of hiring the artist whose work was fed into the AI for it to learn from?
The artist is essentially being competed out of their own work, with no way of combating it or keeping up with it.
Not to mention that it also competes them out of their own search tag: when you search for some artists, you literally get page after page of AI generations instead of the actual artist's work.

Things like fair use take this into consideration too: the damages, or even potential damages, caused to the person.
And AI is fundamentally different from humans in this regard; another human artist can never do what an AI can, and can't be judged the same.


natepriv22 t1_j5xrs8o wrote

> Human beings are not AI; I don't think the two can just be compared.

Absolutely they can be compared: they are two forms of intelligence, one of which is built on the principles of the other.

> A human being influenced by another artist is not the same as an AI, and a human can't copy another artist as accurately, broadly, and quickly as an AI can.

It's not exactly the same, sure, but it's broadly similar. You don't store 100% of the info you learn and see, because that would be too much data. So you remember processes, rules, and outcomes much better, just like an AI does.

> Even if you practice Van Gogh's work your entire life, your work will never actually look like his; there will always be noticeable differences. There's a lot of artists who even try to directly copy other artists' styles, and it's always very apparent, like a worse copycat.

I mean, the average person, and I'm pretty sure both of us too, would not be able to distinguish the original from the copy unless we had more info. You can do a simple test online; let's see if you manage to distinguish the two. If you do get a high score, then congrats! You are better at spotting copied art than the average human.

Furthermore, what you're describing is exactly how AI works. Unless you use an img2img model, which is not what the majority of AI art is, it would be close to impossible for you to produce the same output, just like a human. Again, you could test this right now: go to an AI art app like Midjourney or Stable Diffusion, type in "Van Gogh Starry Night," and see what outputs you get.

> it can be fed an artist's work and spit out finished illustrations in that style in seconds.

First of all, not exactly: as I've said before, the model never contains the original input, so it's only learning the process, like a human.

Second of all, you can do the same thing! It'll just take you more time. Your friend gives you 100 pictures of a new art style called "circly," which is art made purely of circles. He gives you days, weeks, or months, however much you need, to output something in this new style. He wants a picture of New York made only with circles. So you learn this style and create the new drawing or painting for him. You did almost exactly what an AI does, except it took you longer, which is normal for a human being.

> What is the point of hiring the artist whose work was fed into the AI for it to learn from?

What is the point of hiring a horse-carriage driver, when the concept of how a carriage works was used to create the "evil car"?

First, this is a loaded and emotional question. All kinds of art were used without discrimination; no one was specially selected.

Secondly, again, the model will not be able to output the same thing. It can draw in the same style, but the output will not be the same; mathematically it just won't be. So there is still economic value in the original work.

If a process or job can be automated, and there can be a benefit for humanity, why should we stop this development? Where were you when the horse carriage was being replaced? Where are you now that fast-food workers are being automated too? Why is it OK for others but not for you? And if it's OK for no one, do you think we should regress and go back to the past?

>Not to mention that it also competes them out of their own search tag,

I have literally never met a person who searches for someone's art outside of their official channels. Even if they do, that's a marketing challenge. And what's the difference from popular artists who were already being flooded with copies from Fiverr?

A style is not copyrightable, by the way, and thank gosh for that. So if they're getting flooded with "copies of their style," that framing is wrong: it's not their style, it's the style they use and maybe even discovered, and they have no copyright claim over it. Imagine a world where Disney could copyright drawing in cartoonish styles... or DC comic styles... is that what you want?


LAwLzaWU1A t1_j5p7pc7 wrote

Making it illegal to use pictures for learning, even if publicly available, is exactly what the lawsuits are about, and a huge portion of people (mainly artists who have had their art used for learning) support this idea.

It's in my opinion very stupid, but that's what a lot of people are asking for without even realizing the consequences if such a system was put in place (not that it can be to begin with).


Fafniiiir t1_j5wxvop wrote

This isn't really true at all; artists don't have a problem with art being used to teach AI so long as it's consensual and artists are compensated for it.


LAwLzaWU1A t1_j5xtymw wrote

And the consequence of that is that Disney could say that artists who used Disney works to learn how to draw without consent owe them royalties. I don't think that is what is going to happen, but logically that is the implication.


If you go through some of the lawsuits regarding AI, you will see that what they argue is not exclusive to AI art tools. For example, the lawsuit from Getty seems simply to claim that it should be considered illegal to "use the intellectual property of others - absent permission or consideration - to build a commercial offering of their own financial benefit".

That wording applies to human artists as well, not just AI. Did you use someone else's intellectual property to build a financial offering, such as artists on Fiverr advertising that they will "draw X in the style of Disney"? Then you might be affected by the outcome of this lawsuit, even if you don't use AI art tools. Hell, do your drawings draw inspiration from Disney? Then you have most likely used Disney as "training data" for your own craft as well, and it could be argued that these rulings apply to you too.


I understand that artists are mainly focused on AI tools, but since an AI tool in many ways functions like a human (see publicly available data and learns from it), these lawsuits could affect human artists too.


And like I said earlier, the small artists who worry that big companies might use AI tools instead of recruiting them are completely missing the mark with these lawsuits, because the big companies can afford to buy and train on their own datasets. Disney has no problem getting the legal right to train its future AI on whatever data it wants. These lawsuits will only harm individuals and small companies by making it harder for them to match the AI capabilities of big companies.


It is my firm belief that these tools have to be as open and free to use by anyone as possible, in order to ensure that massive companies don't get an even bigger advantage over everyone else. At the end of the day, the big companies currently suing companies like Stability AI are doing so for their own gain. Getty Images doesn't want people to be able to generate their own "stock images," because that's its entire business. Disney doesn't want the average Joe to be able to recreate its characters and movies with ease. They want to keep that ability to themselves.


Fafniiiir t1_j5wxknu wrote

>There's legal precedent for this.

I think people are getting ahead of themselves when making these claims; these are very new legal issues.
Context matters a lot here, and laws adapt to new technology and new contexts all the time.


Fafniiiir t1_j5wxb0j wrote

Imo it sets a really creepy and worrisome precedent that they can just scrape everything they want.
A lot of very bad material has been found in these datasets, including CSAM, ISIS footage, revenge porn, leaked nudes, and the like.
Even on a less horrifying note, there are also people's personal photographs, medical records, before-and-after photos of operations, private vacation shots, family photos, IDs, etc.; you get the idea.

I do find it a bit worrisome that they can just scrape everything they want online and use it for commercial purposes like this.
At least Disney using its own copyrighted work to train an AI wouldn't run into these ethical problems.


LAwLzaWU1A t1_j5xvmyt wrote

I genuinely do not understand why you find that creepy and worrisome. We have allowed humans to do the exact same thing since the beginning of art, yet it seems like it is only an issue when an AI does it. Is it just that people have been unaware of it before and now that people realize how the world works they react to it?


If you have ever commissioned an artist to draw something for you, would you suddenly find it creepy and worrisome if you knew that said artist had once seen an ISIS video on the news? Because seeing that ISIS video on the news did alter how the artist's brain was wired, and could potentially have influenced how they drew your picture in some way (maybe a lot, maybe just 0.0001%, depending on what picture you asked them to draw).


The general advice is that if you don't want someone to see your private vacation photos, don't upload them to public websites for everyone to see. Training datasets like LAION did not hack into people's phones and steal pictures. The pictures ended up in LAION because they were posted to the public web where anyone could see them. This advice was true before AI tools were invented, and it will be true in the future as well. If you don't want someone to see your picture, don't post it on the public web.


Also, there would be ethical problems even if we limited this to just massive corporations. I mean, first of all, it's ridiculous to say "we should limit this technology to massive corporations because they will behave ethically". I mean, come on.

But secondly and more importantly, what about companies that don't produce their own content to train their AI on, but rather rely on user-submitted content? If Facebook and Instagram included a clause saying they were allowed to train their AI models on submitted images, do you think people would stop using Facebook? Hell, for all I know they might already have such a clause. I doubt many people are actually aware of what they allow in the terms of service they agree to when signing up for websites.




It is also important to understand the amount of data that goes into these models and datasets. LAION-5B consists of 5.85 billion images. That is a number so large it is nearly impossible for a human to comprehend; visualizations of what just one billion looks like already strain intuition, even when they use 100,000 as the base unit.

Even if someone were to find 1 million images of revenge porn or whatever in the dataset, that's still just 0.02% of the data set, which in and of itself is not the same as 0.02% of the final model produced by the training. We're talking about a million images maybe affecting the output by 0.02%.

How much inspiration does a human draw from the works they have seen? Do we give humans a pass just because we can't quantify how much influence a human artist drew from any particular thing they have seen and experienced?


I also think the scale of these data sets raises another point: what would a proposed royalty structure even look like? Does an artist who had 100 of their images included in the data set get 100/5,850,000,000 of a dollar (about 0.0000017% of a dollar)? That also assumes their works actually contributed to the final model in proportion to their share of the images in the data set. LAION-5B is 240 TB, while a model trained on it is ~4 GB; 99.99833% of the data is discarded in going from training data to model.

How do we accurately calculate the amount of influence you had on the final model, which is ~0.002% the size of the data set, of which you contributed ~0.000002%? Not to mention that these AIs might create internal models within themselves, which would further diminish the percentages.

Are you owed 0.000002% of 0.001%? And that also assumes that the user of the program accounts for none of the contributions either.
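The back-of-the-envelope arithmetic above can be sketched in a few lines (all figures are the rough ones quoted in this thread: 5.85 billion images, a ~240 TB data set, a ~4 GB model; they are illustrative only):

```python
# Rough royalty arithmetic using the approximate figures from this thread.
DATASET_IMAGES = 5_850_000_000   # images in LAION-5B
ARTIST_IMAGES = 100              # images one artist contributed
DATASET_BYTES = 240e12           # ~240 TB of training data
MODEL_BYTES = 4e9                # ~4 GB trained model

# Naive proportional share of the data set
data_share = ARTIST_IMAGES / DATASET_IMAGES
print(f"share of data set: {data_share:.10%}")    # ~0.0000017094%

# Fraction of the training data "retained" in the model, by raw size
model_fraction = MODEL_BYTES / DATASET_BYTES
print(f"model/data ratio:  {model_fraction:.5%}")  # ~0.00167%

# Compounding both ratios, as the comment does
effective_share = data_share * model_fraction
print(f"effective share:   {effective_share:.3e}")  # ~2.8e-13
```

Even before arguing about whether raw byte counts are a fair proxy for influence, the compounded share is so small that any per-artist payout would be fractions of a cent.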

It's utterly ridiculous. These things are being discussed by people who have no understanding of how any of it works, and it really shows.


DigitalSteven1 t1_j5nf5ox wrote

As a developer, I couldn't care less that my GitHub was scraped. I also don't care that GitHub made Copilot. It's a tool for us. I've been developing for years, and Copilot has made my workflow significantly better. Ask the wider developer community and you'll find very similar results. We just don't care that much that our work patterns may repeat in other code somewhere.


According to GitHub's own surveys:

  • 88% reported being more productive
  • >90% reported being faster at their job
  • 60-75% of users reported feeling more fulfilled and less frustrated
  • 73% reported it helped them "stay in the flow"
  • 87% reported it preserved mental effort from repetitive tasks



But if you really want to know, just go to some developer subreddit and ask...


Fake_William_Shatner t1_j5lwqbw wrote

No -- in the case of code, it's not "distilling a style" -- it's grabbing whole routines of code that someone wrote with certain attribution and copy restrictions that I think GPT and some "code AI" are breaking.

There's no point in breaking up an entire function -- so it is probably more like automated cut and paste.


Imnot_your_buddy_guy t1_j5mr8nz wrote

Shouldn’t we be demanding that the majority of these AIs be free, considering their companies just steal from our shared knowledge?


greenappletree t1_j5mu4ot wrote

This is an interesting point, but to play devil's advocate, couldn’t the same be said about a person who learns from all this material for free, assimilates it, and makes it their own?


Key-Passenger-2020 t1_j5myi30 wrote

It depends on how that code is licensed. Much of it exists under the GNU Public License


GTREast t1_j5o2cvh wrote

The AI base of content itself will grow and become a part of the information landscape, in a kind of feedback loop. This is going to get interesting.


ViennettaLurker t1_j5os9nt wrote

> At what point does an AI infringe in copyright and who is responsible

There's the philosophical answer, and the real-world answer. We could talk theory all day, but this will all shake out when one gigantic corporation sues another gigantic corporation over it.


ericisshort t1_j5ltakd wrote

You’re probably right, but I really don’t think that shrug will hold up in court though.


natepriv22 t1_j5ntbw0 wrote

No, that's not how any of these AIs work. You don't understand how machine learning works; please stop spreading misinformation and do some research first.

If the AI is plagiarizing then so are you in writing your comment, as you sure as heck didn't just learn to write out of the blue.

The model never contains the original text; can you imagine how huge it would be if it did? Nobody would be able to run it, and definitely nobody would have enough money to access it. The model learns statistical patterns from its training data and generates output by repeatedly predicting the most likely next token.

So it's essentially not possible for it to commit plagiarism by lookup, because it doesn't contain the original text. For it to be accidental plagiarism, it would have to regenerate the exact same output with no memory of the original input, only a learned sense of which words tend to follow which.

To put it in other words, that would be like you writing a paragraph that is word for word a copy of someone else's paragraph, without ever having any memory of said paragraph, only a general sense of how to turn a bunch of words into comprehensible text. The chances are slim to mathematically negligible.

Furthermore, almost none of these models have access to the internet, certainly not ChatGPT or GPT-3. It's explicitly stated that the training data cutoff is 2021, so they haven't even been trained on newer articles.

The most likely explanation, therefore, is that CNET employees were really lazy or naive and literally copied and pasted the other articles' text into ChatGPT or GPT-3 with a simple prompt such as "reword this for me." That's the true issue. I suspect this is the case because I've tried to reword text a few times with ChatGPT, and sometimes it just doesn't manage to remix the text without sounding too similar to the original. That only happens when I feed it the text word for word with a lazy prompt; with a more detailed prompt, it summarizes the text and avoids copying it, just like a human would if asked to summarize a text.

So that's what's going on here, not anything else. Knowing Reddit, even with this explanation it's unlikely people will believe me or do their own research. If you want to prove me wrong, here's a challenge: make it generate an article about anything you like, then copy and paste parts of that article into Google search and see how many exact results come up.
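That Google-search challenge can also be approximated offline: check whether any long run of words from the generated text appears verbatim in a candidate source. A rough sketch (the sample texts here are invented for illustration):

```python
def verbatim_runs(generated: str, source: str, n: int = 8):
    """Return word n-grams from `generated` that appear verbatim in `source`.

    A shared run of 8+ words is a strong hint of copying; shorter
    overlaps are usually just common phrasing.
    """
    gen_words = generated.lower().split()
    src = " ".join(source.lower().split())
    hits = []
    for i in range(len(gen_words) - n + 1):
        run = " ".join(gen_words[i:i + n])
        if run in src:
            hits.append(run)
    return hits

source = ("To avoid overdraft fees, set up low-balance alerts "
          "and link a savings account for overdraft protection.")
paraphrase = "You can dodge overdraft charges by linking savings and enabling alerts."
copied = "To avoid overdraft fees, set up low-balance alerts and monitor spending."

print(verbatim_runs(paraphrase, source))       # [] - no long verbatim run
print(len(verbatim_runs(copied, source)) > 0)  # True - verbatim copying found
```

A genuine paraphrase shares no long runs; a near-copy lights up immediately.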


Shiningc t1_j5qibie wrote

That doesn’t contradict his claim that “AI is just scraping existing writing”. Human intelligence doesn’t work in the same way. It’s just that at some point, humans know that something “makes sense” or “looks good”, even if it’s something that’s completely new, which is something that the current “AI” cannot do.


natepriv22 t1_j5qmutp wrote

It does though...

It's not scraping writing, it's learning the nuances and rules and the probabilities of it in the same way a human would.

The equivalent example would be if a teacher tells you "write a compare and contrast paragraph about x topic". The process of using existing understanding, knowledge and experience is very similar on a general level to current LLM AIs. There's a reason they are called Neural Networks... who and what do you think they are modeled after currently?


Shiningc t1_j5qp1vn wrote

“Comparing and contrasting paragraphs” has an extremely limited scope and it’s not a general intelligence.

An AI doesn’t know something “makes sense” or “looks good” because those are subjective experiences that we have yet to understand how it works. And what “makes sense” to us is a subjective experience where it has no guarantee that it actually does objectively make sense. What made sense to us 100 years ago may be complete nonsense today or tomorrow.

If 1000 humans are playing around with 1000 random generators, humans can eventually figure out what is “gibberish” and what might “make sense” or “sound good”.


Shiningc t1_j5nshxm wrote

"But but but that's how human intelligence works!"


UniversalMomentum t1_j5ou5yb wrote

Yeah, but machine learning is only just getting useful, so you're kind of projecting the limitations of a new tech long-term, as if the tech won't change. It probably will, and it will probably go well beyond just recombining pre-made content.

That being said, all human knowledge is plagiarized from the past; that's the innate, foundational process of science and knowledge. We aren't supposed to figure everything out on our own so much as absorb the successes of the past as fast as possible and apply them somehow. You're not supposed to re-invent the wheel; you're supposed to copy it and find smart uses for it.

Sometimes "acquiring knowledge" just means organizing the data so you can see the patterns; in fact, I'd say that's the majority of the time. AI is going to be pretty good at that, and the limits we see now are to be expected. Projecting today's limitations decades into the future, as if the tech were standing still, is something people do far too often: they speculate about all the negatives and almost gleefully ignore the positives, which badly skews our ability to forecast long-term.


QuestionableAI t1_j5lphpu wrote

Sue the shit out of them each and every time ... I know I will.

Seriously, I have original works out there in articles and books and if I or anyone else finds my works being STOLEN for use by any MFer, my attorney has been wanting a new boat...


Aleyla t1_j5lk7v8 wrote

Um, yeah. They aren’t even close to the only ones.


Warpzit t1_j5o4841 wrote

Which should be the big news if any. But why should the AI report on itself (jokes aside, journalism is a joke today).


johnnyb4llgame t1_j5mb1t2 wrote

I remember when the stimulus checks were coming out during covid, almost every 12 hours CNET had some garbage article that was just regurgitating Twitter rumors.


BeowulfsGhost t1_j5ll6no wrote

Yeah well, that’s how they learn isn’t it? Bad AI, no new processor for you!


Key-Passenger-2020 t1_j5mypbj wrote

This isn't going to go away.

The question is: how can human beings use this for their own power and benefit instead of having their work and livelihoods stolen from them.

I'm not sure capitalism can address this problem.


maretus t1_j5nyitw wrote

Literally every single example provided in this article is a one- or two-sentence snippet.

Of course it sounds the same. There are only so many ways to say “this is how you avoid overdraft fees”.

Give me a fuxking break. That isn't plagiarism; it's running out of distinct ways to say the same thing. If they had used more than two sentences in their examples, I'm confident the two texts would have diverged more. But come on, of course one or two sentences about the same topic are going to sound and look the same.

I guarantee this shit passes copyscape which is what 99% of digital marketers use to check for duplicate/plagiarized content.
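For what it's worth, duplicate-content checkers in the Copyscape family are generally understood to compare overlapping word "shingles" rather than whole sentences. A toy version of that idea (my own sketch, not Copyscape's actual algorithm):

```python
def shingles(text: str, k: int = 3) -> set:
    """Set of k-word shingles, lowercased and punctuation-stripped."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of the two texts' shingle sets (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

original = "This is how you avoid overdraft fees on your checking account"
reworded = "Here is how to steer clear of overdraft charges on a checking account"
near_copy = "This is how you avoid overdraft fees on your savings account"

print(round(jaccard(original, reworded), 2))   # 0.0  - full reword passes
print(round(jaccard(original, near_copy), 2))  # 0.64 - near-copy flagged
```

Which is exactly why a full reword sails through these tools while a one-word swap gets flagged.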


landyhill t1_j5o9l43 wrote

This is how you avoid overdraft fees

You avoid overdraft fees this is how

Overdraft fees this is how you avoid!

Is this how you avoid overdraft fees.

Avoid fees overdraft how is this you?

Is overdraft fees you? Avoid this how?

Fees overdraft, you avoid, this is how!


forkofnature t1_j5on7pe wrote

Avoid single overdraft fees in your area now!

Overdraft fees hate this one trick!


maretus t1_j5ooncl wrote

So, is that all plagiarism? According to this article, it is.


landyhill t1_j5pewyc wrote

IMHO it ignores the spirit of copyright and trademark, which should protect something that is truly unique as a whole. I recall Apple wanting to trademark rounded corners on phones, or something similar, and artists trying to copyright chord progressions, which on their own do very little.

AI images, code, and writing are impacting people who feel that what they do has some intrinsic value that only they can provide, using a specific process they've developed.

We all build on our experiences of what we hear, read, see, touch, etc. I think the term "create" is misleading. Humans build using physical resources that were here long before our individual existence; outside of the occasional meteor, everything is basically recycled into a new form.

AI is coming for the mental aspect of humanity reducing us to ones and zeros and our egos may have a hard time adjusting.


maretus t1_j5qj5o5 wrote

I’m a digital marketer and have spent 20+ years writing content and in a lot of cases “rewriting content” to avoid duplicate content filters.

I’d venture a guess that 95%+ of commercial content on the internet is just rehash upon rehash upon rehash. But is it plagiarism to write the same thing using completely different words?


landyhill t1_j5rdgcq wrote

In general I agree with you. Imagine someone claiming plagiarism for the terms like "Click here for more information" or "Stock market reaches all time high".


AndroidDoctorr t1_j5nnd3s wrote

When are people going to learn that ML is just plagiarism with extra steps?


Jarvis_The_Dense t1_j5oyu8q wrote

As is to be expected when AIs literally only function by studying and recombining pre-existing works.


QB8Young t1_j5p1xbd wrote

I really wish people would stop using images of robots when writing articles about simple AI. 🤦‍♂️

Also, of course it committed extensive plagiarism, it gathers known information and compiles it.


the-grim t1_j5p2q0x wrote

"ChatGPT, rewrite this article using different wording"

I am a journalist


mindfu t1_j5rkao7 wrote

CNET turned sketchy very early on. I remember an abrupt shift in the late 90s where they went from being a good place that you could find and download good Windows software, to a place that would literally infect your computer with additional programs you hadn't asked for.


Calm-Campaign-5252 t1_j5lxa2j wrote

So it perfectly mimics actual journalists... the future IS now.


nirad t1_j5n1dqb wrote

If you target niche areas, the bot is probably going to copy and lightly reword other works in the same area.


Dic3dCarrots t1_j5ngmxd wrote

AI requires a basic shift in human responsibility. Information has been chaos for so long; true order is finally being brought to knowledge. Never again must an Aristotle be lost to ignorance. We now get to tap these vast stores of information, but only if we learn how organic intelligence successfully creates feedback loops. People have to learn to channel this power. ChatGPT makes busy work unnecessary. Kids work the same way animals do: you need to gamify things. Forcing people to subdue their nature doesn't build backbone; it takes away from the available mental resources. Kids should be taught with technology to use technology.


Secunda_Son t1_j5nhici wrote

Unfortunately this is the future though. All of these articles about catching AI plagiarizing content and being detected by algorithms don't end with anyone saying "oh, I guess we shouldn't do this then". They end with the people behind the AI going "huh, guess we need to teach it to hide its plagiarism better". We can't stop this so we need to start planning for it.


Shiningc t1_j5nswuj wrote

Who is willing to bet that the AI hype is going to die down once people realize that the "AI" (machine learning) is basically just shit like this?


Ehgadsman t1_j5nv8hp wrote

When all the humans are replaced, how will so-called AI function? From the art bots to the text bots to every other kind of stupid fucking bot, they all require a huge population of humans doing actual work to copy from. When those humans have all been replaced and can no longer earn a living, what then? Fuck this greedy bullshit, and fuck the assholes enabling it. This is some next-level fuck around and find out.


GI_X_JACK t1_j5o1tq7 wrote

Plagiarism is exactly how AI chat bots work. All of them.

You input text, it recombines it, and it outputs the result with a certain degree of mutation.
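A toy sketch of that recombine-and-mutate process (note: this is really how an old-fashioned Markov chain bot works; modern LLMs learn statistical weights rather than storing and reshuffling text):

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words observed to follow it."""
    words = text.split()
    chain = defaultdict(list)
    for a, b in zip(words, words[1:]):
        chain[a].append(b)
    return chain

def generate(chain, start, length=10, seed=0):
    """Random-walk the chain: literal recombination of the input text."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        followers = chain.get(out[-1])
        if not followers:
            break
        out.append(rng.choice(followers))
    return " ".join(out)

corpus = ("the bot copies the text and the bot rewords the text "
          "and the editor checks the text")
chain = build_chain(corpus)
print(generate(chain, "the"))  # every word comes verbatim from the corpus
```

In this toy, every output word really is copied from the input, which is the behavior being described here.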


s0cdev t1_j5o9b6j wrote

I see cnet has come a long way from installing adware on your pc if you download popular software from them.

Fuck that trash ass excuse for a tech news site


gordonjames62 t1_j5pfbhs wrote

So AI is about as good as a high schooler at figuring out what to write, but with a more expansive library of sources to cite (or plagiarize).


amitym t1_j5q4qwm wrote

The entire premise is that the AI plagiarizes and just repeats popular beliefs, right? Wtf is even going on over there at CNET? What are they thinking?

... I mean I guess if that's all your human journalists were doing all along, maybe they are right in the end... >_>


magenta_placenta t1_j5ufpo3 wrote

What people call AI today is in fact not AI. It is a model that has been trained on input data and generates prompted output based on that data. In other words:

They are automated plagiarism machines.

That's literally how this "AI" (an autoregressive language model) works. The logic is essentially: a little plagiarism is plagiarism; a lot of plagiarism is an original work. So if it gets detected, they just need to increase the level of plagiarism until it isn't.


eatingganesha t1_j5lp6ke wrote

Well, of course it’s going to “plagiarize”… it’s not omniscient. Lol

The factual errors are probably due to the sheer amount of disinformation out there. It sounds like it had choices and pulled the incorrect options. Whatever the cause, that comes down to the programming or the training data.


ts0000 t1_j5mi1fy wrote

Plagiarism means it's copied, not that it's incorrect.


IThinkIWont t1_j5mleit wrote

Two separate ideas, separated by two carriage returns.

I assume, for the first point, the OP was alluding to the fact that everybody uses somebody else's writing to formulate opinions. Very few people have first hand accounts of current events or history.


tangcameo t1_j5mpxv0 wrote

I want it to write essays on how to destroy AI’s and never revive them.


Ok-Equivalent-8509 t1_j5n9u60 wrote

How can a non-sentient AI commit a crime? You mean the programmer committed plagiarism via an AI.