SeaweedSorcerer t1_j6g7zhr wrote

This case asks the opposite question: can you freely use other people's copyrighted content to train your AI?

31

ostrichpickle t1_j6gbyj0 wrote

If A.I. can't use others' copyrighted work to learn and train, why can people?

People do the same thing: they learn from others and emulate other artists. So does that make their art invalid too?

7

josefx t1_j6h1xav wrote

At least Microsoft Copilot has been caught reproducing large sections of code verbatim. Try selling a book that contains copies of Disney products and see how that turns out.

16

Ronny_Jotten t1_j6hx2hb wrote

> If A.I. can't use others' copyrighted work to learn and train, why can people?

But it is allowed to use copyrighted works to train an AI - as long as it constitutes fair use. What's probably not fair use, though, is to sell or flood the market with cheap works produced by a machine, if that negatively impacts the market for the original works it's trained on. Copyright laws make a distinction between humans and machines, because they're not the same thing. For example, works created solely by non-humans, whether a machine or a monkey, can't be copyrighted. According to the US Copyright Office, copyright requires "the nexus between the human mind and creative expression".

8

SeaweedSorcerer t1_j6gf8j5 wrote

One reason is that AI training is done by copying the training data to hundreds or even thousands of training nodes. It's akin to creating a book of every painting and giving that book to every person learning art, without compensating or even crediting the artists whose work appears in it.

Another reason is that trained AIs have inhuman memories, and their models can spit out the original art, in some cases nearly verbatim. You can look at it as compressing the data - usually highly lossy compression, but not always. And courts have held that copying movies/music/etc. is clearly piracy even when the copy uses a different compression format.

3

CallFromMargin t1_j6gmmbk wrote

Well, that's a whole load of bullshit.

13

IAmDrNoLife t1_j6gxfuq wrote

Exactly, because it's not true.

Machine Learning models (or rather, Deep Learning and Neural Networks) do not "compress the data". They analyse data. They don't store any of the original art used in training (otherwise the size of these models would be measured in thousands of terabytes; instead, they are a few gigabytes).
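
To put that in concrete terms, here's a toy sketch, plain linear regression in Python, nothing to do with any real image model: however much data passes through training, the only thing the model keeps is a fixed-size set of parameters.

```python
# Toy illustration: training distils a large dataset into fixed-size weights.
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([3.0, -1.0, 2.0, 0.5])
X = rng.normal(size=(1_000_000, 4))             # 32 MB of "training data"
y = X @ true_w + 0.1 * rng.normal(size=len(X))  # noisy observations

w = np.zeros(4)                                 # the model's entire "memory"
for _ in range(50):                             # plain gradient descent
    w -= 0.1 * (X.T @ (X @ w - y)) / len(X)

print(w.round(3))           # recovers the underlying pattern, not the samples
print(X.nbytes, w.nbytes)   # 32,000,000 bytes in, 32 bytes of weights kept
```

The same asymmetry is why a few-gigabyte model can't be a literal archive of a multi-hundred-terabyte scrape (whether it memorizes *some* individual inputs is a separate question, discussed further down the thread).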

Furthermore, these models do not replicate the art they have been trained on. Every single piece of art generated by AI is something entirely new, something that has never been seen before. You can debate whether it takes skill, but you can't debate that it's something new.

This video is an excellent source of information on the topic. It's created by a professional artist who has embraced AI-generated art as a source of inspiration and a way to speed up their own work.

Furthermore, courts have indeed previously held that Google IS allowed to data mine a huge amount of data and use it. Google has "Google Books", a record of an enormous number of books, compiled via data mining. Of course, there's a difference between the Google Books project and AI art models in the end result (one is a collection of existing works, the other can create new ones), but the focus there was on the data mining.

One thing a lot of people don't seem to know: you do not own a style. You cannot copyright a style. A lot of artists complain that "it's possible for people to just mimic my work". That is true, but it has always been true, simply because you do not own "your" style. People have always been able to go to another artist and say "please make some art in the style of this person". You have copyright in individual pieces of art, but not in the general style you use to create them.

Here comes my own personal opinion:

Tools using AI are the future. People are not going to lose their jobs because an AI makes them obsolete - people are going to lose their jobs if they refuse to use AI to improve their workflow.

Take software development. These models can generate code from the ground up, to an insane degree of detail. You no longer have to spend time on all the boring stuff, actually writing the code; you can focus on the problem-solving. The same goes for art: with AI tools, you get to skip the boring, monotonous part of your workload and focus on the parts that actually mean something.

4

CallFromMargin t1_j6gxzgp wrote

The "they re-create art" argument comes from a paper that is widely shared on Reddit. Thing is, that paper itself mentions that the researchers trained their own models on small data sized, ranging from 300 pictures to few thousand, and they started seeing novel results at 1000 images.

​

Also current bots can't generate good code, not yet, but they have their own usage. As an example, a client I recently had asked me to design patching system (small shop, with 100 or so servers, they had no use for automated patching up to now), and some simple automation. You know, the type of weekend jobs you do to earn some extra cash. Well, since they are using azure, I went with azure automation, but I had no idea how it works. Well, chatGPT told me how it works, in details, gave me some code that might work, etc. But the most important thing by far was the high level overview, it saved me hours of reading documentation. This shit is the future, but not how you might expect it to be.

9

Ronny_Jotten t1_j6i3uog wrote

I don't know what paper you're referring to, but there's this one:

Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models

It shows, at the top of the first page, the full Stable Diffusion model (trained on billions of LAION images) replicating images that are clearly "substantially similar" copyright violations of its training data. The paper cites several other papers on the ability of large models to memorize their inputs.

It may be possible to tweak the generation algorithm to no longer output such similar images, but it's clear that they are still present in the trained model network.
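
For anyone wondering how such studies flag replication: broadly, they embed training and generated images in a feature space and flag a generation whose nearest training neighbour exceeds a similarity threshold. A toy sketch of that search, with random vectors standing in for the learned copy-detection descriptors the paper actually uses:

```python
# Toy replication check: nearest-neighbour search in an embedding space.
import numpy as np

rng = np.random.default_rng(1)
train = rng.normal(size=(10_000, 512))          # stand-in training embeddings
train /= np.linalg.norm(train, axis=1, keepdims=True)

gen = train[42] + 0.01 * rng.normal(size=512)   # a near-copy of image 42
gen /= np.linalg.norm(gen)

sims = train @ gen                              # cosine similarity to all
best = int(np.argmax(sims))
print(f"best match: training image {best}, similarity {sims[best]:.3f}")
if sims[best] > 0.5:                            # threshold in that spirit
    print("flagged as likely replication of a training image")
```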

3

Mr_ToDo t1_j6j481z wrote

Well, they did both in that paper. But it would be interesting to know where the ones at the top came from. I know there's one I saw further down with high hit percentages, but as nice as they are, I don't know why the rest don't if they belong to that model.

−1

Ronny_Jotten t1_j6kjrlv wrote

The paper explains what the ones at the top were from. It's using Stable Diffusion 1.4. See page 7: Case Study: Stable Diffusion, page 14: C. Stable Diffusion settings, and page 15 for the prompts and match captions. Sorry, the rest of your comment is incomprehensible to me...

2

Mr_ToDo t1_j6mwtay wrote

OK, that's on me. I hit the references and somehow thought I was done with the paper; I didn't think they'd have the captions they used underneath that. I admit that was bad due diligence on my part. Apologies.

1

Ronny_Jotten t1_j6hpnnj wrote

> They don't store any of the original art used in training [...] these models do not replicate the art they have been trained on. Every single piece of art generated by AI is something entirely new, something that has never been seen before. You can debate whether it takes skill, but you can't debate that it's something new

They can very easily reproduce images and text that are substantially similar to the training input, to the extent that it is clearly a copyright violation.

Image-generating AI can copy and paste from training data, raising IP concerns | TechCrunch

> courts have indeed shown previously that Google IS allowed to data mine a bunch of data [...] there's a difference [...] But the focus here was on the data mining.

In the case of the Google Books search product, the scanning of copyrighted works ("data mining") was found to be fair use. That absolutely does not mean that all data mining is fair use. Importantly, it was found that it had no economic impact on the market for the actual books; it did not replace the books. In order for the code/text/image AI generators' "data mining" of copyrighted works to be fair use, it will also have to meet that test. Otherwise, the mining is a copyright violation.

5

BastardStoleMyName t1_j6iy5c3 wrote

This is the debate over the human vs. computational divide at its very beginning. There are few ways to have this debate without it being philosophical.

No human is able to analyze and retain data the way a computer can. Human memory is flawed and built for efficiency. When we view something, we don't download it or literally transfer data into ourselves; every part of the experience is an interpretation, from external to internal.

As it stands, a copy of an image, one that would fall under copyright, has to be transferred to a system and then interpreted by a process that dictates how many samples to take of it.

These systems can’t accept usage terms itself to view a file or an artwork and isn’t being brought to a gallery with the approval of the owners to view and scan the images itself. If people were paid to create images with the style of someone else, they are pulling from their brains interpretation and flawed, by nature, memory storage to interpret that.

This copyright case is honestly one of the first major stepping stones; it will be a reference for how we classify AI in the future and a precedent for how we legally allow its use. That is something we will have to face one day, just like every sci-fi novel has warned us. But how and when we make that determination, and at what stage, is going to be important. At this stage, I would say that if the system cannot legally accept the usage terms for an image, then it isn't allowed to use those images in any manner.

From a current legal standpoint, we have already decided that an AI has no right to claim copyright in what it creates, and the AI's creator has no right to claim the output. Following that thought, it is not in a position to use copyright-covered material, as the owner cannot accept the terms on the AI's behalf and the AI cannot accept them on its own. This has already been decided in reverse.

Further it’s my opinion that AI should be restricted to single tasks and segmented. If an AI creates writing prompts, then that’s all it can do and all it can be fed with, it an AI writes code, then that’s all it should return and all it can be trained on.

For a point of future reference: it's not about what determination gets made for AI in the long run, but how we are prepared to use and understand it now. AI created today is purely a tool for operator and consumer use.

2

lethal_moustache t1_j6glgvi wrote

The art isn't invalid. It may, however, infringe copyright and make the artist subject to damages.

1

ostrichpickle t1_j6gm2fr wrote

Every artist ever... learnt off other artists.. so.....

4

techimp t1_j6gogaa wrote

While it may be true that new artists learn from the old, there is something intrinsically different between an homage, a cover, and a new original work. Two of those are allowed for artists without restriction; the cover has specific rules on how the copyright is handled (recording the work is one of the things the band itself can't freely do, though in theory a fan could). AI does not make these distinctions. Its rough approximation of an answer often has either not enough originality or lands somewhere in uncanny-valley weirdness.

That's what is being debated. It IS a conversation worth having, since laws will always be on the back foot with regard to tech, privacy, and rights.

4

AuthorNathanHGreen t1_j6gg5hf wrote

When I posted a story online for free, I did so because I thought real humans could read it, and perhaps decide they wanted to buy my longer works if they liked it. I understood that someone might read it and not like it, like it but be too cheap to buy my paid work, or perhaps read it and study the writing techniques I used. I did not, however, post it thinking an AI might be training itself on it (with no hope of me getting compensation out of the deal) so that it could further dilute the market for writing.

Don't I have a right that my content not be used in a manner I couldn't anticipate or prevent?

3

CallFromMargin t1_j6glixm wrote

In that specific case, no. Fair use law covers that, and Google v. Authors Guild settled that specific question in court. Using your work that way falls under fair use, just like a human reading your work and incorporating its ideas into their own.

That said, if you wrote shit on the internet, let me assure you, it is almost useless for training a writing AI. Believe me, I tried to do it on a dataset of /r/writingprompts; the thing is, most writing there just sucks. Which is not a bad thing, as the only way to learn to write is by writing, and thus by putting bad work on the internet. That doesn't change the fact that it objectively sucks.

If I wanted to build an actual writing AI, I would use a collection of classical works, works that have stood the test of time. Frankly, the difference between those and what gets put on the internet is often in how scenes and characters are fleshed out.

2

Ronny_Jotten t1_j6hrjni wrote

> In that specific case, no. Fair use law covers that, and Google v. Authors Guild settled that specific question in court. Using your work that way falls under fair use, just like a human reading your work and incorporating its ideas into their own.

That's completely false. The Google case was found to be fair use precisely because it did not "dilute the market for writing". That's one of the four legal tests for fair use. The judge said that it did not produce anything that competed economically in the market for the books that were scanned; on the contrary, it might increase their sales. Whether such scanning is fair use is determined on a case-by-case basis. If AIs are used to produce "new" works that are sold commercially and undercut the authors of the originals they're based on, it will be much more difficult to prove fair use.

Furthermore, the Copilot product creates a loophole where a business can incorporate code released under e.g. the GPL, a license that requires said business to release its derivative works under the same open-source license, and make it closed-source instead. That can also create an unfair economic advantage in the market. These questions are far from "solved".

2

Doingitwronf t1_j6gpxd7 wrote

I wonder what happens now that AIs can be instructed to produce works in the specific style of any author/artist whose works were supplied to the training set?

1

CallFromMargin t1_j6gwqup wrote

What used to happen when you asked for a painting in the style of X? The same thing is happening with AI art. It's literally the same thing.

1

Ronny_Jotten t1_j6hspu6 wrote

It's literally not the same thing though, at least legally speaking. It's already accepted that a human looking at an artwork is not "making a copy", as defined in the copyright laws. As long as they don't produce a "substantially similar" work, there's no copyright violation. The same can't be said for scanning or digitally copying a work into a computer; that is "making a copy" that's covered by the copyright laws. In some cases, that can come under the "fair use" exemption. But not in all cases. It's evaluated on a case-by-case basis; in the US according to the four-part fair use test. For example, if it's found that the generated works have a negative economic impact on the value of the original works, there's a substantial chance that it won't be found to be fair use.

4

CallFromMargin t1_j6hvui0 wrote

The computer is not storing a copy of the original work in the trained model. It looks at a picture, learns stuff from it, and stores only what it learns.

Your argument is based either on a fundamental misconception on your part, or a flat-out lie from you. Neither one casts you in a good light.

−3

Ronny_Jotten t1_j6hzcpu wrote

> The computer is not storing a copy of the original work in the trained model. It looks at a picture, learns stuff from it, and stores only what it learns.

Just because you anthropomorphize the computer as "looking at" and "learning stuff", doesn't mean it's not digitally copying and storing enough of the original work in a highly compressed form within the neural network to violate copyright by producing something "substantially similar": Image-generating AI can copy and paste from training data, raising IP concerns | TechCrunch

But regardless of whether it produces a "substantially similar" work as output, making a copy of the original copyrighted work into the computer in the first place is a required step in training the AI network. Doing so is only legally allowed if it's fair use. That was the question in the Google books case - it was found that the scanning of books was fair use, because Google didn't use it to create new books or otherwise economically damage the authors or the market for the original books. But that's not necessarily the case with all instances of making digital copies of copyrighted works.

> Your argument is based either on fundamental misconception on your part, or a flat out lie from you. Neither one casts you in good light

Well, you can fuck off with that, dude. There's no call for that kind of personal attack.

2

CallFromMargin t1_j6i4o2a wrote

No. The fact that it's mathematically impossible to store that many images, and that such a compression algorithm would violate the laws of physics, means that it is not storing the images.

It is impossible to compress 380 TB of data down to 0.04 TB.
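
For what it's worth, the arithmetic behind that point, using rough public figures (the checkpoint and dataset sizes here are approximations, not exact numbers):

```python
# Back-of-the-envelope: how much storage per image could the model hold?
training_tb = 380.0     # claimed size of the raw training data
model_tb = 0.04         # ~40 GB is generous; an SD v1 checkpoint is ~4 GB
print(f"implied compression ratio: {training_tb / model_tb:,.0f}:1")  # 9,500:1

checkpoint_bytes = 4e9  # Stable Diffusion v1 weights, roughly 4 GB
n_images = 2.3e9        # roughly the LAION-2B-en training subset
print(f"weight bytes per training image: {checkpoint_bytes / n_images:.2f}")
# ~1.7 bytes per image: nowhere near enough to archive every input, though
# images duplicated many times in the set can still be memorized individually.
```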

−2

Ronny_Jotten t1_j6i68gn wrote

And yet, the citation I gave shows Stable Diffusion obviously replicating copyrighted images from the LAION training set, despite your musings about thermodynamics. It may not store reproducible representations of all images, I don't know - but it unquestionably does store some.

In any case, it doesn't change the fact that copying images into the computer in the first place, in order to train the model, would need to come under a fair use exemption. For example, research generally does - but not in every case, especially if it causes economic damage to the original authors. In many countries, authors also have moral rights, to attribution, to preservation of the integrity of their work against alteration that damages their reputation, etc., which may come into play.

2