Comments

goj-145 t1_j80ufu1 wrote

We're going to find out soon with the Getty lawsuit. Until then, gray area.

38

sweatierorc t1_j854tn3 wrote

On the training part, it is probably legal, though you need to be careful about something like GDPR. E.g. for facial recognition, there are extra rules.

The "sharing model and/or its prediction" is the gray area.

Edit: typo

1

Tlaloc-Es OP t1_j80xdxu wrote

But anyway, it is hard to demonstrate which dataset a model was trained on, right? In the case of Getty you can probably get images that look like the Getty image dataset, but for a predictor? And in a case like this, where there wasn't any law or prior case, can you still lose the lawsuit and have to pay?

−1

goj-145 t1_j80xlao wrote

Not really hard when the model is spitting out watermarked images.

9

Miguel33Angel t1_j830cig wrote

He's asking about the case of a predictor, e.g. ResNet or other models that just categorize.

1

goj-145 t1_j831dqg wrote

The question is can you use copyrighted info to train a model. The answer is we don't know yet.

The current lawsuit that will set precedent on this is about image generation using copyrighted Getty images to train a model. It's proven that Getty images were used because the watermark shows up in the model's output many times, which is the answer to "how can they prove it".

Once that is defined, then we will know if it is legal or not in those jurisdictions. And then we will get to the "do we do it anyways even though it's illegal?"

3

2blazen t1_j8378vr wrote

So you're saying Stability wouldn't have issues if they hired an intern to git clone a watermark remover and put the images through it first?

1

goj-145 t1_j83801h wrote

It would have been MUCH harder to prove if they spent a day preprocessing the images first!

3

currentscurrents t1_j85rpol wrote

They use the open LAION-5B dataset, everybody knows what's in there.

Still, some preprocessing and deduplication would have been a good idea just for output quality.

2

Ulfgardleo t1_j84fdfl wrote

if it is illegal now it would be super illegal then, because removing watermarks on its own typically violates the license of the material.

The question is 100% the same as "can I include GPLv3 code in my commercial closed source repository if I remove the license headers and ensure that the code is never published?"

0

DataGOGO t1_j813dui wrote

It is legal until a court says otherwise.

5

Tlaloc-Es OP t1_j81dot2 wrote

And could there be any retroactive penalty?

1

DataGOGO t1_j81fm63 wrote

not likely, if found illegal, then you would have to "remove" the offending "images"

4

cajmorgans t1_j8416i1 wrote

Even if it becomes illegal, the democracy of Machine Learning depends on it being legal. If Getty wins this, it would mean that a few pretty large companies would be the only ones that can build large models because they “own” most of the data. Facebook for example does a lot of stuff to prevent people from scraping public data from their apps.

3

Ulfgardleo t1_j84fokp wrote

legally the data is not public and the fact that facebook is actively trying to prevent scraping is making it very difficult to argue otherwise.

Legally, the data cannot be public. The users give facebook a non-exclusive license with limited rights to store and process the data. It does not follow from this that anyone who sees the shared images (for example) has a right to process them as well. If that was the case, the terms (https://www.facebook.com/terms.php 3.1) would have to state under which license the works are redistributed by facebook.

2

[deleted] t1_j82rcdb wrote

[deleted]

1

Fragrant_Weakness547 t1_j82yp54 wrote

>That is the Million Dollar question (or really hundred million dollar question in terms of legal fees)

It's worth a lot more than that. The profit margins of AI focused companies are kind of on the line here.

1

a_user_to_ask t1_j88f7f6 wrote

The owner of an image is the one who decides how it may be used. "All rights reserved" means just that: the owner holds the rights to any use of the images, now and for whatever someone invents in the future.

In an ideal world, each image in a dataset used in machine learning would have to be identified with its author and license. But I understand that this is difficult to achieve because images are copied around the web and it is difficult to locate the original source.

So, I have no doubt about the illegality of using images from web scraping. How easy it is to win or lose a lawsuit, and to prove whether you used that data or not, is another matter.

1

Tlaloc-Es OP t1_j89x7hi wrote

I think the same, but for example, if I scrape images from Google that are tagged as copyleft (but wrongly labeled), or that have no license info, who is guilty?

1