CallFromMargin t1_j6gxzgp wrote on January 30, 2023 at 6:14 AM

Reply to comment by IAmDrNoLife in Microsoft, GitHub, and OpenAI ask court to throw out AI copyright lawsuit by Tooskee

The "they re-create art" argument comes from a paper that is widely shared on Reddit. Thing is, that paper itself mentions that the researchers trained their own models on small datasets, ranging from 300 pictures to a few thousand, and they started seeing novel results at 1000 images.

Also, current bots can't generate good code, not yet, but they have their uses. As an example, a client I recently had asked me to design a patching system (small shop with 100 or so servers; they had no use for automated patching up to now) and some simple automation. You know, the type of weekend job you do to earn some extra cash. Since they are using Azure, I went with Azure Automation, but I had no idea how it works. Well, ChatGPT told me how it works, in detail, gave me some code that might work, etc. But the most important thing by far was the high-level overview; it saved me hours of reading documentation. This shit is the future, but not how you might expect it to be.

Ronny_Jotten t1_j6i3uog wrote on January 30, 2023 at 2:25 PM

I don't know what paper you're referring to, but there's this one:

Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models

It clearly shows, at the top of the first page, the full Stable Diffusion model, trained on billions of LAION images, replicating images that are clearly "substantially similar" copyright violations of its training data. The paper cites several other papers regarding the ability of large models to memorize their inputs.

It may be possible to tweak the generation algorithm to no longer output such similar images, but it's clear that they are still present in the trained model network.

Mr_ToDo t1_j6j481z wrote on January 30, 2023 at 6:23 PM

Well, they did both in that paper. But it would be interesting to know what the ones at the top were from. I know there's one I saw further down among the high hit percents, but with as nice as they are I don't know why the rest don't if they belong to that model.

Ronny_Jotten t1_j6kjrlv wrote on January 30, 2023 at 11:50 PM

The paper explains what the ones at the top were from. It's using Stable Diffusion 1.4. See page 7: Case Study: Stable Diffusion, page 14: C. Stable Diffusion settings, and page 15 for the prompts and match captions. Sorry, the rest of your comment is incomprehensible to me...

Mr_ToDo t1_j6mwtay wrote on January 31, 2023 at 1:50 PM

OK, that's on me. I hit the references and somehow thought I was done with the paper; I didn't think they would have the captions they used underneath that. I admit that was poor due diligence on my part. Apologies.