Submitted by pm_me_your_pay_slips t3_10r57pn in MachineLearning
mongoosefist t1_j6ufv6a wrote
Is this really that surprising? Theoretically, every image from the CLIP training set should be in the latent space in a close-ish-to-original form. Obviously these guys went through a fair amount of trouble to recover these images, but it shouldn't surprise anyone that it's possible.
HateRedditCantQuitit t1_j6upt7k wrote
It's funny that the top comment right now is that it shouldn't be surprising, because whenever the legal argument comes in, the most common defense is that these models categorically don't memorize.
znihilist t1_j6uy7z0 wrote
I think people are using words and disagreeing on conclusions without first agreeing on what exactly those words mean.
I am not sure that everyone is using the word "memorize" the same way. I think those who use it in the context of a defense are saying that those images are nowhere to be found in the model itself. It is just a function that takes words as input and outputs a picture. Is the model memorizing the training data if it can recreate it? I don't know, but my initial intuition tells me there is a difference between memorizing and pattern recreation, even if they aren't easily distinguishable in this particular scenario.
znihilist t1_j6uz705 wrote
If you have a set of number pairs: (1, 1), (2, 3.95), (3, 9.05), (4, 16.001), etc., these can be fitted with f(x) = x^2. The function x^2 does not contain the four pairs anywhere, but it can recreate them to a certain degree of precision if you plug in the x values.
Is f(x) = x^2 memorizing the inputs or just able to recreate them because they are in the possible outcome space?
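A quick numeric check of the example (a minimal Python sketch; the data points are the ones above):

```python
import numpy as np

# The four (x, y) pairs from the example above
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.95, 9.05, 16.001])

# f(x) = x^2 stores none of the y-values, yet reproduces them closely
pred = x ** 2
print(np.abs(pred - y))                    # errors: 0, 0.05, 0.05, 0.001
print(np.sqrt(np.mean((pred - y) ** 2)))   # RMSE ~ 0.035
```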
Ronny_Jotten t1_j6wsav3 wrote
If I remember your face, does my brain contain your face? Can your face be found anywhere inside my brain? Or has my brain created a sort of close-fit formula, embodied in connections of neurons, that can reproduce it to a certain degree of precision? If the latter, does that mean that I haven't memorized your face, even though I can draw a pretty good picture of it?
visarga t1_j6x0qcm wrote
I think their argument goes like this: when you encode an image as JPEG, the actual image is replaced by DCT coefficients, and reconstruction is only approximate. That doesn't make the image free of copyright.
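To make the JPEG analogy concrete, here is a minimal sketch of DCT-based lossy compression (the 8x8 block and the coefficient cutoff are arbitrary illustrative choices, not JPEG's actual quantization):

```python
import numpy as np
from scipy.fft import dctn, idctn

rng = np.random.default_rng(0)
block = rng.random((8, 8))            # stand-in for an 8x8 image block

coeffs = dctn(block, norm="ortho")    # JPEG-style DCT coefficients
coeffs[4:, :] = 0                     # discard high frequencies (the lossy step)
coeffs[:, 4:] = 0
recon = idctn(coeffs, norm="ortho")   # reconstruction is only approximate

print(np.max(np.abs(recon - block)))  # nonzero: the copy is imperfect
```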
znihilist t1_j6xa0o3 wrote
My point is more about the fact that f(x) doesn't have 3.95 in it anywhere. Another option would be to write f(x) as -(x-2)(x-3)(x-4)*1/6 + (x-1)(x-3)(x-4)*3.95/2 - (x-1)(x-2)(x-4)*9.05/2 + (x-1)(x-2)(x-3)*16.001/6. This recreates the original points exactly: plug in 1 and you get -(-1)(-2)(-3)*1/6 + (0)(-2)(-3)*3.95/2 - (0)(-1)(-3)*9.05/2 + (0)(-1)(-2)*16.001/6, which is just 1.
This version of f(x) has "memorized" the inputs and is written as a direct function of them, whereas x^2 has nothing in it that can be traced back to the original inputs. Both functions can recreate the original inputs, although one does so to infinite precision (RMSE = 0) and the other to an RMSE of ~0.035.
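As a sketch of the contrast (assuming NumPy and SciPy; `scipy.interpolate.lagrange` constructs the interpolating polynomial directly from the data points):

```python
import numpy as np
from scipy.interpolate import lagrange

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.95, 9.05, 16.001])

interp = lagrange(x, y)   # coefficients are built directly from the data points
rmse_interp = np.sqrt(np.mean((interp(x) - y) ** 2))   # 0 up to float error
rmse_square = np.sqrt(np.mean((x ** 2 - y) ** 2))      # ~0.035

print(rmse_interp, rmse_square)
```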
I think we intuitively recognize that these two functions are not the same, even beyond their obvious differences (the first is a 3rd-degree polynomial built from the data points, the other a simple quadratic). Either way, my point is that while "memorize" is applicable in both cases, one stores a copy and the other recreates from scratch, and I believe those mean different things in their legal implications.
Also, I think the divide on this from a philosophical point of view is very interesting, and with the genie out of the bottle, barring strong societal change and pressure, it is never going back in.
Ronny_Jotten t1_j6wrlvv wrote
I think pretty much everyone would have to agree that the brain - the original neural network - can memorize and reproduce images, though never 100% exactly. That's literally what we mean by the word memorize: to create a representation of something in a biological neural network in a way that it can be recalled and reproduced.
Can those pictures be found somewhere inside the brain, can you open a skull and point to them? Or is it just a function of neuronal connections that outputs such a picture? Is there "a difference between memorizing and pattern recreation"? It sounds like a "how many angels can dance on the head of a pin" sort of question that's not worth spending a lot of time on.
I don't think anyone should be surprised that an artificial neural network can exhibit a similar kind of behaviour, and that for convenience we would call it by the same word: "memorizing". I'm not saying that every single image is memorized, any more than I have memorized every image I've ever seen. But I do remember some very well - especially if I've seen them many times.
Some say that AIs "learn" from the images they "see", but somehow they refuse to say that they "memorize" too. If they're going to make such anthropomorphic analogies, it seems a bit selective, if not hypocritical.
The extent to which something is memorized, or the differences in qualities and how it takes place in an artificial vs. organic neural network, is certainly something to be discussed. But if you want to argue that it's not truly memorizing, like the argument that ANNs don't have true intelligence, well, ok... but that's also a kind of "no true Scotsman" argument that's a bit meaningless.
visarga t1_j6x1uwy wrote
> The extent to which something is memorized ... is certainly something to be discussed.
A one-in-a-million chance of memorisation, even when you're actively looking for it, is hardly worth discussing.
> We select the 350,000 most-duplicated examples from the training dataset and generate 500 candidate images for each of these prompts (totaling 175 million generated images). We find 109 images are near-copies of training examples.
On the other hand, these models compress billions of images into a few GB. There is less than 1 byte on average per input example, so there's no space for significant memorisation. That's probably why only 109 memorised images were found.
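The back-of-the-envelope arithmetic behind that claim (the training-set size and checkpoint size here are rough assumed figures):

```python
# Assumed figures: ~2.3 billion training images, ~2 GB fp16 checkpoint
n_images = 2_300_000_000
model_bytes = 2 * 1024**3

print(model_bytes / n_images)   # roughly 1 byte per training image
```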
I would say I am impressed there were so few of them. If you use a blacklist for these images, you can be 100% sure the model is not regurgitating training data verbatim.
I would suggest the model developers remove these images from the training set and replace them with variations generated by the previous model, so the new model only learns the style and not the exact composition of the original. Replacing originals with variations (same style, different composition) would be a legitimate way to avoid close duplication.
SulszBachFramed t1_j6wa7ii wrote
You can make the same argument about lossy compression. Am I really infringing on copyright if I record an episode of House, re-encode it and redistribute it? It's not the 'original' episode, but a lossy copy of it. What if I compress it in a zip file and distribute that? In that case, I am only sharing something that can imperfectly recreate the original. The zip file itself does not resemble a video at all.
Ronny_Jotten t1_j6wndrm wrote
The test for copyright infringement is whether it's "substantially similar", not "exactly the same".
SulszBachFramed t1_j6wp97b wrote
Right, hence why it's relevant to large models trained on huge datasets. If the model can reconstruct data such that it is substantially similar to the original, then we have a problem, whether from the viewpoint of copyright infringement or privacy law (GDPR).
znihilist t1_j6xcp1i wrote
Good point, but the way I see it, these two things look very similar yet don't end up being similar in the way we thought or wanted. Compression takes one input and generates one output; the object (the file, if you want) is only one thing, an episode of House. We'd argue that both versions are loosely identical and just differ in the underlying representation (their 0's and 1's are different, but they render the same object). Also, that object can't generate another episode of House that aired a day early, or a non-existent episode where House takes over the world, or one where he's a Muppet. Since the diffusion models don't have a copy, the comparison falls apart on that particular aspect.
I do think the infringement is going to end up being attributed to the user and not to the tool, akin to how, when a TV can play pirated content, we assign the blame to the user and not to the manufacturer of the TV. So it may end up being that creating these models is fine, but if you recreate something copyrighted, that will be on you.
Either way, this is going to be one interesting Supreme Court decision (because I think it is definitely going there).
JigglyWiener t1_j6xy6ys wrote
Infringing content can be created with any number of tools, and we don't sue Photoshop for not detecting someone trying to alter images of what is clearly Mickey Mouse. We sue the person when they are making money off the sale of copyrighted material.
It's not worth chasing copyright for pennies.
Ronny_Jotten t1_j6yenlh wrote
Adobe doesn't ship Photoshop with a button that produces an image of Mickey Mouse. They would be sued by Disney. The AI models do. They are not the same. It seems unlikely that Disney will find it "not worth chasing"; they spend millions defending their intellectual property.
JigglyWiener t1_j6yxwhz wrote
The models don’t come with buttons that do anything. They are tools capable only of what the software developers permit to enter the models and what users request.
If we go down the road of regulating training and capacity to do x, you’ll have to file lawsuits against every artist on behalf of every copyright holder over the IP inside the artist’s head.
These cases are going to fall apart and copyright holders are going to go after platforms that don’t put reasonable filters in place.
Ronny_Jotten t1_j6z9axn wrote
> The models don’t come with buttons that do anything. They are tools capable only of what the software developers permit to enter the models and what users request.
If you prompt an AI with "Mickey Mouse" - no more effort than clicking a button - you'll get an image of Mickey Mouse that violates intellectual property laws. The image, or the instructions for producing it, is contained inside the model, because many copyrighted images were digitally copied into the training system by the organization that created the model. It's just not remotely the same thing as someone using the paintbrush tool in Photoshop to draw a picture of Mickey Mouse themselves.
> If we go down the road of regulating training and capacity to do x, you’ll have to file lawsuits against every artist on behalf of every copyright holder over the IP inside the artist’s head.
I don't think you have a grasp of copyright law. That is a tired and debunked argument. Humans are allowed to look at things, and remember them. Humans are not allowed to make copies of things using a machine - including loading digital copies into a computer to train an AI model - unless it's covered by a fair use exemption. Humans are not the same as machines, in the law, or in reality.
> These cases are going to fall apart
I don't think they will. Especially for the image-generating AIs, it's going to be difficult to prove fair use in the training if the output is used to compete economically with artists or image owners like Getty, whose works have been scanned in, and affects the market for that work. That's one of the four major factors in a fair use analysis.
maxToTheJ t1_j6x4dc8 wrote
That's a bad argument. MP3s are compressed versions of the original file for many songs, so the original isn't exactly in the MP3 until decompression is applied. Would anybody argue that, since a transformation is applied in the form of a decompression algorithm, Napster was actually in the clear legally?
znihilist t1_j6x5c0y wrote
An MP3 can recreate only the original version. It can't recreate other songs that have never been created or thought of. Compression relates exactly one input to one output. As such, the comparison falls apart when you apply it to these models.
maxToTheJ t1_j6yo1eq wrote
> It can't recreate other songs that have never been created or thought of.
AFAIK, having a non-copyright-violating use doesn't excuse a copyright-violating use.
znihilist t1_j6z78wg wrote
That's beside the point, my point is that the MP3 compression comparison doesn't work, so that line of reasoning isn't applicable. Whether one use can excuse another isn't part of the argument.
maxToTheJ t1_j6zy5z8 wrote
>That's beside the point,
It does for the comment thread which was about copyright
> my point is that the MP3 compression comparison doesn't work,
It does for the part that is actually the point (copyright law).
znihilist t1_j704b3j wrote
>> That's beside the point,
> It does for the comment thread which was about copyright
It doesn't, as this issue has not been decided by courts or legislation yet, and opinion seems evenly divided. So this is circular logic.
>> my point is that the MP3 compression comparison doesn't work,
> It does for the part that is actually the point (copyright law).
You mentioned MP3 (compressed versions) as comparable in functionality, and my argument is that they are not similar in functionality, so the conclusion doesn't follow, because they are not comparable in that analysis. Compression not absolving copyright infringement doesn't mean the same conclusion holds for diffusion models. Since you asserted that it does, you need to show that compression and diffusion work the same way for the comparison to hold. It's as if I said: it isn't illegal for me to look at a painting and then go home with vivid images of it in my head, therefore diffusion models are not infringing. That would be fallacious and wrong, because the functionality doesn't carry over, and the same goes for the MP3 example.
maxToTheJ t1_j70et3o wrote
>You mentioned MP3 (compressed versions) as comparable in functionality,
Facepalm. For the identity part, not the whole thing.
Wiskkey t1_j6v0hqg wrote
The fact that Stable Diffusion v1.x models memorize images is noted in the various v1.x model cards. For example, the following text is from the Stable Diffusion v1.5 model card:
>No additional measures were used to deduplicate the dataset. As a result, we observe some degree of memorization for images that are duplicated in the training data. The training data can be searched at https://rom1504.github.io/clip-retrieval/ to possibly assist in the detection of memorized images.
Argamanthys t1_j6w9gal wrote
There is a short story called The Library of Babel about a near-infinite library that contains every possible permutation of a book with 1,312,000 characters. It is not hard to recreate that library in code. You can explore it if you want.
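A toy version in Python, just to illustrate the point (tiny pages instead of the full 1,312,000 characters; the alphabet and page length are arbitrary choices):

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz ,."   # a small fixed alphabet
PAGE_LEN = 40                                 # real books: 1,312,000 characters

def page(index: int) -> str:
    """Return the page stored at a given shelf index (its base-N expansion)."""
    chars = []
    for _ in range(PAGE_LEN):
        index, rem = divmod(index, len(ALPHABET))
        chars.append(ALPHABET[rem])
    return "".join(chars)

def index_of(text: str) -> int:
    """Find the shelf index where a given text 'already exists'."""
    text = text.ljust(PAGE_LEN)[:PAGE_LEN]
    return sum(ALPHABET.index(c) * len(ALPHABET) ** i for i, c in enumerate(text))

print(page(index_of("hello world")))   # recovers "hello world" padded to a full page
```

Every possible page already exists at some index; "finding" a text is just computing where it sits.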
Contained within that library is a copy of every book ever written, freely available to read.
Is that book piracy? It's right there if you know where to look.
That's pretty much what's going on here. They searched the latent space for an image and found it. But that's because the latent space, like the Library of Babel, is really big and contains not just that image but also near-infinite permutations of it.
SuddenlyBANANAS t1_j6waypu wrote
If diffusion models were a perfect bijection between the latent space and the space of possible images, that would make sense, but they're obviously not. If you could repeat this procedure and find exact duplicates of images which were not in the training data, you'd have a point.
starstruckmon t1_j6xbhe1 wrote
>find exact duplicates of images which were not in the training data, you'd have a point
The process isn't exactly the same, but isn't this how all the diffusion-based editing techniques work?
WikiSummarizerBot t1_j6w9h7w wrote
>"The Library of Babel" (Spanish: La biblioteca de Babel) is a short story by Argentine author and librarian Jorge Luis Borges (1899–1986), conceiving of a universe in the form of a vast library containing all possible 410-page books of a certain format and character set. The story was originally published in Spanish in Borges' 1941 collection of stories El jardín de senderos que se bifurcan (The Garden of Forking Paths). That entire book was, in turn, included within his much-reprinted Ficciones (1944).
maxToTheJ t1_j6x4vrz wrote
> That's pretty much what's going on here.
No, it's not. We wouldn't need training sets if that were the case; in the scenario described, you can generate the entire dataset with a known algorithm.
Mescallan t1_j6wbz6f wrote
Surmisable information is not the same as memorization.
Laphing_Drunk t1_j6ur796 wrote
Yeah, model inversion attacks aren't new. It's reasonable to assume that large models, especially generative models that make no effort to be resilient, are susceptible to this.
maxToTheJ t1_j6vqzvb wrote
>Is this really that surprising?
It should be, to all the people who claim these models are solely transformative in all the threads about the court cases related to generative models.
bushrod t1_j6vaal9 wrote
What theory are you referring to when you say "theoretically"?
mongoosefist t1_j6wed0f wrote
When the latent representation is trained, it should learn an accurate representation of the training set, but obviously with some noise, because of the regularization that happens from learning the features along with some Gaussian noise in the latent space.
So by "theoretically", I meant that, due to the way the VAE is trained, on paper you could prove that you should be able to get an arbitrarily close representation of any training image if you can direct the denoising process in a very specific way. Which is exactly what these people did.
I will say there is some hand-waving involved, however, because even though it should be possible, if you have enough images that are similar enough in the latent space that there is significant overlap between their distributions, it's going to be intractably difficult to recover these 'memorized' images.
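For a rough idea of what that search looks like in practice, here is a hedged sketch along the lines of the paper's approach, with the model name, sample count, and similarity measure as illustrative assumptions rather than the authors' exact settings: generate many samples for a heavily duplicated training caption and flag generations that come out nearly identical to one another.

```python
import numpy as np
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Placeholder prompt: the paper targets captions duplicated many times in the training set
prompt = "a caption that appears thousands of times in the training data"

samples = [
    np.asarray(pipe(prompt).images[0], dtype=np.float32)
    for _ in range(16)   # the paper generates far more samples per prompt
]

# If many independent generations are near-identical, the prompt likely
# reproduces a memorized training image
dists = [
    np.mean(np.abs(a - b))
    for i, a in enumerate(samples)
    for b in samples[i + 1:]
]
print("min pairwise L1 distance:", min(dists))
```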