Comments

Kalekuda t1_j32i776 wrote

"I would prefer if ONLY qualified researchers would answer my question rather than the common, disgusting, unwashed masses of the sub reddit that I posted my question to." /S

get a load of OP.

9

[deleted] OP t1_j331jg2 wrote

[deleted]

−1

Kalekuda t1_j331r9z wrote

You'll need to attend a conference with other experts for that. The internet is for cat videos and conjecture.

5

thehodlingcompany t1_j357t3c wrote

If by "exactly recreate an image" you mean extract a binary-identical reconstruction of the original image from the model, then no. The size of the training data is many, many times larger than the model so if this were possible you would have devised the most amazing lossless compression algorithm known to humanity. So, signs point to no, although perhaps there are some contrived edge cases where it might be possible, such as a large model overfit to a small number of images. I'm not an ML researcher so maybe you should ignore this post but this is really more of an information theory question isn't it?
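To put rough numbers on the capacity argument (the figures below are illustrative assumptions, not measurements of any particular model):

    model_bytes = 4e9      # assume ~4 GB of weights (~1B float32 parameters)
    num_images = 2e9       # assume ~2 billion training images
    print(model_bytes / num_images)   # => 2.0 bytes of model capacity per image,
                                      # versus ~100 KB for a typical compressed image

Two bytes per image is nowhere near enough to store the original bits, so at best the model memorizes a handful of heavily duplicated images.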

5

Optimal-Asshole t1_j3w5td4 wrote

I am a ML researcher, and you are right. You described it in a simpler/better way than I could.

2

Phoneaccount25732 t1_j35k8s8 wrote

Learned index functions are similar to compression algorithms and might be of interest here, but I think I agree with your argument anyway because they're very overparameterized.
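For anyone unfamiliar, a learned index replaces a lookup structure with a model that predicts where a key lives. A minimal sketch (purely illustrative, assuming PyTorch):

    import torch

    keys = torch.sort(torch.rand(10_000)).values   # sorted keys in [0, 1]
    pos = torch.linspace(0, 1, 10_000)             # normalized array positions
    w = torch.ones(1, requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    opt = torch.optim.Adam([w, b], lr=0.01)
    for _ in range(2000):
        opt.zero_grad()
        loss = ((w * keys + b - pos) ** 2).mean()  # regress position from key
        loss.backward()
        opt.step()
    # (w * key + b) * (len(keys) - 1) now approximates a key's array index,
    # i.e. the key -> position mapping is "compressed" into two parameters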

1

sjd96 t1_j32lyay wrote

I'm gonna ignore OP's condescending tone for a moment and point out that it might theoretically be possible to invert a given target image (i.e. find the input noise which generates that image) using an optimization process, by backpropping through the model. I.e., something like

import torch
import matplotlib.pyplot as plt

target = load_tensor('mona_lisa.png')            # pseudocode: load the target image
prompt = clip_encode('a painting of a woman')    # pseudocode: encode the text prompt
z = torch.randn(...).requires_grad_()            # initial latent noise
while not converged:                             # e.g. until the loss stops improving
    z.grad = None
    pred = run_pretrained_latent_diffusion(prompt, z)   # pseudocode sampler
    loss = torch.nn.functional.mse_loss(pred, target)   # or whatever perceptual loss
    loss.backward()
    with torch.no_grad():
        z -= 0.01 * z.grad               # plain SGD; or use your favorite optimizer
plt.imshow(z.detach().cpu().squeeze())   # recovered noise that will generate
                                         # mona_lisa.png when prompted with
                                         # 'a painting of a woman'

What do others think?

2

Agreeable-Run-9152 t1_j33wlnt wrote

Let's consider a dataset consisting of only one image x, and assume the optimization process is known and deterministic.

Then, given the weights of the diffusion model and the training procedure P(theta_0, t, x), which maps the initial weights theta_0 to the weights theta_t after t steps of training on image x, the problem becomes:

Find x such that ||theta_t - P(theta_0, t, x)|| = 0 for all times t.

I would IMAGINE (I am not sure) that with enough times t, we get a unique solution x.

This argument should even hold for datasets consisting of more images.
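A toy sketch of this idea (purely illustrative: a made-up deterministic "training" procedure on a 4-dimensional "image", not an actual diffusion model):

    import torch

    torch.manual_seed(0)
    x_true = torch.randn(4)      # the "secret" training image
    theta_0 = torch.randn(4)     # known initial weights

    def P(theta_0, steps, x, lr=0.1):
        # known, deterministic training: gradient descent on (theta - x)^2
        theta = theta_0.clone()
        trajectory = []
        for _ in range(steps):
            theta = theta - lr * 2 * (theta - x)
            trajectory.append(theta)
        return trajectory

    observed = P(theta_0, 10, x_true)   # the released checkpoints theta_t

    x_hat = torch.zeros(4, requires_grad=True)   # our guess for x
    opt = torch.optim.Adam([x_hat], lr=0.05)
    for _ in range(2000):
        opt.zero_grad()
        loss = sum(((a - b) ** 2).sum()
                   for a, b in zip(P(theta_0, 10, x_hat), observed))
        loss.backward()
        opt.step()
    print(x_true)
    print(x_hat.detach())   # converges to x_true: the training image is recovered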

2

Agreeable-Run-9152 t1_j33xcmm wrote

Note that this argument isn't really about diffusion or generative models, but about optimization. I know my fair share of generative modelling, but this idea is much more general and might have popped up somewhere else in optimization/inverse problems?

1

fakesoicansayshit t1_j3db14h wrote

If I train the model on a set of 1x1-pixel images that have only two states, black or white, and two labels, "black" or "white", then shouldn't prompting 'black' generate a 1x1 black image 100% of the time?

1

Agreeable-Run-9152 t1_j3dbcyl wrote

Yeah, that's true. My comment relates to unconditional diffusion models à la Song, not Stable Diffusion. The argument might be adapted for conditional generation.

2

top1cent t1_j3wewt6 wrote

Yes, it is actually possible to get the original data back (at least approximately) from the latent space. Check out autoencoders.
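For context, a minimal autoencoder sketch (illustrative, assuming PyTorch; note the reconstruction is approximate, not bit-exact):

    import torch
    import torch.nn as nn

    enc = nn.Linear(784, 32)    # image -> latent code
    dec = nn.Linear(32, 784)    # latent code -> reconstruction
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

    x = torch.rand(256, 784)    # stand-in for flattened training images
    for _ in range(1000):
        opt.zero_grad()
        loss = ((dec(enc(x)) - x) ** 2).mean()   # reconstruction loss
        loss.backward()
        opt.step()
    # dec(enc(x)) now approximates x for the training data, but the bottleneck
    # loses information: you get back something close, not the original bits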

0