Comments

Kalekuda t1_j32i776 wrote

"I would prefer if ONLY qualified researchers would answer my question rather than the common, disgusting, unwashed masses of the sub reddit that I posted my question to." /S

get a load of OP.

9

sjd96 t1_j32lyay wrote

I'm gonna ignore OP's condescending tone for a moment. Theoretically it might be possible to invert a given target image (i.e., find the input noise that generates it) using an optimization process, by backpropping through the model. Something like:

import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt

# load_tensor, clip_encode, run_pretrained_latent_diffusion are placeholders
target = load_tensor('mona_lisa.png')
prompt = clip_encode('a painting of a woman')
z = torch.randn(...).requires_grad_()  # noise tensor, shape depends on the model
opt = torch.optim.Adam([z], lr=0.01)   # or use your favorite optimizer here
while not converged:                   # e.g. until the loss stops improving
    opt.zero_grad()
    pred = run_pretrained_latent_diffusion(prompt, z)
    loss = F.mse_loss(pred, target)    # or whatever perceptual loss
    loss.backward()
    opt.step()
plt.imshow(z.detach().cpu().numpy())   # recovered noise that generates mona_lisa.png when prompted with `a painting of a woman`

What do others think?

2

Agreeable-Run-9152 t1_j33wlnt wrote

Let's think about a dataset consisting of only one image x, and assume the optimization process is known and deterministic.

Then, given the weights of the diffusion model and the optimization procedure P(theta_0, t, x), which maps the initial weights theta_0 to theta_t after t steps of training on image x, this problem would be:

Find x such that |theta_t - P(theta_0, t, x)| = 0 for all times t.

I would IMAGINE (I am not sure) that for enough times t, we get a unique solution x.

This argument should even hold for datasets consisting of multiple images; a toy sketch of the single-image case is below.
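
As a toy sketch of what I mean (everything here is my own made-up construction: a linear "model", a quadratic loss, and plain gradient descent standing in for the deterministic procedure P):

import torch

torch.manual_seed(0)
x_true = torch.randn(16)   # the single training "image"
theta0 = torch.zeros(16)   # known initial weights

def P(theta0, t, x, lr=0.1):
    # deterministic optimization procedure: t steps of gradient descent
    # on the toy loss |theta - x|^2, starting from theta0
    theta = theta0
    for _ in range(t):
        theta = theta - lr * 2 * (theta - x)
    return theta

# observed weight checkpoints theta_t at a few times t
checkpoints = {t: P(theta0, t, x_true) for t in (1, 5, 20)}

# recover x by minimizing sum_t |theta_t - P(theta0, t, x)|^2
x = torch.zeros(16, requires_grad=True)
opt = torch.optim.Adam([x], lr=0.05)
for _ in range(3000):
    opt.zero_grad()
    loss = sum(((theta_t - P(theta0, t, x)) ** 2).sum()
               for t, theta_t in checkpoints.items())
    loss.backward()
    opt.step()

print((x.detach() - x_true).norm())  # should be close to 0: x is recovered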

2

Agreeable-Run-9152 t1_j33xcmm wrote

Note that this argument really isn't about diffusion or generative models but about optimization. I know my fair share of generative modelling, but this idea is a lot more general and may have popped up somewhere else in optimization/inverse problems?

1

thehodlingcompany t1_j357t3c wrote

If by "exactly recreate an image" you mean extract a binary-identical reconstruction of the original image from the model, then no. The size of the training data is many, many times larger than the model so if this were possible you would have devised the most amazing lossless compression algorithm known to humanity. So, signs point to no, although perhaps there are some contrived edge cases where it might be possible, such as a large model overfit to a small number of images. I'm not an ML researcher so maybe you should ignore this post but this is really more of an information theory question isn't it?

5

top1cent t1_j3wewt6 wrote

Yes, it is actually possible to get back the original data from the latent space. Check out autoencoders.
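
A minimal sketch of the round trip (toy model, untrained, sizes made up):

import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 32))
decoder = nn.Sequential(nn.Linear(32, 28 * 28), nn.Unflatten(1, (1, 28, 28)))

x = torch.rand(1, 1, 28, 28)    # a toy "image"
z = encoder(x)                  # latent code
x_hat = decoder(z)              # reconstruction from the latent code
print((x_hat - x).abs().max())  # training minimizes this error, but recovery is approximate, not bit-exact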

0