Comments

Kalekuda t1_j32i776 wrote

"I would prefer if ONLY qualified researchers would answer my question rather than the common, disgusting, unwashed masses of the sub reddit that I posted my question to." /S

get a load of OP.

9

sjd96 t1_j32lyay wrote

I'm gonna ignore OP's condescending tone for a moment. Theoretically it might be possible to invert a given target image (i.e., find the input noise that generates it) using an optimization process, by backpropping through the model. Something like:

import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt

# load_tensor, clip_encode, run_pretrained_latent_diffusion are placeholders
target = load_tensor('mona_lisa.png')
prompt = clip_encode('a painting of a woman')
z = torch.randn(...).requires_grad_()  # noise tensor, shape depends on the model
opt = torch.optim.Adam([z], lr=0.01)   # or use your favorite optimizer here
while not converged:                   # e.g. until the loss stops improving
    opt.zero_grad()
    pred = run_pretrained_latent_diffusion(prompt, z)
    loss = F.mse_loss(pred, target)    # or whatever perceptual loss
    loss.backward()
    opt.step()
plt.imshow(z.detach().cpu().numpy())   # recovered noise that generates mona_lisa.png when prompted with `a painting of a woman`

What do others think?

2

Agreeable-Run-9152 t1_j33wlnt wrote

Let's think about a dataset consisting of only one image x, and assume the optimization process is known and deterministic.

Then, given the weights of the diffusion model and the optimization procedure P(theta_0, t, x), which maps the initial weights theta_0 to theta_t after t steps of training on image x, this problem would be:

Find x such that |theta_t - P(theta_0, t, x)| = 0 for all times t.

I would IMAGINE (I am not sure) that for enough times t, we get a unique solution x.

This argument should even hold for datasets consisting of multiple images; a toy sketch of the single-image case is below.
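
As a toy sketch of what I mean (everything here is my own made-up construction: a linear "model", a quadratic loss, and plain gradient descent standing in for the deterministic procedure P):

import torch

torch.manual_seed(0)
x_true = torch.randn(16)   # the single training "image"
theta0 = torch.zeros(16)   # known initial weights

def P(theta0, t, x, lr=0.1):
    # deterministic optimization procedure: t steps of gradient descent
    # on the toy loss |theta - x|^2, starting from theta0
    theta = theta0
    for _ in range(t):
        theta = theta - lr * 2 * (theta - x)
    return theta

# observed weight checkpoints theta_t at a few times t
checkpoints = {t: P(theta0, t, x_true) for t in (1, 5, 20)}

# recover x by minimizing sum_t |theta_t - P(theta0, t, x)|^2
x = torch.zeros(16, requires_grad=True)
opt = torch.optim.Adam([x], lr=0.05)
for _ in range(3000):
    opt.zero_grad()
    loss = sum(((theta_t - P(theta0, t, x)) ** 2).sum()
               for t, theta_t in checkpoints.items())
    loss.backward()
    opt.step()

print((x.detach() - x_true).norm())  # should be close to 0: x is recovered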

2

Agreeable-Run-9152 t1_j33xcmm wrote

Note that this argument really isn't about diffusion or generative models but about optimization. I know my fair share of generative modelling, but this idea is a lot more general and may have popped up somewhere else in optimization/inverse problems?

1

thehodlingcompany t1_j357t3c wrote

If by "exactly recreate an image" you mean extract a binary-identical reconstruction of the original image from the model, then no. The size of the training data is many, many times larger than the model so if this were possible you would have devised the most amazing lossless compression algorithm known to humanity. So, signs point to no, although perhaps there are some contrived edge cases where it might be possible, such as a large model overfit to a small number of images. I'm not an ML researcher so maybe you should ignore this post but this is really more of an information theory question isn't it?

5

top1cent t1_j3wewt6 wrote

Yes, it is actually possible to get back the original data from the latent space. Check out autoencoders.
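
A minimal sketch of the round trip (toy model, untrained, sizes made up):

import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 32))
decoder = nn.Sequential(nn.Linear(32, 28 * 28), nn.Unflatten(1, (1, 28, 28)))

x = torch.rand(1, 1, 28, 28)    # a toy "image"
z = encoder(x)                  # latent code
x_hat = decoder(z)              # reconstruction from the latent code
print((x_hat - x).abs().max())  # training minimizes this error, but recovery is approximate, not bit-exact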

0