I'm gonna ignore OP's condescending tone for a moment and consider whether it's theoretically possible to invert a given target image (i.e., to find the input noise that generates that image) via an optimization process, by backpropping through the model.
i.e., something like
import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt

target = load_tensor('mona_lisa.png')          # pseudocode helper
prompt = clip_encode('a painting of a woman')  # pseudocode helper
z = torch.randn(...).requires_grad_()
while not converged:
    z.grad = None
    pred = run_pretrained_latent_diffusion(prompt, z)
    loss = F.mse_loss(pred, target)  # or whatever perceptual loss
    loss.backward()
    with torch.no_grad():        # update in place so z keeps requires_grad
        z -= 0.01 * z.grad       # or use your favorite optimizer here
plt.imshow(z)  # recovered noise that will generate mona_lisa.png when prompted with `a painting of a woman`
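To make the idea concrete, here's a self-contained toy version of that loop. The `generator` below is just a fixed linear map with a tanh, a stand-in I made up for the frozen pretrained model (backpropping through an actual latent diffusion sampler's full denoising chain would be vastly more expensive), purely to show that gradient descent on the noise can recover an input that reproduces a target:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-in for the frozen pretrained generator:
# a fixed linear map followed by tanh.
W = 0.1 * torch.randn(64, 16)

def generator(z):
    return torch.tanh(z @ W.T)

# "Target image": produced from a hidden noise vector, so an
# exact inverse is known to exist.
z_true = torch.randn(1, 16)
target = generator(z_true).detach()

# Optimize a fresh noise vector to reproduce the target.
z = torch.randn(1, 16, requires_grad=True)
opt = torch.optim.Adam([z], lr=0.05)

initial_loss = None
for step in range(500):
    opt.zero_grad()
    loss = F.mse_loss(generator(z), target)
    if initial_loss is None:
        initial_loss = loss.item()
    loss.backward()
    opt.step()

print(f"loss: {initial_loss:.4f} -> {loss.item():.6f}")
```

With a real diffusion model the loss surface is far uglier and each step requires differentiating through many denoising iterations, so whether this converges to anything useful is an open question, but the mechanics are the same.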
sjd96 t1_j32lyay wrote
Reply to [Discussion] Given the right seed (or input noise) and prompt, is it theoretically possible to exactly recreate an image that a latent diffusion model was trained on? by [deleted]
What do others think?