
sjd96 t1_j32lyay wrote

I'm going to ignore OP's condescending tone for a moment. In theory, it might be possible to invert a given target image (i.e. find the input noise that generates that image) with an optimization process, by backpropagating through the model. Something like:

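import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt

# load_tensor, clip_encode and run_pretrained_latent_diffusion are placeholders, not real library calls
target = load_tensor('mona_lisa.png')
prompt = clip_encode('a painting of a woman')
z = torch.randn(...).requires_grad_()  # random starting noise, optimized directly
while not converged:
  z.grad = None  # clear the gradient from the previous step
  pred = run_pretrained_latent_diffusion(prompt, z)
  loss = F.mse_loss(pred, target)  # or whatever perceptual loss
  loss.backward()
  with torch.no_grad():
    z -= 0.01 * z.grad  # in-place so z stays a leaf tensor; or use your favorite optimizer here
plt.imshow(z.detach().cpu())  # recovered noise that will generate mona_lisa.png when prompted with `a painting of a woman` (assumes an image-shaped z)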
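To sanity-check the idea on something tiny, here is a minimal sketch in plain Python. It is not a diffusion model: the "generator" is a made-up fixed linear map `W` from a 2-number latent to a 3-number output, and `generate`, `mse_grad_wrt_z`, and `invert` are hypothetical names I picked for illustration. The gradient is written out by hand instead of using autograd. It only shows the mechanics of running gradient descent on the input while the model stays frozen; a real latent diffusion model is far less well-behaved, so none of this guarantees the real thing converges.

```python
# Toy stand-in for the frozen generator: a fixed linear map from a
# 2-D "latent" to a 3-value "image". Values are arbitrary (assumption).
W = [[1.0, 0.5],
     [0.2, 1.0],
     [-0.3, 0.4]]


def generate(z):
    """Map latent z (length 2) to an output (length 3): out = W @ z."""
    return [sum(w * zj for w, zj in zip(row, z)) for row in W]


def mse(pred, target):
    """Mean squared error between two equal-length lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def mse_grad_wrt_z(z, target):
    """Analytic gradient of mse(generate(z), target) with respect to z."""
    pred = generate(z)
    n = len(target)
    return [
        sum(2.0 * (pred[i] - target[i]) * W[i][j] for i in range(n)) / n
        for j in range(len(z))
    ]


def invert(target, z_init, lr=0.5, steps=200):
    """Plain gradient descent on the latent; the generator is never changed."""
    z = list(z_init)
    for _ in range(steps):
        grad = mse_grad_wrt_z(z, target)
        z = [zj - lr * gj for zj, gj in zip(z, grad)]
    return z


z_true = [0.7, -1.2]
target = generate(z_true)                    # a target we know is reachable
z_rec = invert(target, z_init=[0.0, 0.0])    # start from a different latent
print("recovered latent:", z_rec)
print("final loss:", mse(generate(z_rec), target))
```

Here the target is produced from a known latent, so an exact solution exists and the loss is convex in `z`. With a real image and a real model neither is a given, which is really the open part of my question.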

What do others think?
