
zergling103 OP t1_isdxd7x wrote

If I understand correctly, Cold Diffusion, like the original diffusion models, operates in pixel space. That is, noise or other corruptions are added to or removed from the individual RGB values of each pixel.

Latent diffusion models seem to perform better. They encode the image with a pretrained autoencoder and apply the perturbations to the latent vectors instead, so the network learns to take its denoising steps in the autoencoder's latent space rather than in pixel space.
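Roughly the difference is something like this (a minimal sketch, not any model's actual implementation; the encoder/decoder here are untrained stand-ins for a pretrained VAE, and the denoiser itself is omitted):

```python
import torch
import torch.nn as nn

# Illustrative stand-ins for a pretrained, frozen autoencoder.
# A real latent diffusion model would use a learned VAE here.
encoder = nn.Conv2d(3, 4, kernel_size=8, stride=8)        # 1x3x256x256 -> 1x4x32x32
decoder = nn.ConvTranspose2d(4, 3, kernel_size=8, stride=8)

def add_noise(x, alpha_bar_t):
    """DDPM-style forward step: x_t = sqrt(a)*x_0 + sqrt(1-a)*eps."""
    eps = torch.randn_like(x)
    return alpha_bar_t.sqrt() * x + (1.0 - alpha_bar_t).sqrt() * eps

image = torch.rand(1, 3, 256, 256)   # x_0 in [0, 1]
alpha_bar = torch.tensor(0.5)        # noise level at some timestep t

# Pixel-space diffusion: perturb the RGB values directly.
noisy_image = add_noise(image, alpha_bar)

# Latent diffusion: encode, perturb the latents, decode after denoising.
with torch.no_grad():
    z = encoder(image)                  # latent representation
    noisy_z = add_noise(z, alpha_bar)   # the denoiser would operate on this
    roundtrip = decoder(noisy_z)        # lossy: the bottleneck discards detail
```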

However, the latent space of an autoencoder is an information bottleneck, so you couldn't use it to encode real-world degradations perfectly, or to make lossless edits to the specific image you want to restore.

I wonder if the two concepts can be merged somehow? A lossless autoencoder?

4

Prinzessid t1_iseb58v wrote

I think you can train a denoising autoencoder without a bottleneck.
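Something like this, as a rough sketch (architecture and sizes are purely illustrative): every layer keeps the full input resolution, so nothing forces information to be thrown away.

```python
import torch
import torch.nn as nn

# A denoising autoencoder with no spatial or channel bottleneck:
# all layers preserve the input resolution.
model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1),
)

opt = torch.optim.Adam(model.parameters(), lr=1e-4)
clean = torch.rand(8, 3, 64, 64)                # a batch of clean images
noisy = clean + 0.1 * torch.randn_like(clean)   # corrupted inputs

# One training step: reconstruct the clean image from the corrupted one.
opt.zero_grad()
loss = nn.functional.mse_loss(model(noisy), clean)
loss.backward()
opt.step()
```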

5