zergling103 OP t1_isdxd7x wrote

If I understand correctly, Cold Diffusion, like the original diffusion models, assumes the perturbations are made in pixel space. That is, noise or other corruptions are added to or removed from the individual RGB values of each pixel.

Latent diffusion models seem to perform better. They encode the image using a pretrained autoencoder, then the perturbations are added to or removed from the latent vectors. The denoising network learns to take steps in the autoencoder's latent space instead of in pixel space.
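Roughly, the difference looks something like this (a minimal PyTorch-style sketch; `encoder` and `noise_scale` are hypothetical stand-ins for the pretrained autoencoder and the noise schedule, not anything taken from either paper):

```python
import torch

def pixel_space_step(image, t, noise_scale):
    """Pixel-space diffusion: perturb the RGB values directly (image shape [B, 3, H, W])."""
    noise = torch.randn_like(image)
    return image + noise_scale[t] * noise

def latent_space_step(image, t, noise_scale, encoder):
    """Latent diffusion: encode first, then perturb the latent vector instead."""
    z = encoder(image)                 # e.g. [B, 4, H/8, W/8] for a VAE-style encoder
    noise = torch.randn_like(z)
    return z + noise_scale[t] * noise  # the denoiser is trained to reverse steps in this space
```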

However, the latent space of an autoencoder is a kind of information bottleneck, so you wouldn't be able to use it to encode real-world degradation perfectly, or make lossless tweaks to a given image you want to restore.

I wonder if the two concepts can be merged somehow? A lossless autoencoder?

zergling103 OP t1_isczxzg wrote

After quickly skimming through the paper, it appears that they use multiple models, one model per type of image degradation. I was hoping to learn about a single general model that can reverse any sequence of degradations. Perhaps it'd have better performance; for example, the de-blurring Cold Diffusion model seems to produce outputs that lack detail.
