
arg_max t1_j60jz1r wrote

Iterative refinement seems to be a big part of it. In a GAN, your network has to produce the image in a single forward pass. In diffusion models, the model sees the intermediate results over and over and can make gradual improvements. Also, if you think about what the noise does, at the high noise levels where sampling starts it has wiped out all small details and kept only low-frequency, large structures. So in the first sampling steps, the model kind of has to focus on overall composition. Then, as the noise level goes down, it can gradually start adding all the small details. On a more mathematical level, the noise smooths the data distribution and widens its support in the [0,1]^D cube (D = image dimension, like 256x256x3). Typically people assume that the data lie on a low-dimensional manifold inside that cube, which can make sampling from it hard.
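
To make the frequency argument a bit more concrete, here is a minimal NumPy sketch. It is not taken from any particular paper: it assumes a DDPM-style variance-preserving noising step x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps, and it uses a made-up 1D signal with a 1/f amplitude spectrum as a rough stand-in for an image row. The names `make_signal`, `surviving_fraction` and `alpha_bar` are just illustrative. It counts how many frequency bins still have more signal power than expected noise power at a given noise level.

```python
import numpy as np


def make_signal(n=1024, seed=0):
    """Toy stand-in for an image row: random phases, 1/f amplitude spectrum."""
    rng = np.random.default_rng(seed)
    freqs = np.fft.rfftfreq(n, d=1.0 / n)  # integer frequencies 0, 1, ..., n/2
    amp = np.zeros_like(freqs)
    amp[1:] = 1.0 / freqs[1:]  # low frequencies carry most of the energy
    phases = rng.uniform(0.0, 2.0 * np.pi, size=freqs.shape)
    x = np.fft.irfft(amp * np.exp(1j * phases), n=n)
    return x / x.std()  # normalize to unit variance


def surviving_fraction(x0, alpha_bar):
    """Fraction of non-DC frequency bins whose signal power exceeds the
    expected noise power under the assumed noising step
    x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps, eps ~ N(0, I)."""
    n = x0.size
    signal_power = alpha_bar * np.abs(np.fft.rfft(x0))[1:] ** 2
    # For unit-variance white noise, E|FFT(eps)_k|^2 = n at every frequency.
    noise_power = (1.0 - alpha_bar) * n
    return float(np.mean(signal_power > noise_power))


if __name__ == "__main__":
    x0 = make_signal()
    # alpha_bar close to 1 means little noise, close to 0 means mostly noise.
    for alpha_bar in (0.999, 0.9, 0.5, 0.1):
        frac = surviving_fraction(x0, alpha_bar)
        print(f"alpha_bar={alpha_bar}: {frac:.3f} of frequency bins above the noise")
```

Because the noise is white (flat across frequencies) while the toy signal's power falls off with frequency, the high-frequency bins drop below the noise floor first, and only the lowest frequencies survive at high noise levels. Real images only roughly follow such a spectrum, so treat this as an illustration rather than a measurement.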

Some support for this claim is that people were able to improve other generative models, such as autoregressive models, by using similar noisy distributions. Also, you can use GANs to sample from the intermediate distributions, which works better than standard GANs.
