Viewing a single comment thread. View all comments

themrzmaster t1_j3qi57k wrote

great post! Can someone give me a intuitive explanation on why diffusion models tends to put more weight on low spatial frequency? Is it because of the usual used noise schedule? (Cosine) In the text it is mentioned that likelihood objetive tends to weight more high spatial. It also points to an paper which involves tons of SDE, which I could not fully understand.

1

benanne OP t1_j3qzbcr wrote

If you were to graph the weighting that ensures the training loss corresponds to likelihood, you would find that it looks roughly like exp(-x). In other words, the importance of the noise levels decreases more or less exponentially (but not exactly!) as they increase. So if you want to train a diffusion model to maximise likelihood (which can be a valid thing to do, for example if you want to use it for lossless compression), your training set should have many more examples of low noise levels than of high noise levels (orders of magnitude more, in fact).

Usually when we train diffusion models, we sample noise levels uniformly, or from a simple distribution, but certainly not from a distribution which puts exponentially more weight on low noise levels. Therefore, relative to the likelihood loss, the loss we tend to use puts a lot less emphasis on low noise levels, which correspond to high spatial frequencies. Section 5 of my earlier blog post is an attempt at an intuitive explanation why this correspondence between noise levels and spatial frequencies exists: https://benanne.github.io/2022/01/31/diffusion.html#scale

"Variational diffusion models" is another paper that focuses on optimising likelihood, which you might find more accessible: https://arxiv.org/abs/2107.00630

3