Quaxi_ t1_is4woj1 wrote

I wouldn't say it's primarily because it is more stable, though. It just gives better results, and the properties of diffusion lead easily into other applications like in/outpainting and multimodality.

GANs are quite stable these days. Tricks like feature matching loss, spectral normalization, gradient clipping, TTUR (two time-scale update rule), etc. make mode collapse quite rare.
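
As a rough illustration of one of those tricks: spectral normalization divides each weight matrix by an estimate of its largest singular value, which keeps every layer of the discriminator roughly 1-Lipschitz. Below is a minimal pure-Python sketch, assuming a plain list-of-rows matrix; `spectral_normalize` and its helpers are hypothetical names, not a library API. SN-GAN runs a single power-iteration step per training update and carries `u` over between updates; here the iteration simply runs to convergence for clarity.

```python
import math
import random

def matvec(W, v):
    """Compute W @ v for a list-of-rows matrix W."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]

def mat_t_vec(W, u):
    """Compute W^T @ u without building the transpose."""
    return [sum(W[i][j] * u[i] for i in range(len(W))) for j in range(len(W[0]))]

def normalize(v, eps=1e-12):
    """Scale v to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]

def spectral_normalize(W, n_iters=50, seed=0):
    """Return (W / sigma, sigma), with sigma estimated by power iteration."""
    rng = random.Random(seed)
    u = normalize([rng.gauss(0.0, 1.0) for _ in range(len(W))])
    for _ in range(n_iters):
        v = normalize(mat_t_vec(W, u))  # right singular vector estimate
        u = normalize(matvec(W, v))     # left singular vector estimate
    sigma = sum(u_i * wv_i for u_i, wv_i in zip(u, matvec(W, v)))  # u^T W v
    return [[w / sigma for w in row] for row in W], sigma

W = [[3.0, 0.0], [0.0, 1.0]]
W_sn, sigma = spectral_normalize(W)
print(sigma)  # ~3.0, the largest singular value of W
```

In PyTorch, `torch.nn.utils.parametrizations.spectral_norm(layer)` applies the same idea to a layer's weight.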

You're correct that it is quite a bit slower at the moment, though. The diffusion process has to run the network iteratively, once per denoising step, so it takes longer both to train and to infer.
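
To make the inference cost concrete: a GAN generator produces a sample in one forward pass, while DDPM-style ancestral sampling calls the denoising network once per step. Below is a minimal scalar sketch, assuming the linear beta schedule (1e-4 to 0.02) from the DDPM paper and a hypothetical `ToyEpsModel` standing in for the trained network. The toy model is the exact noise predictor for data sitting at x0 = 0, so the sampler should end up at 0.

```python
import math
import random

def make_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule plus the derived alphas and cumulative products."""
    betas = [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    return betas, alphas, alpha_bars

class ToyEpsModel:
    """Hypothetical stand-in for a trained noise-prediction network.

    If all data sits at x0 = 0, then x_t = sqrt(1 - alpha_bar_t) * eps,
    so the exact noise prediction is x_t / sqrt(1 - alpha_bar_t).
    """
    def __init__(self, alpha_bars):
        self.alpha_bars = alpha_bars
        self.calls = 0  # counts "network" evaluations

    def __call__(self, x, t):
        self.calls += 1
        return x / math.sqrt(1.0 - self.alpha_bars[t])

def ddpm_sample(eps_model, schedule, seed=0):
    """Ancestral DDPM sampling for a scalar: one eps_model call per step."""
    betas, alphas, alpha_bars = schedule
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)  # start from pure noise x_T ~ N(0, 1)
    for t in reversed(range(len(betas))):
        eps = eps_model(x, t)  # the expensive call, repeated T times
        mean = (x - betas[t] / math.sqrt(1.0 - alpha_bars[t]) * eps) / math.sqrt(alphas[t])
        z = rng.gauss(0.0, 1.0) if t > 0 else 0.0  # no noise on the final step
        x = mean + math.sqrt(betas[t]) * z
    return x

schedule = make_schedule(1000)
model = ToyEpsModel(schedule[2])
sample = ddpm_sample(model, schedule)
print(model.calls)  # 1000 network evaluations for a single sample
```

Faster samplers cut the number of steps, but the cost still scales with the step count rather than being a single pass.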

Atom_101 t1_is61h1l wrote

I doubt it's anywhere close to diffusion models, though. I haven't worked with TTUR and feature matching, but I have tried spectral norm and WGAN-GP. They can be unstable in weird ways. In fact, while the Wasserstein loss is definitely more stable, it massively slows down convergence compared to the standard DCGAN loss.
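
For reference, the WGAN-GP critic minimizes E[D(fake)] - E[D(real)] plus a penalty that pushes the critic's gradient norm toward 1 at random interpolates between real and fake samples. Below is a minimal pure-Python sketch, assuming a critic that maps a list of floats to a scalar; it uses central finite differences in place of autograd, and `gradient_penalty` / `critic_loss` are hypothetical names. The default λ = 10 is the value used in the WGAN-GP paper.

```python
import math
import random

def numerical_grad(f, x, h=1e-5):
    """Central-difference gradient of scalar f at point x (stand-in for autograd)."""
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad

def gradient_penalty(critic, real, fake, rng, lam=10.0):
    """lam * mean over pairs of (||grad critic(x_hat)||_2 - 1)^2 at random interpolates."""
    total = 0.0
    for r, f in zip(real, fake):
        eps = rng.random()  # one mixing coefficient per (real, fake) pair
        x_hat = [eps * ri + (1.0 - eps) * fi for ri, fi in zip(r, f)]
        g = numerical_grad(critic, x_hat)
        total += (math.sqrt(sum(gi * gi for gi in g)) - 1.0) ** 2
    return lam * total / len(real)

def critic_loss(critic, real, fake, rng, lam=10.0):
    """WGAN-GP critic objective: E[D(fake)] - E[D(real)] + gradient penalty."""
    w_term = sum(critic(f) for f in fake) / len(fake) - sum(critic(r) for r in real) / len(real)
    return w_term + gradient_penalty(critic, real, fake, rng, lam)

real = [[1.0, 2.0], [0.5, -1.0]]
fake = [[0.0, 0.0], [2.0, 1.0]]
unit_critic = lambda x: 0.6 * x[0] + 0.8 * x[1]  # gradient norm is exactly 1
print(gradient_penalty(unit_critic, real, fake, random.Random(0)))  # ~0.0
```

In PyTorch the gradient at the interpolates would come from `torch.autograd.grad(..., create_graph=True)`, so that the penalty itself can be backpropagated into the critic.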

The BigGAN paper by Google tried to scale up GANs by throwing every known stabilization trick at them. They observed that even with these tricks you can't train beyond a certain point: BigGANs start degrading when trained for too long. Granted, it came out in 2018, but if this no longer held true we would have 100B-parameter GANs already. I think the main advantage of diffusion models (DMs) is that you can keep training them for an eternity without worrying about performance degradation.

Quaxi_ t1_is7gnsf wrote

No, definitely: GANs can still fail, and they are much less stable than diffusion models. But GANs have still enjoyed huge popularity despite that, and research has found ways to mitigate it.

I just don't think it's the main reason why diffusion models are gaining traction. If it were, we would probably have seen a lot more of variational autoencoders, which are very stable to train. My work is not at BigGAN or DALL-E 2 scale, though, so I might indeed be missing some scaling aspect of this. :)

Atom_101 t1_is7ldte wrote

I think VAEs are weak not because of scaling issues but because of an overly strong bias: the latent distribution is assumed to be a Gaussian with a diagonal covariance matrix. This problem is reduced using techniques like vector quantization (VQ-VAE). DALL-E 1 actually used a discrete VAE along these lines, before DMs came to be. But even then, I believe they are too underpowered. Another technique for image generation is normalising flows, which also impose heavy restrictions on model architecture (every layer must be invertible with a tractable Jacobian determinant). GANs and DMs are much less restricted and can model arbitrary data distributions.
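
To make the contrast concrete: a standard VAE pulls a diagonal-Gaussian posterior toward N(0, I) with the closed-form KL term below, whereas a VQ-VAE snaps each encoder output to its nearest entry in a learned codebook. Below is a minimal sketch, assuming plain Python lists; the function names are hypothetical, and DALL-E 1's discrete VAE actually used a Gumbel-softmax relaxation rather than this hard nearest-neighbour lookup.

```python
import math

def diag_gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), the standard VAE regularizer."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))

def vector_quantize(z, codebook):
    """Replace encoder output z with its nearest codebook vector (VQ-VAE style)."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    index = min(range(len(codebook)), key=lambda k: sq_dist(z, codebook[k]))
    return index, codebook[index]

print(diag_gaussian_kl([0.0, 0.0], [0.0, 0.0]))  # 0.0: posterior equals the prior
print(vector_quantize([0.9, 0.1], [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]))  # (1, [1.0, 0.0])
```

The KL term penalizes any posterior that strays from a factorized Gaussian, which is the bias described above; the codebook lookup drops that assumption in favour of a discrete latent.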

Can you point to an example where you see GANs perform visibly worse? That said, we can't really compare quality between SOTA GANs and SOTA DMs; the difference in scale is just too large. There was a tweet thread recently, regarding Google's Imagen IIRC, which showed that increasing model size drastically improves image quality for text-to-image DMs: going from 1B to 10B parameters gave visible improvements. But if you compare photorealistic faces generated by Stable Diffusion and, say, StyleGAN3, I am not sure you would be able to see a difference.
