Submitted by TheCockatoo t3_10m1sdm in MachineLearning
IntelArtiGen t1_j60jjfg wrote
It's quite hard to answer these questions for neural networks. We don't really know whether GANs will always be worse than Latent Diffusion Models (LDMs). They are now, but they weren't before, and GANs may outperform LDMs again in the future. It seems that the way we currently set up the denoising task is better suited to text2img than the way we currently set up GANs.
A model usually outperforms another when it's more efficient in how it stores information in its weights. Successive conditioned denoising layers seem to be more efficient for this task, but they also require a good enough perceptual loss, a good enough encoder, etc. We know that these networks can compete with GANs, but maybe they were just not good enough before, or not combined in a good enough way.