
speyside42 t1_jc44rbn wrote

> Vanilla autoencoders don't generalize well, but variational autoencoders have a much better structured latent space and generalize much better.

For toy problems, yes, but not in general. For an image autoencoder that does generalize, see for example ConvNeXt V2: https://arxiv.org/pdf/2301.00808.pdf

As a side note: the VQ-VAE from the blog post actually has very little to do with variational inference. The prior is just uniform over all discrete latents and the encoder's posterior is deterministic (a one-hot selection of a codebook entry), so the KL-divergence term is a constant and can be dropped from the objective. It's basically a glorified quantized autoencoder that merely *can* be interpreted in the language of variational models.
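To make that concrete, here is a minimal sketch of the VQ bottleneck in PyTorch (my own illustration, not the blog post's code; codebook size `K`, latent dim `D`, and the 0.25 commitment weight are placeholder choices following the VQ-VAE paper). Note there is no KL term anywhere, just a reconstruction-style codebook/commitment loss:

```python
import torch
import torch.nn.functional as F

K, D = 512, 64                    # codebook size, latent dimension (arbitrary here)
codebook = torch.randn(K, D)      # learnable in practice, e.g. nn.Embedding(K, D)

def quantize(z_e):                # z_e: (batch, D) encoder outputs
    # Nearest-neighbour lookup: uniform "prior", deterministic "posterior",
    # so the usual VAE KL term is a constant and is simply dropped.
    dists = torch.cdist(z_e, codebook)   # (batch, K) pairwise distances
    idx = dists.argmin(dim=1)            # discrete latent codes
    z_q = codebook[idx]                  # quantized latents
    # Straight-through estimator: copy gradients from z_q back to z_e.
    z_q_st = z_e + (z_q - z_e).detach()
    # Codebook + commitment losses stand in for any KL regularizer.
    vq_loss = F.mse_loss(z_q, z_e.detach()) + 0.25 * F.mse_loss(z_e, z_q.detach())
    return z_q_st, idx, vq_loss

z_e = torch.randn(8, D)           # stand-in for an encoder's output
z_q, codes, loss = quantize(z_e)
```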
