Submitted by Blutorangensaft t3_11qejcz in MachineLearning
FrogBearSalamander t1_jc5vvrb wrote
Reply to comment by currentscurrents in [D]: Generalisation ability of autoencoders by Blutorangensaft
> Would love to read some research papers if you have a link!
- Nonlinear Transform Coding
- An Introduction to Neural Data Compression
- SoundStream: An End-to-End Neural Audio Codec
- Old but foundational: End-to-end Optimized Image Compression
- And this paper made the connection between compression models and VAEs: Variational image compression with a scale hyperprior
- Any VQ-based model (VQ-VAE, VQ-GAN, etc.) can be interpreted as a compression model. Many generative image models use VQ, but they rarely report rate-distortion results. And, as /u/speyside42 said above, they typically assume a uniform distribution over the codebook, which isn't very interesting from a compression point of view. Instead, you want to learn a distribution and use it as an entropy model in conjunction with an entropy coder (see the first sketch after this list). Note that SoundStream (mentioned above) uses residual VQ (RVQ).
- Image Compression with Product Quantized Masked Image Modeling uses a kind of VQ (subdivide the latent vectors and code each part separately, forming a product quantizer; see the second sketch below) along with masked image modeling (MIM) to get a conditional distribution over codewords. MIM is often used for generation, but here they entropy-code instead of sampling.
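Not from any of the papers above, but here's a minimal sketch of the "learned entropy model vs. uniform codebook prior" point: quantize latents to their nearest codebook entries, then compare the bit cost an entropy coder would pay under a uniform prior (a fixed log2 K bits per token) against a learned categorical prior (here just the empirical code frequencies as a stand-in). The codebook size, latent dimension, and data are all toy placeholders.

```python
# Minimal sketch: nearest-neighbour VQ plus two entropy models for the code
# indices. All sizes and the "learned" prior are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(0)

K, d = 256, 16                        # codebook size, latent dimension (assumed)
codebook = rng.normal(size=(K, d))    # stand-in for a learned codebook
latents = rng.normal(size=(1000, d))  # stand-in for encoder outputs

# Vector quantization: map each latent to its nearest codebook entry.
dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
indices = dists.argmin(axis=1)

# Entropy model 1: uniform over the codebook -> fixed log2(K) bits per token.
bits_uniform = len(indices) * np.log2(K)

# Entropy model 2: a learned categorical prior (empirical frequencies as a
# stand-in). An entropy coder spends roughly -log2 p(i) bits per symbol, so a
# good prior pushes the rate below log2(K).
counts = np.bincount(indices, minlength=K) + 1e-9
probs = counts / counts.sum()
bits_learned = -np.log2(probs[indices]).sum()

print(f"uniform prior : {bits_uniform / len(indices):.2f} bits/token")
print(f"learned prior : {bits_learned / len(indices):.2f} bits/token")
```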
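And a second toy sketch of the product-quantization idea from the last bullet: split each latent vector into sub-vectors and quantize each sub-vector against its own small codebook, so a latent costs at most M * log2(K) bits. The conditional (MIM) entropy model from the paper is not reproduced here; all sizes are made up.

```python
# Minimal product-quantization sketch (illustrative sizes, random codebooks).
import numpy as np

rng = np.random.default_rng(1)

d, M, K = 16, 4, 64          # latent dim, number of sub-vectors, codes per sub-codebook
sub_d = d // M               # dimension of each sub-vector
codebooks = rng.normal(size=(M, K, sub_d))  # one codebook per sub-vector

def product_quantize(z):
    """Return the M code indices for one latent vector z of shape (d,)."""
    parts = z.reshape(M, sub_d)
    idx = []
    for m in range(M):
        dists = ((codebooks[m] - parts[m]) ** 2).sum(-1)  # (K,)
        idx.append(int(dists.argmin()))
    return idx

def dequantize(idx):
    """Reconstruct the latent from its M code indices."""
    return np.concatenate([codebooks[m, idx[m]] for m in range(M)])

z = rng.normal(size=d)
codes = product_quantize(z)   # M indices, i.e. at most M * log2(K) bits
z_hat = dequantize(codes)
print(codes, np.mean((z - z_hat) ** 2))
```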