Submitted by Blutorangensaft t3_11qejcz in MachineLearning
currentscurrents t1_jc5fxq3 wrote
Reply to comment by Red-Portal in [D]: Generalisation ability of autoencoders by Blutorangensaft
Wouldn't that make them great for the task they're actually learning to do: compression? You want to be able to compress and reconstruct any input data, even if it's less efficient for OOD data.
I do wonder why we don't use autoencoders for data compression. But this may simply be because neural networks require 1000x more compute power than traditional compressors.
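For a sense of what that would even look like, here's a minimal sketch (a toy, untrained PyTorch autoencoder; the architecture and the 8-bit latent quantization are made-up illustration, not any particular paper's scheme). The "compressed file" is just the quantized latent vector:

```python
# Minimal sketch: using an autoencoder as a lossy compressor (toy example).
# The network, sizes, and 8-bit quantization are illustrative assumptions.
# The model is untrained here; in practice you'd train it on your data with
# a reconstruction loss first.
import torch
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    def __init__(self, dim=784, latent=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, latent))
        self.decoder = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(), nn.Linear(256, dim))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = TinyAutoencoder()
x = torch.rand(1, 784)                         # stand-in for a flattened 28x28 image

# "Compress": encode, then quantize the latent to 8-bit integers.
z = model.encoder(x)
scale = z.abs().max() / 127.0 + 1e-8
z_q = torch.round(z / scale).to(torch.int8)    # this is what you'd store/transmit

# "Decompress": dequantize and decode.
x_hat = model.decoder(z_q.float() * scale)

print(x.shape, "->", z_q.numel(), "bytes (plus the scale) ->", x_hat.shape)
```

Even this toy version shows where the cost goes: every encode/decode is a full forward pass, versus a few table lookups for something like DEFLATE.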
Red-Portal t1_jc5g7ap wrote
Oh they have been used for compression. I also remember a paper on quantization, which made a buzz at the time.
currentscurrents t1_jc5ghbv wrote
Would love to read some research papers if you have a link!
But I mean that we do not use them for compression in practice. We use hand-crafted algorithms, mostly DEFLATE for lossless + a handful of lossy DCT-based algorithms for audio/video/images.
Red-Portal t1_jc5gmb0 wrote
Can't remember those on compression, but for quantization I was talking about this paper
FrogBearSalamander t1_jc5vvrb wrote
> Would love to read some research papers if you have a link!
- Nonlinear Transform Coding
- An Introduction to Neural Data Compression
- SoundStream: An End-to-End Neural Audio Codec
- Old but foundational: End-to-end Optimized Image Compression
- And this paper made the connection between compression models and VAEs: Variational image compression with a scale hyperprior
- Any VQ-based model (VQ-VAE, VQ-GAN, etc.) can be interpreted as a compression model. Many generative image models use VQ, but they rarely present rate-distortion results. And, as /u/speyside42 said above, they typically assume a uniform distribution over the codebook, which isn't very interesting from a compression point of view. Instead, you want to learn a distribution and use it as an entropy model in conjunction with an entropy coder (see the sketch at the end of this list). Note that SoundStream (mentioned above) uses residual VQ (RVQ).
- Image Compression with Product Quantized Masked Image Modeling uses a kind of VQ (subdivide the latent vectors and code each chunk separately to form a product quantizer) along with masked image modeling (MIM) to get a conditional distribution over codewords. MIM is usually used for generation, but here they entropy code with that distribution instead of sampling from it.
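To make the entropy-model point concrete, here's a minimal sketch (toy numbers, NumPy only): with a uniform codebook distribution every index costs log2(256) = 8 bits, while a learned categorical model lets an entropy coder spend fewer bits on frequent codewords. A real codec would feed these probabilities to an arithmetic/range coder rather than just summing -log2 p, and would typically condition the model on context (as in the MIM paper above).

```python
# Sketch: why a learned distribution over VQ codebook indices matters for rate.
# The skewed index statistics are simulated; real latents are similarly non-uniform.
import numpy as np

rng = np.random.default_rng(0)
codebook_size = 256

# Pretend the encoder produced these codebook indices for one image,
# drawn from a Zipf-like (heavily skewed) distribution.
probs_true = 1.0 / np.arange(1, codebook_size + 1)
probs_true /= probs_true.sum()
indices = rng.choice(codebook_size, size=4096, p=probs_true)

# Uniform entropy model: every index costs log2(256) = 8 bits.
bits_uniform = len(indices) * np.log2(codebook_size)

# Learned entropy model: estimate p(index) (here, just smoothed empirical counts)
# and charge -log2 p(index) per symbol, which is what an entropy coder approaches.
counts = np.bincount(indices, minlength=codebook_size) + 1  # +1 = Laplace smoothing
p_hat = counts / counts.sum()
bits_learned = -np.log2(p_hat[indices]).sum()

print(f"uniform model : {bits_uniform:.0f} bits")
print(f"learned model : {bits_learned:.0f} bits")
```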