
Professional_Poet489 t1_j9gh652 wrote

The theory is that bottlenecks act as a compression / regularization mechanism. If the bottleneck has far fewer parameters than the rest of the net, yet the output is still high quality, then the bottleneck layer must be capturing the information required to drive the output to the correct results. The fact that these intermediate layers are often used for embeddings indicates that this is a real phenomenon.
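
A minimal sketch of that idea, assuming PyTorch and made-up layer sizes: an autoencoder whose low-dimensional middle layer has to carry everything needed to reconstruct the input, which is also why that layer doubles as an embedding.

```python
import torch
import torch.nn as nn

class BottleneckAutoencoder(nn.Module):
    """Toy autoencoder: a 784-dim input squeezed through a 32-dim bottleneck."""
    def __init__(self, in_dim=784, bottleneck_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, bottleneck_dim),   # the bottleneck layer
        )
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck_dim, 256), nn.ReLU(),
            nn.Linear(256, in_dim),
        )

    def forward(self, x):
        z = self.encoder(x)            # compressed code
        return self.decoder(z), z

model = BottleneckAutoencoder()
x = torch.randn(8, 784)
x_hat, z = model(x)
# If x_hat reconstructs x well, the 32 numbers in z must contain the
# information needed to drive the output, so z is usable as an embedding.
```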

32

_Arsenie_Boca_ OP t1_j9gix7q wrote

If I understand you correctly, that would mean bottlenecks are only interesting when

a) you further use the lower-dimensional features as output, as in autoencoders, or
b) you are interested in knowing whether your features have a lower intrinsic dimension.

Neither holds in many cases, such as normal ResNets. Could you elaborate on how you believe bottlenecks act as regularizers?

2

Professional_Poet489 t1_j9gk545 wrote

Re: regularization - by using fewer numbers to represent the same output info, you are implicitly reducing the dimensionality of your function approximator.

Re: (a), (b) - generally in big nets, you want to regularize because you will otherwise overfit. It's not about the output dimension; it's that you have a giant approximator (i.e. a billion params) fitting data of much lower dimensionality, and you have to do something about that. The output can be "cat or not" and you'll still have the same problem.
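
A rough illustration of the parameter-count side of this, assuming PyTorch and arbitrary channel sizes: a ResNet-style bottleneck block pushes the 3x3 convolution through fewer channels, cutting parameters relative to a plain block while keeping the same input/output width.

```python
import torch.nn as nn

def count_params(m):
    return sum(p.numel() for p in m.parameters())

channels = 256

# Plain block: two full-width 3x3 convolutions.
plain = nn.Sequential(
    nn.Conv2d(channels, channels, 3, padding=1),
    nn.Conv2d(channels, channels, 3, padding=1),
)

# Bottleneck block: 1x1 reduce -> 3x3 at reduced width -> 1x1 expand.
bottleneck = nn.Sequential(
    nn.Conv2d(channels, channels // 4, 1),
    nn.Conv2d(channels // 4, channels // 4, 3, padding=1),
    nn.Conv2d(channels // 4, channels, 1),
)

print(count_params(plain))       # ~1.18M parameters
print(count_params(bottleneck))  # ~0.07M parameters for the same 256-in / 256-out mapping
```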

9

currentscurrents t1_j9gvv4k wrote

a) Lower-dimensional features are useful for most tasks, not just for the output, and b) real data almost always has a lower intrinsic dimension.

For example, if you want to recognize faces, you'd have a much easier time working with patterns like gender, the shape of facial features, hair color, etc. than with raw pixel data. Most pixel values are irrelevant.
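
A hedged sketch of the intrinsic-dimension point, using scikit-learn PCA on synthetic data (the dataset and dimensions are made up for illustration): high-dimensional "pixel" vectors generated from a handful of latent factors are explained almost entirely by a few principal components.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic "images": 1000 samples of 4096 "pixels" that are really
# linear mixtures of only 10 latent factors plus a little noise.
latent = rng.normal(size=(1000, 10))
mixing = rng.normal(size=(10, 4096))
pixels = latent @ mixing + 0.01 * rng.normal(size=(1000, 4096))

pca = PCA(n_components=50).fit(pixels)
explained = np.cumsum(pca.explained_variance_ratio_)
print(explained[9])   # ~0.99+: ten components capture nearly all the variance,
                      # i.e. the intrinsic dimension is far below 4096
```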

5