
currentscurrents t1_j9gp4uq wrote

> From an information theory standpoint, it creates potential information loss due to the lower dimensionality.

Exactly! That's the point.

The bottleneck forces the network to throw away the parts of the data that don't contain much information. It learns to encode the data in an information-dense representation so that the decoder on the other side of the bottleneck can work with high-level ideas instead of pixel values.

If you manually tweak the values in the bottleneck, you'll notice it changes high-level attributes of the data, like the gender or shape of a face, not individual pixel values. This is how autoencoders work; a U-Net is basically an autoencoder with skip connections.
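To make the structure concrete, here's a minimal sketch of that bottleneck idea, assuming PyTorch; the layer sizes, the 32-dimensional bottleneck, and the flattened 28x28 input are arbitrary choices for illustration, not anything from the comment above. A U-Net would additionally feed the encoder's intermediate activations straight into the decoder via skip connections.

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, bottleneck_dim=32):
        super().__init__()
        # Encoder: squeeze the input down to a small latent code (the bottleneck).
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 256),
            nn.ReLU(),
            nn.Linear(256, bottleneck_dim),  # far fewer dimensions than the input
        )
        # Decoder: reconstruct the input from the latent code alone,
        # so the code must keep only the information-dense parts.
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck_dim, 256),
            nn.ReLU(),
            nn.Linear(256, input_dim),
        )

    def forward(self, x):
        z = self.encoder(x)              # compact, high-level representation
        return self.decoder(z), z

model = Autoencoder()
x = torch.rand(16, 784)                  # a batch of flattened 28x28 "images"
recon, z = model(x)
loss = nn.functional.mse_loss(recon, x)  # reconstruction objective
```

Tweaking individual entries of `z` and decoding again is the "manually tweak the values in the bottleneck" experiment described above.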

Interestingly, biological neural networks that handle feedforward perception seem to do the same thing. Take a look at the structure of an insect antenna: thousands of input neurons bottleneck down to only 150 neurons before expanding again for processing in the rest of the brain.

26

txhwind t1_j9n63wz wrote

One of the keys to intelligence is learning to forget non-critical information. I think this might be a weak point of large language models.

1