Dropkickmurph512 t1_j9sxa1j wrote on February 24, 2023 at 9:20 AM

Reply to comment by suflaj in Why bigger transformer models are better learners? by begooboi

Agree about the overparameterized models, but learning the noise definitely doesn't help. The noise mostly comes from measurement error, quantization, and other stuff that isn't in the vector space of the signals you care about. That's why early stopping can be useful and actually acts as a regularizer. If you want a good example, look into the denoising properties of deep image prior. It can remove noise by training on a single image and stopping before the network fits the noisy image completely.
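
To make the early stopping point concrete, here's a rough toy sketch. It is not deep image prior itself, just a tiny linear stand-in I made up for illustration. The assumptions are all mine: the clean signal lives in the first few coordinates, noise is added to every coordinate, and a hypothetical per-coordinate `scale` makes the signal coordinates train fast and the rest train slowly. The function name `clean_error_per_step` and all the parameter values are made up too.

```python
import random


def clean_error_per_step(steps=2000, lr=0.5, n_signal=5, n_total=50, seed=0):
    """Run gradient descent on a noisy target.

    Returns the mean squared error against the CLEAN signal after each step.
    """
    rng = random.Random(seed)
    # Assumption: the clean signal occupies only the first n_signal coordinates.
    clean = [1.0 if i < n_signal else 0.0 for i in range(n_total)]
    # Noise (e.g. measurement error) lands on every coordinate.
    noisy = [c + rng.gauss(0.0, 0.3) for c in clean]
    # Assumption: signal coordinates get a large scale (fit quickly),
    # the remaining coordinates get a small scale (fit slowly).
    scale = [1.0 if i < n_signal else 0.05 for i in range(n_total)]
    w = [0.0] * n_total
    errors = []
    for _ in range(steps):
        # Gradient of 0.5 * sum((scale * w - noisy) ** 2) with respect to w.
        w = [wi - lr * s * (s * wi - y) for wi, s, y in zip(w, scale, noisy)]
        fit = [s * wi for s, wi in zip(scale, w)]
        errors.append(sum((f - c) ** 2 for f, c in zip(fit, clean)) / n_total)
    return errors


errors = clean_error_per_step()
best_step = min(range(len(errors)), key=errors.__getitem__)
print(f"best step: {best_step + 1}, error vs clean: {errors[best_step]:.4f}")
print(f"final step: {len(errors)}, error vs clean: {errors[-1]:.4f}")
```

The model only ever sees the noisy target, so its training loss keeps dropping the whole time. The error against the clean signal bottoms out early and then climbs as the slow coordinates start fitting the noise, which is the sense in which stopping early acts like a regularizer here.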

Dropkickmurph512 t1_j9pnws1 wrote on February 23, 2023 at 6:03 PM

Reply to Why bigger transformer models are better learners? by begooboi

NTK (neural tangent kernel) theory kinda looks into this, but for a more general case. The math gets wild, though. Real answer is that no one knows the real reason.