Submitted by begooboi t3_119zmpd in deeplearning
artsybashev t1_j9puq9o wrote
Reply to comment by levand in Why bigger transformer models are better learners? by begooboi
It is, in a way, the same phenomenon. Think about the information in images: an overfitting model starts to learn even the noise patterns in the training images. If your training data does not contain enough real information to fill the model's capacity, the model will start to learn the noise and overfit to your data.
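A toy way to see the "capacity vs. real information" point. This is only a sketch: it assumes polynomial degree as a stand-in for model capacity (nothing transformer-specific), and the signal sin(2πx), noise level, and function name `train_test_mse` are made up for illustration. With 10 noisy training points, a degree-9 polynomial has as many parameters as data points. It can therefore pass through every training point, noise included, so its training error drops to about zero while its error against the clean signal typically gets worse.

```python
import numpy as np
from numpy.polynomial import Polynomial


def train_test_mse(degree, n_train=10, n_test=200, noise=0.3, seed=0):
    """Fit a degree-`degree` polynomial to noisy samples of sin(2*pi*x).

    Hypothetical toy setup: polynomial degree plays the role of model capacity.
    Returns (train_mse, test_mse). Test targets are noise-free, so test error
    measures how well the fit recovers the real signal rather than the noise.
    """
    rng = np.random.default_rng(seed)  # same seed -> same training data for every degree
    x_train = np.linspace(0.0, 1.0, n_train)
    y_train = np.sin(2 * np.pi * x_train) + noise * rng.standard_normal(n_train)
    x_test = np.linspace(0.0, 1.0, n_test)
    y_test = np.sin(2 * np.pi * x_test)

    # Least-squares polynomial fit (domain is rescaled internally for stability)
    model = Polynomial.fit(x_train, y_train, deg=degree)
    train_mse = float(np.mean((model(x_train) - y_train) ** 2))
    test_mse = float(np.mean((model(x_test) - y_test) ** 2))
    return train_mse, test_mse


if __name__ == "__main__":
    for degree in (1, 3, 9):
        tr, te = train_test_mse(degree)
        print(f"degree {degree}: train MSE {tr:.4f}, test MSE {te:.4f}")
```

Training error can only go down as the degree grows, because each larger model can represent everything the smaller one could. Once the parameter count reaches the number of training points, the extra capacity is spent memorizing the noise, not the underlying signal.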