
AccountGotLocked69 t1_jcesw8m wrote

I assume by "hallucinate gaps" you mean interpolate? In general it's the opposite: smaller, simpler models are better at generalizing. Of course there are a million exceptions to this rule, but in the simple picture of using stable combinations of batch sizes and learning rates, big models will be more prone to overfitting the data. Most of this rests on the assumption that the "ground truth" is always a simpler function than memorizing the entire dataset.
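A toy sketch of that intuition (my own illustration, not anything from the thread): fit a small and a large polynomial model to noisy samples of a deliberately simple ground truth, and compare train vs. test error. Only numpy is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground truth is simple (y = sin(x)); training data adds observation noise.
x_train = rng.uniform(-3, 3, 20)
y_train = np.sin(x_train) + rng.normal(0, 0.2, 20)
x_test = np.linspace(-3, 3, 200)
y_test = np.sin(x_test)

for degree in (3, 12):  # small model vs. big model
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")
```

Typically the high-degree fit drives training error toward zero (memorizing the noise) while its test error blows up, whereas the low-degree fit keeps the two close, which is the overfitting behavior described above.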

2