
AccountGotLocked69 t1_jcesw8m wrote

I assume by "hallucinate gaps" you mean interpolate? In general it's the opposite: smaller, simpler models are better at generalizing. Of course there are a million exceptions to this rule, but in the simple picture of using stable combinations of batch sizes and learning rates, big models will be more prone to overfitting the data. Most of this rests on the assumption that the "ground truth" is always a simpler function than memorizing the entire dataset.
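A toy sketch of that intuition (my own illustration, not anything from the thread): fit a small and a large polynomial model to noisy samples of a deliberately simple ground truth, and compare train vs. test error. Only numpy is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground truth is simple (y = sin(x)); training data adds observation noise.
x_train = rng.uniform(-3, 3, 20)
y_train = np.sin(x_train) + rng.normal(0, 0.2, 20)
x_test = np.linspace(-3, 3, 200)
y_test = np.sin(x_test)

for degree in (3, 12):  # small model vs. big model
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")
```

Typically the high-degree fit drives training error toward zero (memorizing the noise) while its test error blows up, whereas the low-degree fit keeps the two close, which is the overfitting behavior described above.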

2