
murrdpirate t1_jben5uy wrote

>Notice that "significantly lower" can't actually be defined.

True. I guess I would say that overfitting is a spectrum, and that there's generally some amount of overfitting happening (unless your training set happens to be significantly more challenging than your test set). So the bigger the gap between train and test error, the more overfitting.

>It's tempting to think "test error is 3x train error, we're overfitting". This may or may not be right; there absolutely could be a (more complex) model B with, e.g., training error rate 0.05, test error rate 0.27.

Maybe it's semantics, but in my view model B is indeed overfitting "more" than model A. I don't think more overfitting guarantees worse test results, though; it just makes worse test results more likely due to increased variance. I might still choose to deploy model B, but I'd view it as a heavily overfit model that happened to perform well.

Appreciate the response. I also liked your CrossValidated post. I've wondered about that issue myself. Do you think data augmentation should also be disabled in that test?


KD_A t1_jbf175s wrote

> Do you think data augmentation should also be disabled in that test?

Yes. I've never actually experimented w/ stuff like image augmentation. But in most examples I looked up, augmentation is a training-only computation, which can make training loss look higher than it actually is. In general the rule is just this: to get an unbiased estimate of training loss, apply the exact same code you're using to estimate validation loss to the training data.
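
For concreteness, here's a minimal PyTorch-style sketch of that rule, assuming an image-classification setup; the dataset, transform, and loader names here are illustrative, not from the discussion above. The augmented transform is used only when fitting the model, while a plain transform and the same eval-mode loop are used to estimate both training and validation loss.

```python
# Sketch: estimate training loss with the exact same evaluation code used
# for validation loss -- eval mode, no augmentation, no gradients.
# Dataset/transform choices (CIFAR10, flips, crops) are just examples.
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Augmented transform: used only when fitting the model.
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
])

# Plain transform: used for validation AND for estimating training loss.
eval_transform = transforms.Compose([
    transforms.ToTensor(),
])

train_fit_data  = datasets.CIFAR10("data", train=True,  transform=train_transform, download=True)
train_eval_data = datasets.CIFAR10("data", train=True,  transform=eval_transform,  download=True)
val_data        = datasets.CIFAR10("data", train=False, transform=eval_transform,  download=True)

@torch.no_grad()
def mean_loss(model, dataset, loss_fn, batch_size=256):
    """Same evaluation code for training and validation data."""
    model.eval()  # disable dropout, use running batch-norm stats
    loader = DataLoader(dataset, batch_size=batch_size)
    total, n = 0.0, 0
    for x, y in loader:
        total += loss_fn(model(x), y).item() * len(y)
        n += len(y)
    return total / n

# Usage (model and loss_fn assumed to already exist):
# train_loss = mean_loss(model, train_eval_data, loss_fn)
# val_loss   = mean_loss(model, val_data, loss_fn)
# gap = val_loss - train_loss  # a fairer estimate of the generalization gap
```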
