
VirtualHat t1_iyz96uh wrote

If you make your model large enough, you will get to 100% training accuracy. In fact, not only can you get to 100% training accuracy, but you can also get train loss to effectively 0. The paper I linked above discusses how this was previously considered a very bad idea, but if done carefully it can actually improve generalization.

Probably the best bet, though, is to just stick to the "stop when validation loss goes up" rule.
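For anyone who wants that rule spelled out, here is a minimal sketch in plain Python. It assumes you already record one validation loss per epoch; the function name `early_stopping_epoch` and the `patience` parameter are made up for illustration and are not from any particular library.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the index of the epoch whose weights you would keep.

    Training is treated as stopped once the validation loss has failed to
    improve for `patience` consecutive epochs. Returns -1 for an empty list.
    """
    best_loss = float("inf")
    best_epoch = -1
    bad_epochs = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break  # stop; later epochs are never looked at
    return best_epoch


# Hypothetical loss curve: improves, then drifts up, then drops again late.
val_losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.70, 0.50]
print(early_stopping_epoch(val_losses, patience=3))
```

Note that with a small `patience` the late drop at the end of this made-up curve is never seen, which is roughly the trade-off the over-training discussion is about.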

Oceanboi t1_iz2bc3h wrote

Do you know if this is done by simply training for massive amounts of epochs and adding layers until you hit 100%?

I may still just be new, but I’ve never been able to achieve this in practice. I’d be really interested in practical advice on how to overfit your dataset. I'm still unsure of the results of that paper you linked; I feel like I am misinterpreting it in some way.

Is it really suggesting overparametrizing something past the overfitting point and continuing to train will ultimately yield a model that generalizes well?

I am using a data set of 8500 sounds, 10 classes. I cannot push past 70-75% accuracy and the more layers I add to the Convolutional base, the lower my accuracy becomes. Are they suggesting the layers be added to the classifier head only? I’m all for overparametrizing a model and leaving it on for days, I just don’t know how to be deliberate in this effort.

VirtualHat t1_iz2qj72 wrote

Yes, massive amounts of epochs with an overparameterized model. As mentioned, I wouldn't recommend it, though. It's just interesting that some of the intuition about how long to train for is changing from "too much is bad" to "too much is good".

If you are interested in this subject, I'd highly recommend https://openai.com/blog/deep-double-descent/ (which is about overparameterization), as well as the paper mentioned above (which is about over-training). Again, I wouldn't recommend this for your problem. It's just interesting.

It's also worth remembering that there will be a natural error rate for your problem (i.e. does X actually tell us what Y is). So it is possible that 70-75% test accuracy is the best you can do on your problem.
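This natural error rate is usually called the Bayes error rate. A toy sketch of the idea, assuming a made-up binary problem where we somehow know P(Y=1 | X) exactly (in real problems you don't, which is why the ceiling can only be estimated); `best_possible_accuracy` is a hypothetical helper name:

```python
def best_possible_accuracy(p_x, p_y1_given_x):
    """Accuracy of the optimal classifier on a discrete toy problem.

    p_x[i]          : probability of seeing input value i
    p_y1_given_x[i] : probability that the label is 1 given input value i

    For each input the best you can do is predict the more likely label,
    which is right with probability max(p, 1 - p).
    """
    return sum(px * max(p, 1.0 - p) for px, p in zip(p_x, p_y1_given_x))


# Two equally likely inputs; neither fully determines the label.
ceiling = best_possible_accuracy([0.5, 0.5], [0.9, 0.3])
print(f"best possible accuracy: {ceiling:.2f}")
```

No amount of extra layers or epochs gets test accuracy above that number on this toy problem, because the missing information simply isn't in X.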

Oceanboi t1_iz2tn86 wrote

Is this natural error rate purely theoretical or is there some effort to quantify a ceiling?

If I’m understanding correctly, you’re saying there is always going to be some natural ceiling to accuracy for some problems in which the X data doesn’t hold enough information to perfectly predict Y, or in nature just doesn’t help us predict Y?

eigenlaplace t1_iz7dq71 wrote

there are problems where the target is not the ideal ground truth but a noisy label, because the rater is imperfect

so if you get 100% accuracy on test set, you might just be predicting wrong things because another, more experienced, rater would judge the ground truth to be different than what the first rater said

this is in fact true for most, if not all, data, except for toy/procedural datasets where you actually create the input-output pairs deterministically
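a rough simulation of this, assuming 10 classes and a rater who mislabels 20% of examples uniformly at random (both numbers are made up for illustration, as is the function name `simulate_rater_noise`):

```python
import random


def simulate_rater_noise(n=10000, noise=0.2, n_classes=10, seed=0):
    """Compare an 'oracle' model and a 'copycat' model under label noise.

    oracle  : always predicts the true class
    copycat : always predicts whatever the imperfect rater would say
    """
    rng = random.Random(seed)
    true = [rng.randrange(n_classes) for _ in range(n)]
    rater = []
    for y in true:
        if rng.random() < noise:
            # pick a wrong class uniformly among the other classes
            wrong = rng.randrange(n_classes - 1)
            if wrong >= y:
                wrong += 1
            rater.append(wrong)
        else:
            rater.append(y)

    def accuracy(pred, target):
        return sum(p == t for p, t in zip(pred, target)) / len(target)

    return {
        "oracle_vs_rater_labels": accuracy(true, rater),
        "copycat_vs_rater_labels": accuracy(rater, rater),
        "copycat_vs_truth": accuracy(rater, true),
    }


for name, value in simulate_rater_noise().items():
    print(f"{name}: {value:.3f}")
```

the copycat scores 100% against the rater's test labels while agreeing with the truth only about 80% of the time, and the oracle tops out around 80% on those same labels even though it is never actually wrong.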
