Submitted by CrazyCrab t3_zgvohh in MachineLearning
plocco-tocco t1_iziu4zy wrote
Reply to comment by CrazyCrab in [D] Did I overfit to val by choosing the best checkpoint? by CrazyCrab
Do 5- or 10-fold cross-validation in this case. It's often used when there isn't a lot of data.
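For reference, here is a minimal k-fold split sketch in plain Python. It shuffles the sample indices once and cuts them into k validation folds, so every sample is used for validation exactly once. The function name `k_fold_indices` and the `seed` parameter are hypothetical, not from any particular library. In practice you would likely use a library splitter instead.

```python
import random
from typing import List, Tuple


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k (train_idx, val_idx) pairs (hypothetical helper).

    Each sample appears in exactly one validation fold.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once so folds aren't ordered by index
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    splits = []
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        splits.append((train_idx, val_idx))
        start += size
    return splits


for fold, (train_idx, val_idx) in enumerate(k_fold_indices(n_samples=23, k=5)):
    # Train a fresh model on train_idx and evaluate it on val_idx here.
    print(fold, len(train_idx), len(val_idx))
```

Each fold trains a fresh model from scratch; the per-fold scores are then averaged.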
CrazyCrab OP t1_iziuhq4 wrote
Do you suggest doing cross-validation with the stopping rule "train for exactly the same number of steps I did in this run", or with "train with checkpointing and pick the best checkpoint, as I did in this run"?
plocco-tocco t1_izj4iy8 wrote
I would take the best checkpoints (i.e., the point where the validation loss starts diverging from the training loss), not the same number of steps, because the networks may not converge to a minimum at the same time; some may stay stuck somewhere for longer.
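A rough sketch of "best checkpoint per fold", assuming "best" means the saved checkpoint with the lowest validation loss (one reading of the comment above). The helper names `best_checkpoint` and `select_per_fold` are hypothetical, and the loss curves are made-up toy numbers, not real results. They only show that the chosen step can differ between folds.

```python
from typing import Dict, List, Tuple


def best_checkpoint(val_losses: List[float]) -> Tuple[int, float]:
    """Return (checkpoint_index, val_loss) for the lowest validation loss (hypothetical helper)."""
    if not val_losses:
        raise ValueError("need at least one checkpoint")
    best_idx = min(range(len(val_losses)), key=val_losses.__getitem__)
    return best_idx, val_losses[best_idx]


def select_per_fold(curves: Dict[int, List[float]]) -> Dict[int, Tuple[int, float]]:
    """Pick the best checkpoint independently for every fold."""
    return {fold: best_checkpoint(losses) for fold, losses in curves.items()}


# Toy validation-loss curves, one value per saved checkpoint (made up for illustration).
toy_curves = {
    0: [0.90, 0.60, 0.45, 0.50, 0.58],  # this fold bottoms out early
    1: [0.95, 0.80, 0.70, 0.48, 0.52],  # this fold plateaus and bottoms out later
}
chosen = select_per_fold(toy_curves)
mean_best = sum(loss for _, loss in chosen.values()) / len(chosen)
print(chosen, mean_best)
```

Since each checkpoint is picked using that fold's own validation data, the averaged score is likely a bit optimistic compared with truly held-out data.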