
_Arsenie_Boca_ t1_iyuu6le wrote

I usually prefer checkpointing over early stopping, i.e. you save a checkpoint every time you get a better validation score. Loss is typically a good indicator, but if you have a more specific measure that you are aiming for (like downstream metrics), you should use that.
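A minimal sketch of that kind of best-score checkpointing, in plain Python. The class name `BestCheckpoint`, its `update` method, and the toy loss values are hypothetical and only for illustration; in a real training loop the `state` would be the model weights and the score would come from the validation set.

```python
import copy


class BestCheckpoint:
    """Keep a snapshot of the state with the best validation score seen so far."""

    def __init__(self, higher_is_better=False):
        # Use higher_is_better=True for metrics like IoU, False for losses.
        self.higher_is_better = higher_is_better
        self.best_score = None
        self.best_epoch = None
        self.best_state = None

    def update(self, epoch, score, state):
        """Store `state` if `score` beats the best so far; return True if stored."""
        if self.best_score is None:
            improved = True
        elif self.higher_is_better:
            improved = score > self.best_score
        else:
            improved = score < self.best_score
        if improved:
            self.best_score = score
            self.best_epoch = epoch
            # Deep copy so that later training steps cannot mutate the snapshot.
            self.best_state = copy.deepcopy(state)
        return improved


# Toy usage: made-up validation losses, with a dict standing in for model weights.
val_losses = [0.90, 0.70, 0.75, 0.60, 0.65]
tracker = BestCheckpoint(higher_is_better=False)
state = {"step": 0}
for epoch, loss in enumerate(val_losses):
    state["step"] = epoch  # pretend training updated the weights
    tracker.update(epoch, loss, state)

print(tracker.best_epoch, tracker.best_score)
```

Unlike early stopping, training keeps running after the score stops improving; only the saved snapshot is frozen at the best epoch.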

11

CrazyCrab t1_izg3hw6 wrote

I recently overfit to the validation dataset by doing this. The task is semantic segmentation. I trained for a very long time and took the model with the best validation loss. I ended up with a cross entropy of 0.02 nats/pixel on val and 0.04 on train, and an IoU of 14% on val vs 24% on train.
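For reference, here is a rough sketch of how the two reported metrics could be computed for binary segmentation. The function names, the 0.5 threshold, the empty-union convention, and the four-pixel toy input are all assumptions for illustration, not the setup used above.

```python
import math


def cross_entropy_nats_per_pixel(probs, labels, eps=1e-12):
    """Mean binary cross entropy per pixel, in nats (natural logarithm)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def iou(probs, labels, threshold=0.5):
    """Intersection over union of the positive class after thresholding."""
    pred = [1 if p >= threshold else 0 for p in probs]
    inter = sum(1 for a, b in zip(pred, labels) if a == 1 and b == 1)
    union = sum(1 for a, b in zip(pred, labels) if a == 1 or b == 1)
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0


# Toy example: four pixels, one true positive pixel.
probs = [0.9, 0.2, 0.6, 0.1]   # predicted probability of the positive class
labels = [1, 0, 0, 0]          # ground truth
ce = cross_entropy_nats_per_pixel(probs, labels)
score = iou(probs, labels)
print(round(ce, 4), score)
```

Cross entropy averages over every pixel, while IoU only looks at pixels that are predicted or labeled positive, so the two do not have to move together when positives are rare.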

1

_Arsenie_Boca_ t1_izg7d49 wrote

Not sure how this indicates overfitting on the validation set? Wouldn't that show up as much worse performance on the test set compared to the validation set? I haven't done a lot of image segmentation work; is this specific to the task?

1

CrazyCrab t1_izg96t3 wrote

I don't have a test set. It's not specific to a task.

1

_Arsenie_Boca_ t1_izgbbrj wrote

Then how can you tell if you overfitted on the validation set?

1

CrazyCrab t1_izgcu6i wrote

Ok, so my annotated data consists of about 50 images of size 10000x5000 pixels on average. The task is binary segmentation. Positives constitute approximately 8% of all pixels. 38 images are in the training part, 12 images are in the validation part (I divided them randomly).

The batch cross entropy plot and the validation cross entropy plot were crazy unstable during training. After a little bit of training, there mostly wasn't any stable trend up or down in either of them. However, as time went on, the best validation cross entropy over all checkpoints kept going down and down...

So I think my checkpoint-selection method gave me a model that is overfit to the validation dataset. I.e., I expect that on future samples the performance will be closer to what I see on the training dataset than to what I see on the validation dataset. The only other likely explanation I can think of is that I got unlucky and my validation dataset turned out to be significantly easier than my training dataset.
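A small simulation can illustrate the selection effect described here. It is only a sketch under strong assumptions: every checkpoint has the same true loss, and the measured validation loss is that value plus independent Gaussian noise. The numbers (0.04, 0.01, 500) and the function name `simulate_best_so_far` are made up for illustration and are not fitted to the run above.

```python
import random


def simulate_best_so_far(true_loss, noise_sd, n_checkpoints, seed=0):
    """Return the running minimum of noisy validation measurements.

    Every checkpoint is assumed to have the same true loss, so any drop in
    the running minimum comes purely from picking the luckiest measurement.
    """
    rng = random.Random(seed)
    best = float("inf")
    running_best = []
    for _ in range(n_checkpoints):
        measured = true_loss + rng.gauss(0.0, noise_sd)
        best = min(best, measured)
        running_best.append(best)
    return running_best


TRUE_LOSS = 0.04
running_best = simulate_best_so_far(TRUE_LOSS, noise_sd=0.01, n_checkpoints=500)

# The best-so-far value keeps drifting below the true loss as more
# checkpoints are compared, even though no checkpoint is actually better.
for n in (1, 10, 100, 500):
    print(n, round(running_best[n - 1], 4))
```

The noisier the validation estimate and the more checkpoints are compared, the larger this optimistic gap gets, which is why a separate held-out test set is the usual way to check the selected checkpoint.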

1