Submitted by Thijs-vW t3_yta05n in deeplearning
scitech_boom t1_iw4ck9z wrote
>Concatenate old and new data and train one epoch.
This is what I did in the past and it worked reasonably well for my cases. But is that the best? I don't know.
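In PyTorch, that one-epoch pass over the combined data might look roughly like this (the datasets and model below are throwaway stand-ins, not anything from the thread):

```python
import torch
import torch.nn as nn
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Stand-ins for the real old/new datasets and the already-trained model.
old_dataset = TensorDataset(torch.randn(1000, 20), torch.randint(0, 2, (1000,)))
new_dataset = TensorDataset(torch.randn(200, 20), torch.randint(0, 2, (200,)))
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))

# Concatenate old and new data, then train for a single epoch.
loader = DataLoader(ConcatDataset([old_dataset, new_dataset]), batch_size=64, shuffle=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

model.train()
for inputs, targets in loader:
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()
```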
Anyhow, you cannot do this:
>Simultaneously, I do want to use this model as starting point,
Instead pick the weights from 2 or 3 epochs before the best performing one in the previous training. That should be the starting point.
Training on top of something that has already hit the bottom won't help, even if we add more data.
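A sketch of that starting-point choice, assuming the original run saved a checkpoint after every epoch and logged validation loss (the file names and loss values below are made up for illustration):

```python
import torch

# Assumes the original run wrote "ckpt_epoch_00.pt", "ckpt_epoch_01.pt", ...
# and recorded validation loss once per epoch.
val_losses = [0.92, 0.71, 0.60, 0.55, 0.53, 0.52, 0.54, 0.58]  # illustrative numbers

best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
start_epoch = max(0, best_epoch - 2)  # two epochs before the best-performing one

state = torch.load(f"ckpt_epoch_{start_epoch:02d}.pt")
model.load_state_dict(state)  # `model` is the same architecture used in the original run
# ...then continue training on the combined old+new data from this checkpoint.
```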
Thijs-vW OP t1_iw6rmvh wrote
>Anyhow, you cannot do this:
I do not understand why I cannot train my already trained model on new data. Could you elaborate?
scitech_boom t1_iw6zpa9 wrote
There are multiple reasons. The main issue has to do with validation error. It usually follows a U-shaped curve, with a minimum at some epoch. This is the point at which we usually stop the training (`early stopping`). Any further training, with or without new data, is only going to make the performance worse (I don't have a paper to cite for that).
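For reference, early stopping usually amounts to something like the following generic sketch (not the commenter's code; `train_one_epoch`, `evaluate`, the loaders, and `max_epochs` are assumed helpers):

```python
import copy

patience = 3                      # epochs to wait after the last improvement
best_val = float("inf")
best_state = None
epochs_without_improvement = 0

for epoch in range(max_epochs):
    train_one_epoch(model, train_loader)     # assumed helper: one pass over training data
    val_loss = evaluate(model, val_loader)   # assumed helper: mean validation loss

    if val_loss < best_val:
        best_val = val_loss
        best_state = copy.deepcopy(model.state_dict())
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            break  # validation loss has started climbing the right side of the U

model.load_state_dict(best_state)  # restore the weights from the best epoch
```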
I also started with the best model and that did not work. But when I took the model from 2 epochs before the best one, it worked well. In my case (speech recognition), it was a nice balance between improvement and training time.
jobeta t1_iw6zxwa wrote
I don’t have much experience with that specific problem, but I would tend to think it’s hard to generalize like this to “models that have hit the bottom” without knowing what the validation loss actually looked like and what that new data looks like. Chances are, the new data is not perfectly sampled from the same distribution as the first dataset, and its features have some idiosyncratic/new statistical properties. In that case, once you feed it to your pre-trained model, the loss is mechanically no longer at the minimum it supposedly reached in the first training run.