
Thijs-vW OP t1_iw6ryeq wrote

Thanks for the advice. Unfortunately I do not think transfer learning is the best thing for me to do, considering:

>if you train only on the new data, that's all it will know how to predict.

Anyhow,

>If retraining the entire model on the complete data set is possible with nominal cost in less than a few days, do that.

This is indeed the case. However, if I retrain my entire model, it is very likely that the new model will make entirely different predictions due to its weight matrix not being identical. This is the problem I would like to avoid. Do you have any advice on that?


ContributionWild5778 t1_iw98mo2 wrote

If you want to re-train the whole model on the mixed dataset, the only option I can think of is a transfer-learning-style warm start: initialise all the parameters with the weights learned on the old dataset, then re-train from the 0th epoch.
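A minimal sketch of that warm start, using a toy numpy logistic regression as a stand-in for the real model (the synthetic data, the `train_logreg` helper, and the hyperparameters are all illustrative assumptions, not a reference implementation):

```python
import numpy as np

def train_logreg(X, y, w=None, epochs=100, lr=0.1):
    """Gradient-descent logistic regression; passing `w` warm-starts training."""
    if w is None:
        w = np.zeros(X.shape[1])           # cold start: fresh weights
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # sigmoid predictions
        w -= lr * X.T @ (p - y) / len(y)   # average-gradient step
    return w

rng = np.random.default_rng(0)
X_old = rng.normal(size=(200, 3)); y_old = (X_old[:, 0] > 0).astype(float)
X_new = rng.normal(size=(50, 3));  y_new = (X_new[:, 0] > 0).astype(float)

w_old = train_logreg(X_old, y_old)                 # model trained on old data only

# Warm start: initialise with the old weights, re-train from epoch 0 on old + new data.
X_all = np.vstack([X_old, X_new])
y_all = np.concatenate([y_old, y_new])
w = train_logreg(X_all, y_all, w=w_old.copy())
```

Starting from the old weights tends to keep the retrained model in the same region of weight space, which is exactly what helps when you care about prediction stability.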


BugSlayerJohn t1_iwa32dc wrote

First of all, you don't want an identical or nearly identical weight matrix. You won't achieve that, and you don't need to. In principle, a well-designed model should NOT make radically different predictions when retrained, particularly on the same data, even though the weight matrices will certainly differ at least a little and possibly a lot. The same model trained twice on the same data with the same hyperparameters will generally converge to nearly identical behavior, right down to which types of inputs the final model struggles with. If you have the original model, original data, and original hyperparameters, definitely don't be frightened to retrain.

If your use case requires you to reason strongly about similarity of inference, you could filter your holdout set down to the inputs that both models should predict accurately, run inference on that set with both models, and prepare a small report on the similarity of their predictions. This should ordinarily be unnecessary, but since achieving this similarity sounds like a point of concern, it would let you measure it, if for no other purpose than to assuage fears. You should expect SOME drift: the two versions won't be identical. If the similarity is not as high as you'd like, consider manually reviewing a list of inputs the two models predicted differently on, to confirm whether the differences really are undesirable.
