
jsxgd t1_j4vruhl wrote

Are you saying that in the “from scratch” implementation you are only training on your own data? Or are you training the same architecture on the data used in pre-training plus your own data?


tsgiannis OP t1_j4vvxt3 wrote

By “from scratch” I mean I take the implementation of a model (pick any) from articles and GitHub pages, copy-paste it, and feed it my data.

There is always a big accuracy difference no matter what... At first I thought it was my mistake, because I always tinker with what I copy, but...
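To make the two setups being compared concrete, here is a minimal sketch, assuming PyTorch/torchvision and ResNet-18 as a stand-in architecture (the thread does not name a specific model); `num_classes` is hypothetical.

```python
import torch.nn as nn
from torchvision import models

num_classes = 10  # hypothetical number of classes in "my data"

# Setup 1: "from scratch" -- same architecture, randomly initialised weights,
# trained only on your own (typically much smaller) dataset.
scratch_model = models.resnet18(weights=None)
scratch_model.fc = nn.Linear(scratch_model.fc.in_features, num_classes)

# Setup 2: fine-tuning -- same architecture, but starting from weights
# pre-trained on a much larger dataset (ImageNet here).
pretrained_model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
pretrained_model.fc = nn.Linear(pretrained_model.fc.in_features, num_classes)

# Both models are then trained identically on your own data; the accuracy gap
# described above typically comes from the pre-trained starting point.
```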


DrXaos t1_j4w9vav wrote

The dataset the pre-trained model was trained on was likely enormously larger than yours, and that overcomes the distribution shift.
