
jsxgd t1_j4vruhl wrote

Are you saying that in the “from scratch” implementation you are only training on your own data? Or are you training the same architecture on the data used in pre-training plus your own data?


tsgiannis OP t1_j4vvxt3 wrote

By “from scratch” I mean I take the implementation of a model (pick any) from articles and GitHub pages, copy-paste it, and feed it my data.

There is always a big accuracy difference no matter what... At first I thought it was my mistake, because I always tinker with what I copy, but...
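To make the two setups being compared concrete, here is a minimal sketch, assuming PyTorch/torchvision and ResNet-18 as a stand-in architecture (the thread does not name a specific model); `num_classes` is hypothetical.

```python
import torch.nn as nn
from torchvision import models

num_classes = 10  # hypothetical number of classes in "my data"

# Setup 1: "from scratch" -- same architecture, randomly initialised weights,
# trained only on your own (typically much smaller) dataset.
scratch_model = models.resnet18(weights=None)
scratch_model.fc = nn.Linear(scratch_model.fc.in_features, num_classes)

# Setup 2: fine-tuning -- same architecture, but starting from weights
# pre-trained on a much larger dataset (ImageNet here).
pretrained_model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
pretrained_model.fc = nn.Linear(pretrained_model.fc.in_features, num_classes)

# Both models are then trained identically on your own data; the accuracy gap
# described above typically comes from the pre-trained starting point.
```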


DrXaos t1_j4w9vav wrote

The dataset the pre-trained model was trained on was likely enormously larger than yours, and that overcomes the distribution shift.
