Submitted by tsgiannis t3_10f5lnc in deeplearning
jsxgd t1_j4vruhl wrote
Are you saying that in the “from scratch” implementation you are training only on your own data? Or are you training the same architecture on the pre-training data plus your own data?
tsgiannis OP t1_j4vvxt3 wrote
By "from scratch" I mean I take an implementation of a model (just pick any) from articles and GitHub pages, copy and paste it, and feed it my data.
There is always a big accuracy difference no matter what... At first I thought it was my mistake, because I always tinker with what I copy, but...
DrXaos t1_j4w9vav wrote
The pretrained model was likely trained on an enormously larger dataset than yours, and that outweighs the distribution shift between its data and yours.
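A toy sketch of that effect, not a deep network: a logistic regression is trained from scratch on 20 target samples, and compared with one pretrained on 5000 samples from a related but shifted task and then fine-tuned on the same 20. Everything here (the dimension, the sample counts, the amount of shift, the learning rates, and all function names) is an assumption chosen for illustration, so treat the gap it prints as qualitative only.

```python
import math
import random


def make_data(w_true, n, rng):
    """Sample n Gaussian feature vectors, labelled 1.0 if w_true . x > 0 else 0.0."""
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in w_true]
        y = 1.0 if sum(wi * xi for wi, xi in zip(w_true, x)) > 0 else 0.0
        data.append((x, y))
    return data


def train(w_init, data, epochs, lr):
    """Logistic regression trained with plain SGD, starting from w_init."""
    w = list(w_init)
    for _ in range(epochs):
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            p = 1.0 / (1.0 + math.exp(-z))
            for i, xi in enumerate(x):
                w[i] += lr * (y - p) * xi
    return w


def accuracy(w, data):
    """Fraction of samples whose predicted sign matches the label."""
    hits = 0
    for x, y in data:
        z = sum(wi * xi for wi, xi in zip(w, x))
        hits += (z > 0) == (y == 1.0)
    return hits / len(data)


rng = random.Random(0)
dim = 30

# Hypothetical "source" task (large dataset) and a shifted "target" task.
w_source = [rng.gauss(0.0, 1.0) for _ in range(dim)]
w_target = [wi + 0.3 * rng.gauss(0.0, 1.0) for wi in w_source]

big_source = make_data(w_source, 5000, rng)    # stands in for the pre-training data
small_target = make_data(w_target, 20, rng)    # stands in for "my own data"
test_target = make_data(w_target, 2000, rng)   # held-out target data

zeros = [0.0] * dim

# From scratch: only the small target set is ever seen.
w_scratch = train(zeros, small_target, epochs=20, lr=0.1)

# Pretrain on the large source set, then fine-tune gently on the small target set.
w_pretrained = train(zeros, big_source, epochs=2, lr=0.1)
w_finetuned = train(w_pretrained, small_target, epochs=20, lr=0.01)

acc_scratch = accuracy(w_scratch, test_target)
acc_finetuned = accuracy(w_finetuned, test_target)

print("from scratch:", acc_scratch)
print("pretrained + fine-tuned:", acc_finetuned)
```

With only 20 samples in 30 dimensions the from-scratch model cannot pin down the decision boundary, while the pretrained weights already point close to it despite the shift, which is the same reason a copied architecture trained only on a small dataset tends to lag the pretrained version.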