Submitted by tsgiannis t3_10f5lnc in deeplearning
nibbajenkem t1_j4wii8d wrote
It's pretty simple. Deep neural networks are extremely underspecified by the data they train on https://arxiv.org/abs/2011.03395. Less data means more underspecification and thus the model more readily gets stuck in local minima. More data means you can more easily avoid certain local minima. So the question then boils down to the transferability of the learned features on different datasets. Imagenet pretraining generally works well because its a diverse and large scale dataset, which means models trained on it will by default avoid learning a lot of "silly" features.
tsgiannis OP t1_j4wk889 wrote
>Less data means more underspecification and thus the model more readily gets stuck in local minima
This is probably the answer to my "why".
I_will_delete_myself t1_j4ylmkp wrote
He just said why: you aren't training on a diverse and large amount of data. ImageNet was trained on many different kinds of objects (over a million images), while your toy dataset probably has only 50-100k.
ContributionWild5778 t1_j5g6sio wrote
This! I would just add that you can never pin down the exact reason why training from scratch gives lower accuracy. Do you have enough data for all the neurons to learn the features? Can you compare the validation loss of your from-scratch model against the pre-trained one? Did you try removing/adding a dense layer to check how the performance changes? (See the sketch below.)
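A minimal sketch of those last two suggestions, assuming PyTorch/torchvision; the class count and the `evaluate` helper are hypothetical placeholders, not anything from the thread:

```python
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for p in backbone.parameters():
    p.requires_grad = False  # keep the ImageNet features fixed, train only the head

num_classes = 10                        # illustrative
in_features = backbone.fc.in_features   # 512 for resnet18

# Variant A: a single dense layer on top of the frozen backbone.
head_a = nn.Linear(in_features, num_classes)

# Variant B: one extra dense layer in between.
head_b = nn.Sequential(
    nn.Linear(in_features, 256),
    nn.ReLU(),
    nn.Linear(256, num_classes),
)

backbone.fc = head_a  # swap in head_b for the second experiment

# Train each variant on the same split and compare validation loss, e.g.:
# val_loss_a = evaluate(backbone, val_loader)  # hypothetical helper
```

Running the same comparison with the from-scratch model gives a like-for-like read on how much of the gap comes from the pretrained features versus the head architecture.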