Submitted by tsgiannis t3_10f5lnc in deeplearning
nibbajenkem t1_j4wii8d wrote
It's pretty simple. Deep neural networks are extremely underspecified by the data they train on https://arxiv.org/abs/2011.03395. Less data means more underspecification and thus the model more readily gets stuck in local minima. More data means you can more easily avoid certain local minima. So the question then boils down to the transferability of the learned features on different datasets. Imagenet pretraining generally works well because its a diverse and large scale dataset, which means models trained on it will by default avoid learning a lot of "silly" features.
tsgiannis OP t1_j4wk889 wrote
>Less data means more underspecification and thus the model more readily gets stuck in local minima
This is probably the answer to my "why".
I_will_delete_myself t1_j4ylmkp wrote
He just said why: you aren't training on a diverse and large amount of data. ImageNet was trained on many different kinds of objects (over a million images), while your toy dataset probably has only 50-100k.
ContributionWild5778 t1_j5g6sio wrote
This! I would just add that you can never pin down the exact reason why training from scratch gives lower accuracy. Do you have enough data for all the neurons to learn the features? Can you compare the validation loss of your from-scratch model against the pre-trained one? Did you try removing/adding a dense layer to check how the performance changes? (See the sketch below.)
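A minimal sketch of those last two suggestions, assuming PyTorch/torchvision; the class count and the `evaluate` helper are hypothetical placeholders, not anything from the thread:

```python
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for p in backbone.parameters():
    p.requires_grad = False  # keep the ImageNet features fixed, train only the head

num_classes = 10                        # illustrative
in_features = backbone.fc.in_features   # 512 for resnet18

# Variant A: a single dense layer on top of the frozen backbone.
head_a = nn.Linear(in_features, num_classes)

# Variant B: one extra dense layer in between.
head_b = nn.Sequential(
    nn.Linear(in_features, 256),
    nn.ReLU(),
    nn.Linear(256, num_classes),
)

backbone.fc = head_a  # swap in head_b for the second experiment

# Train each variant on the same split and compare validation loss, e.g.:
# val_loss_a = evaluate(backbone, val_loader)  # hypothetical helper
```

Running the same comparison with the from-scratch model gives a like-for-like read on how much of the gap comes from the pretrained features versus the head architecture.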