Submitted by twocupv60 t3_xvem36 in MachineLearning
My network takes about 24 hours to train. I have 2 hyperparameters to tune, and assuming each should be tried at roughly 6 values spanning 6 orders of magnitude, a grid search means running my network 6 × 6 = 36 times to find the best combination. That would take over a month! This seems far too long.
I see a lot of papers doing hyperparameter tuning. Do they use smaller networks that train faster? Is there some trick to speed up the search process?
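For concreteness, here's roughly what the grid search I have in mind looks like (the hyperparameter names `lr`/`wd` and the `train_and_evaluate` function are placeholders, not my actual setup):

```python
import itertools
import numpy as np

# Two hyperparameters, each at 6 values spanning 6 orders of magnitude,
# giving 6 * 6 = 36 full training runs.
lrs = np.logspace(-6, -1, num=6)  # 1e-6, 1e-5, ..., 1e-1
wds = np.logspace(-6, -1, num=6)

def train_and_evaluate(lr, wd):
    """Placeholder for the ~24 h training run; returns a validation score."""
    raise NotImplementedError

results = {}
for lr, wd in itertools.product(lrs, wds):
    results[(lr, wd)] = train_and_evaluate(lr, wd)  # 36 runs total

best_lr, best_wd = max(results, key=results.get)
```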
ButthurtFeminists t1_ir0zlg1 wrote
I'm surprised this one hasn't been mentioned already.
Long training could be due to model complexity and/or dataset size, so if it's difficult to downscale your model, you can instead tune on a subset of your dataset. For example, say I'm training a ResNet-152 on ImageNet: to reduce tuning time, I could sample a subset of ImageNet (maybe 1/10 the size), tune hyperparameters on that, and then train the best configuration on the full dataset (a sketch follows below).
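A minimal sketch of that subsampling idea, assuming a PyTorch setup; the dataset path, transforms, and the 1/10 fraction are placeholders:

```python
import torch
from torch.utils.data import Subset
from torchvision import datasets, transforms

# Hypothetical path; swap in your actual ImageNet train directory.
full_train = datasets.ImageFolder(
    "/path/to/imagenet/train",
    transform=transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.ToTensor(),
    ]),
)

# Randomly keep ~1/10 of the examples for the hyperparameter search.
fraction = 0.1
generator = torch.Generator().manual_seed(0)  # reproducible subset
indices = torch.randperm(len(full_train), generator=generator)
subset = Subset(full_train, indices[: int(fraction * len(full_train))].tolist())

# Tune hyperparameters training on `subset`, then retrain the best
# configuration on `full_train`.
```

In practice you'd probably want a class-stratified sample so all 1000 ImageNet classes stay represented in the subset.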