Submitted by twocupv60 t3_xvem36 in MachineLearning
suflaj t1_ir4ow8t wrote
Reply to comment by red_dragon in [D] How do you go about hyperparameter tuning when network takes a long time to train? by twocupv60
Switch to SGD after 1 epoch or so
But if they do worse than the baseline, something else is likely the problem. Adam(W) does not kill performance; for some reason it just isn't as effective at reaching the best final performance as simpler optimizers.
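The "Adam first, then SGD" schedule described above might be sketched like this in PyTorch; the model, learning rates, and epoch counts are illustrative assumptions, not part of the original comment:

```python
# Hypothetical sketch: warm up with AdamW, then hand the parameters
# over to SGD after the first epoch. All hyperparameters are made up.
import torch
import torch.nn as nn

model = nn.Linear(10, 2)
loss_fn = nn.MSELoss()
# Dummy dataset standing in for a real DataLoader
data = [(torch.randn(4, 10), torch.randn(4, 2)) for _ in range(8)]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

for epoch in range(5):
    if epoch == 1:
        # Switch to SGD after 1 epoch, as the comment suggests
        optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
    for x, y in data:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
```

Note that the SGD optimizer is created fresh on the same parameters, so Adam's moment estimates are discarded at the switch; whether to anneal the learning rate at that point is a separate tuning choice.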