Submitted by TheWittyScreenName t3_11s1zfh in MachineLearning
I've found that by dramatically lowering the learning rate (LR) and increasing the number of epochs, very simple baseline models can outperform SoTA models that use far more parameters. Is this considered "cheating" when comparing models? Is it interesting enough to warrant a short paper? I'm not sure what to do with this information.
For example, in the original VGAE paper, when training a GAE, they use an LR of 0.01 and train for 200 epochs to get 0.91 AUC, 0.92 AP on a link prediction experiment. Rerunning the same experiment with an LR of 5e-5 for 1500 epochs gets 0.97 AUC, 0.97 AP, which is better than the current leader on Papers with Code for this dataset.
It needs more epochs but has way, way fewer parameters than SoTA models. Is this a valid trade-off? Is it even a fair comparison?
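For reference, a minimal sketch of the kind of rerun described above, assuming PyTorch Geometric's GAE and the Cora citation dataset (the post does not name the dataset, and the encoder sizes here are placeholders); only the LR (5e-5) and epoch count (1500) follow the post:

```python
import torch
import torch_geometric.transforms as T
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GAE, GCNConv


class Encoder(torch.nn.Module):
    """Plain two-layer GCN encoder; hidden/latent sizes are assumptions."""

    def __init__(self, in_channels, hidden_channels=32, out_channels=16):
        super().__init__()
        self.conv1 = GCNConv(in_channels, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, out_channels)

    def forward(self, x, edge_index):
        return self.conv2(self.conv1(x, edge_index).relu(), edge_index)


# Standard link-prediction split; negatives for training are sampled inside recon_loss.
transform = T.Compose([
    T.NormalizeFeatures(),
    T.RandomLinkSplit(num_val=0.05, num_test=0.1, is_undirected=True,
                      split_labels=True, add_negative_train_samples=False),
])
dataset = Planetoid(root='data/Cora', name='Cora', transform=transform)
train_data, val_data, test_data = dataset[0]  # val split unused in this sketch

model = GAE(Encoder(dataset.num_features))
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)  # much lower than the usual 0.01

for epoch in range(1500):  # far more epochs than the original 200
    model.train()
    optimizer.zero_grad()
    z = model.encode(train_data.x, train_data.edge_index)
    loss = model.recon_loss(z, train_data.pos_edge_label_index)
    loss.backward()
    optimizer.step()

model.eval()
with torch.no_grad():
    z = model.encode(test_data.x, test_data.edge_index)
    auc, ap = model.test(z, test_data.pos_edge_label_index,
                         test_data.neg_edge_label_index)
print(f'AUC: {auc:.4f}, AP: {ap:.4f}')
```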
killver t1_jcbpq7c wrote
What you've actually found is an issue with many research papers: they make unfair comparisons between methods based on un-tuned hyperparameters. If you run an EfficientNet vs. a ViT model with the same learning rate, you will get vastly different results.
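A minimal sketch of the fairer protocol being described: sweep the learning rate separately for each architecture instead of fixing one shared value. The model names are timm identifiers, and the synthetic data and tiny training budget are stand-ins for a real dataset and schedule.

```python
import torch
import timm


def train_and_evaluate(model, lr, steps=20, batch_size=8, num_classes=10):
    """Train briefly on synthetic images and return a (noisy) held-out accuracy."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(steps):
        x = torch.randn(batch_size, 3, 224, 224)
        y = torch.randint(0, num_classes, (batch_size,))
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    model.eval()
    with torch.no_grad():
        x = torch.randn(batch_size, 3, 224, 224)
        y = torch.randint(0, num_classes, (batch_size,))
        return (model(x).argmax(dim=1) == y).float().mean().item()


candidate_lrs = [1e-2, 1e-3, 1e-4, 5e-5]
for arch in ['efficientnet_b0', 'vit_base_patch16_224']:
    # Each architecture gets its own sweep rather than one LR reused for both.
    scores = {lr: train_and_evaluate(
                  timm.create_model(arch, pretrained=False, num_classes=10), lr)
              for lr in candidate_lrs}
    best_lr = max(scores, key=scores.get)
    print(f'{arch}: best lr = {best_lr}')
```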