Submitted by fedegarzar t3_z9vbw7 in MachineLearning
CyberPun-K t1_iyj4snj wrote
The M3 dataset consists of only 3,003 series, so a minimal improvement from DL is not a surprise. Everybody knows that neural networks require large datasets to show substantial improvements over statistical baselines.
What is truly surprising is the time it takes to train the networks: 13 days for a few thousand series
=> there must be something broken with the experiments
HateRedditCantQuitit t1_iyj6yb6 wrote
14 days is 20k minutes, so it’s about 6.7 minutes per time series. I don’t know how many models are in the ensemble, but let’s assume it’s 13 models for even math, making an average deep model take 30s to train on an average time series.
Is that so crazy?
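A minimal sketch of that back-of-envelope arithmetic, assuming 3,003 series and a 13-member ensemble (the ensemble size is an assumption from this thread, not a figure from the paper):

```python
# Back-of-envelope check of the estimate above.
# All figures come from the thread, not from the paper itself.
days = 14                 # reported wall-clock training time (roughly)
n_series = 3_003          # number of series in M3
ensemble_size = 13        # assumed number of models in the ensemble

total_minutes = days * 24 * 60                                # ~20,160 min
minutes_per_series = total_minutes / n_series                 # ~6.7 min
seconds_per_model = minutes_per_series * 60 / ensemble_size   # ~31 s

print(f"{total_minutes:,.0f} min total, "
      f"{minutes_per_series:.1f} min per series, "
      f"~{seconds_per_model:.0f} s per model per series")
```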
CyberPun-K t1_iyj7gb9 wrote
All the models are global models trained with cross-learning, not single models per series (unless the experiments were actually done that way).
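For anyone unsure what that means: a global model is fit once on training windows pooled from every series in the dataset ("cross-learning"), rather than fitting a separate model per series. A minimal sketch of the distinction, using a placeholder linear model rather than the paper's N-BEATS setup:

```python
# Contrast between per-series (local) and global (cross-learned) training.
# The lag features and linear model are illustrative stand-ins only.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
series = [rng.normal(size=100).cumsum() for _ in range(5)]  # toy dataset

def make_windows(y, n_lags=4):
    """Turn one series into (lag-feature, next-step-target) pairs."""
    X = np.array([y[i:i + n_lags] for i in range(len(y) - n_lags)])
    t = y[n_lags:]
    return X, t

# Local: one model per series.
local_models = [LinearRegression().fit(*make_windows(y)) for y in series]

# Global ("cross-learning"): one model fit on windows pooled across all series.
X_all = np.vstack([make_windows(y)[0] for y in series])
t_all = np.concatenate([make_windows(y)[1] for y in series])
global_model = LinearRegression().fit(X_all, t_all)
```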
I_LOVE_SOURCES t1_iykxl0a wrote
…. Am I failing to detect humour/sarcasm? Those words don't appear to say anything.
BrisklyBrusque t1_iyj6bja wrote
13 days to tune multiple deep neural networks is not at all unrealistic, depending on the number of GPUs.
CyberPun-K t1_iyj6q2r wrote
N-BEATS hyperparameters are only minimally explored in the original paper, and the ensemble was not tuned. There is something broken with the reported times.
Historical_Ad2338 t1_iylgux6 wrote
I was thinking the same thing when I looked into this. I'm not sure the experiments are necessarily 'broken' (there may be at least a reasonable justification for why training took 13 days), but the first point about dataset size is a smoking gun.
__mantissa__ t1_iylhzuj wrote
I have not read the paper yet, but the time the DL ensemble takes may be due to some kind of hyperparameter search.
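If so, even a modest search would multiply the wall-clock time. A purely hypothetical illustration of that multiplier (none of these numbers come from the paper):

```python
# Hypothetical illustration: a hyperparameter search multiplies training time.
hours_per_full_ensemble_fit = 12   # hypothetical cost of training the ensemble once
n_hyperparameter_configs = 26      # hypothetical size of the search grid

total_hours = hours_per_full_ensemble_fit * n_hyperparameter_configs
print(f"{total_hours} hours ≈ {total_hours / 24:.0f} days")  # 312 h ≈ 13 days
```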