
Internal-Diet-514 t1_iykhg3s wrote

I think so too. I'm confused why they would need to train for 14 days; from skimming the paper, the dataset itself doesn't seem that large. I bet a DL solution that was parameterized correctly for the problem would outperform the traditional statistical approaches.

19

marr75 t1_iykwulm wrote

While I agree with your general statement, my gut says a well-parameterized/regularized deep learning solution would perform as well as an ensemble of statistical approaches (without the expertise needed to select those approaches) but would be harder to explain/interpret.
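
For concreteness, here's a minimal sketch of the kind of statistical ensemble I mean, assuming a univariate series `y` and using statsmodels; the ARIMA order and the choice of exponential smoothing as the second member are arbitrary picks for illustration, not from the paper:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.holtwinters import ExponentialSmoothing

def ensemble_forecast(y, h=12):
    # Member 1: a small ARIMA model (order chosen by hand here; in
    # practice it would come from an AIC search or domain expertise).
    arima_fc = ARIMA(y, order=(1, 1, 1)).fit().forecast(h)

    # Member 2: exponential smoothing with an additive trend.
    es_fc = ExponentialSmoothing(y, trend="add").fit().forecast(h)

    # Simple unweighted average of the member forecasts.
    return np.mean([arima_fc, es_fc], axis=0)

if __name__ == "__main__":
    import pandas as pd
    y = pd.Series(np.random.randn(100).cumsum())  # toy random-walk series
    print(ensemble_forecast(y, h=5))
```

Picking which members go into the average (and how to weight them) is exactly the expertise I'm talking about.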

15

TheDrownedKraken t1_iyko6jf wrote

I’m just curious, why do you think that?

3

Internal-Diet-514 t1_iymjci2 wrote

If a model has more parameters than data points in the training set, it can quickly memorize the training set, resulting in an overfit model. You don't always need 16+ attention heads to have the best model for a given dataset. A single self-attention layer with one head can still model more complex relationships among the inputs than something like ARIMA.
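
To put rough numbers on it, here's a sketch in PyTorch of a single-head self-attention forecaster kept deliberately small; `d_model=8` and the layer sizes are arbitrary choices for illustration:

```python
import torch
import torch.nn as nn

class TinyAttentionForecaster(nn.Module):
    def __init__(self, d_model=8):
        super().__init__()
        self.proj_in = nn.Linear(1, d_model)   # scalar series -> d_model
        # One self-attention layer with a single head.
        self.attn = nn.MultiheadAttention(d_model, num_heads=1,
                                          batch_first=True)
        self.proj_out = nn.Linear(d_model, 1)  # back to a scalar prediction

    def forward(self, x):                      # x: (batch, seq_len, 1)
        z = self.proj_in(x)
        z, _ = self.attn(z, z, z)              # self-attention: Q = K = V
        return self.proj_out(z[:, -1])         # predict from the last step

model = TinyAttentionForecaster()
print(sum(p.numel() for p in model.parameters()))  # a few hundred params
x = torch.randn(4, 20, 1)                          # toy batch of sequences
print(model(x).shape)                              # (4, 1)
```

At this size the whole model has only a few hundred trainable parameters, so it can't trivially memorize a small training set the way an oversized transformer can.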

2

kraegarthegreat t1_iyor5g6 wrote

This is something I have found in my research. I keep seeing people build models with millions of parameters when I am able to achieve 99% of the performance with roughly 1k parameters.
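
Illustrative only (these sizes are made up, not from any particular paper): the same comparison in PyTorch, counting trainable parameters for a large and a small recurrent forecaster:

```python
import torch.nn as nn

def count_params(m):
    # Total number of trainable parameters in a module.
    return sum(p.numel() for p in m.parameters() if p.requires_grad)

big = nn.LSTM(input_size=1, hidden_size=512, num_layers=2)   # ~3.2M params
small = nn.LSTM(input_size=1, hidden_size=12, num_layers=1)  # ~700 params
print(count_params(big), count_params(small))
```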

2