
BenoitParis t1_j4qbih9 wrote

Hoeffding Trees come to mind. The keyword you are looking for is 'online learning'. Apparently there's a Python package dedicated to that:

https://scikit-multiflow.readthedocs.io/en/stable/api/api.html

But 250,000 rows is not that many. Since you only need to retrain daily, I'd look at other algorithms, or implementations in other languages, before that.
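For anyone wondering what makes a Hoeffding tree 'online': it only splits a leaf once it has seen enough rows to be confident that the best split really beats the runner-up, using the Hoeffding bound. A rough stdlib-only sketch of that test follows. The function names and the example numbers are mine for illustration, not the scikit-multiflow API.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    With probability at least 1 - delta, the true mean of a variable with
    range R lies within epsilon of its sample mean after n observations.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    """Split once the gap between the two best candidate splits exceeds epsilon.

    For information gain on a binary target, value_range is 1.0
    (log2 of the number of classes).
    """
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


if __name__ == "__main__":
    # Hypothetical gains: the same gap is not trusted after 100 rows,
    # but is trusted after 1000 rows.
    print(should_split(0.30, 0.15, 1.0, 1e-7, 100))
    print(should_split(0.30, 0.15, 1.0, 1e-7, 1000))
```

Because the decision only needs running statistics per leaf, each row can be processed once and then discarded, which is why the approach suits streams.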

6

Repulsive_Tart3669 t1_j4qqivs wrote

This should be considered first. For instance, gradient-boosted tree libraries such as XGBoost, CatBoost and LightGBM are mostly implemented in C/C++ and have GPU compute backends. Given daily updates, you'll have enough time not only to train a model but also to optimize its hyperparameters. In my experience, XGBoost + Ray Tune works just fine.

2

monkeysingmonkeynew OP t1_j4r539v wrote

Yes, it's OK if I run it once a day, but I need to backtest two years of data, so it's not feasible on a laptop or affordable on a GPU.
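To put a number on the backtest cost, here is a rough stdlib-only sketch of a walk-forward schedule. The dates, the function name, and the `retrain_every_days` knob are made up for illustration, and retraining less often than daily is an assumption that may not suit every model.

```python
from datetime import date, timedelta


def walk_forward_schedule(start: date, end: date, retrain_every_days: int = 1):
    """Yield (train_cutoff, test_day) pairs for each day from start to end.

    The model used on test_day is the one fitted on data strictly before
    train_cutoff. A new fit happens only when at least retrain_every_days
    days have passed since the previous fit.
    """
    day = start
    last_retrain = None
    while day <= end:
        if last_retrain is None or (day - last_retrain).days >= retrain_every_days:
            last_retrain = day
        yield last_retrain, day
        day += timedelta(days=1)


def count_fits(start: date, end: date, retrain_every_days: int) -> int:
    """Number of distinct model fits the schedule requires."""
    return len({cutoff for cutoff, _ in
                walk_forward_schedule(start, end, retrain_every_days)})


if __name__ == "__main__":
    start, end = date(2021, 1, 1), date(2022, 12, 31)  # hypothetical two years
    print(count_fits(start, end, 1))  # one fit per day
    print(count_fits(start, end, 7))  # one fit per week
```

Daily retraining over two years means roughly 730 full fits, so the per-fit training time is what decides whether the backtest is feasible.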

2