__lawless t1_j4pzotl wrote

A lot of folks here already mentioned online learning and the resources for it. However, I am going to offer a very hacky solution inspired by the idea of boosting. Suppose you had a regression model already trained. Make predictions for the new training batch and calculate the errors. Now train a new random forest model on the residual errors. For inference, just pass the features to both models and sum the results.

1

monkeysingmonkeynew OP t1_j4r6lwj wrote

This sounds pretty cool, but I don't follow every step. By "calculate the errors" do you mean, for example, subtracting the predicted probabilities from the actual outcomes?

Also, I didn't get your last part about inference; what exactly are you referring to there?

2

__lawless t1_j4r9ebs wrote

Ok, let me elaborate a bit. Imagine the old model is called m_0. Your newly obtained training data is X, y (features and labels, respectively). Now calculate the residual error, which is the difference between y and the prediction of m_0: dy = y - m_0(X). Now train a new model m_1 on features X and labels dy. Finally, at inference time the prediction is the sum of the two models: y_pred = m_0(X_new) + m_1(X_new).
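In code it would look something like this; just a minimal sketch with scikit-learn, where the synthetic data and the RandomForestRegressor settings are stand-ins for whatever you actually have:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Stand-ins for the original training data and the newly obtained batch
X_old, y_old = rng.normal(size=(500, 5)), rng.normal(size=500)
X, y = rng.normal(size=(50, 5)), rng.normal(size=50)

# m_0: the model you already had (any regressor with a .predict method works)
m_0 = RandomForestRegressor(random_state=0).fit(X_old, y_old)

# Residual errors of m_0 on the new batch: dy = y - m_0(X)
dy = y - m_0.predict(X)

# m_1: a fresh random forest trained on the residuals
m_1 = RandomForestRegressor(random_state=0).fit(X, dy)

# Inference: sum the predictions of both models
X_new = rng.normal(size=(10, 5))
y_pred = m_0.predict(X_new) + m_1.predict(X_new)
```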

1

monkeysingmonkeynew OP t1_j4un2xm wrote

OK, I can almost see this working, thanks for the suggestion. The only thing that would prevent me from implementing this solution is that by taking the sum of the two models, it would let m_1 give an equal contribution to the result as m_0. However, I expect a single day's data to be noisy, so I would need the contribution of the new day's data to be down-weighted somehow.
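Maybe a shrinkage factor like the learning rate in gradient boosting would do it? A rough sketch reusing m_0/m_1 from above, where alpha is a made-up knob I'd have to tune:

```python
# Down-weight the residual model's contribution, as in gradient boosting's
# shrinkage: alpha in (0, 1] controls how much m_1 counts toward the sum.
alpha = 0.1  # hypothetical value; would need tuning on held-out data

y_pred = m_0.predict(X_new) + alpha * m_1.predict(X_new)
```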

1