DreamyPen OP t1_ivty1ow wrote
Reply to comment by Erosis in [Discussion] Can we train with multiple sources of data, some very reliable, others less so? by DreamyPen
Unfortunately not, I'm predicting material properties on a continuous scale.
Erosis t1_ivtyokc wrote
You could use a custom training loop where you down-weight the gradients of the unreliable samples before you do parameter updates.
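A minimal sketch of that idea, assuming a simple linear model trained by gradient descent with NumPy (the 0.3 reliability weight and the synthetic data are illustrative assumptions, not from the thread):

```python
import numpy as np

# Down-weight the gradient contribution of unreliable samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=200)

# Per-sample reliability: 1.0 for trusted rows, 0.3 for the noisy source.
reliability = np.where(np.arange(200) < 150, 1.0, 0.3)

w = np.zeros(3)
lr = 0.1
for _ in range(500):
    residual = X @ w - y  # prediction error per sample
    # Each sample's gradient is scaled by its reliability before averaging.
    grad = X.T @ (reliability * residual) / reliability.sum()
    w -= lr * grad
```

This is equivalent to minimizing a reliability-weighted squared loss, so unreliable rows still inform the fit but pull the parameters less.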
DreamyPen OP t1_ivu0v9o wrote
Thank you for your comment. I am not sure what that custom loop would look like for an ensemble method (trees/gradient boosted), or how to proceed with the down-weighting. Is it a documented technique I can read more about, or more of a workaround you are thinking of?
Erosis t1_ivu2gnv wrote
Trees complicate it a bit more. I've never done it for something like that, but check the instance-weight input to xgboost as an example: the xgboost `fit` function has a `sample_weight` argument.
I know that TensorFlow has a new-ish library for trees (TensorFlow Decision Forests). You could potentially write a manual gradient descent loop there with modified minibatch gradients.