BrisklyBrusque t1_j0hx440 wrote on December 16, 2022 at 7:39 PM

Yes, lots. For example, in 2019 a paper introduced a new split rule for categorical variables that reduces computational complexity.

https://peerj.com/articles/6339/

A lot of researchers are also exploring adjacent tree ensembles such as extremely randomized trees (2006) and Bayesian additive regression trees (2008). The former is very similar to random forests. There is a strong possibility other tree ensembles have yet to be discovered!

If you’re a fan of computer science / optimized code, there is a great deal of research concerning making tree models faster. The ranger library in R was introduced as an improvement on the randomForest package. There is also interest in making random forests scale up to millions of variables, to deal with genetics data.

Hummingbird is a Microsoft project that seeks to refactor common machine learning methods using tensor algebra, so those methods can take advantage of GPUs. I don’t know if they got around to random forests yet.

Random forests raise a lot of questions about the relationship between ensemble diversity and ensemble accuracy, about which there are many mysteries.

chaosmosis t1_j0i51ka wrote on December 16, 2022 at 8:33 PM

> Random forests raise a lot of questions about the relationship between ensemble diversity and ensemble accuracy, about which there are many mysteries.

By way of Jensen's inequality, there's a generalization of the bias-variance decomposition of mean-squared error that holds for all convex loss functions; see the paper "Generalized Negative Correlation Learning" that came out in 2021. From there, you can view linear averaging of model outputs as a special case of the method of control variates, where the models' diversity matters insofar as it's harnessed to reduce error due to variance. I think control variates give us a unified theoretical framework for investigating ensembles. They've got all sorts of fun generalizations, like nonlinear control variates, that are as yet completely unexplored in the machine learning literature.
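For the squared-error special case, the role of diversity can be made concrete with the standard ambiguity decomposition: the squared error of a weighted-average ensemble equals the weighted average of the members' squared errors minus the weighted spread of the members around the ensemble prediction. Here is a minimal sketch; the predictions, weights, and target are made-up illustrative numbers, and the function name is hypothetical rather than taken from the paper.

```python
# Ambiguity decomposition for squared error on a single example.
# All numbers below are illustrative assumptions, not data from any paper.

def ambiguity_decomposition(preds, weights, target):
    """Return (ensemble_error, avg_member_error, diversity)."""
    # Weighted-average ensemble prediction (weights assumed to sum to 1).
    ensemble = sum(w * p for w, p in zip(weights, preds))
    ensemble_error = (ensemble - target) ** 2
    # Weighted average of each member's own squared error.
    avg_member_error = sum(w * (p - target) ** 2 for w, p in zip(weights, preds))
    # Weighted spread of the members around the ensemble prediction.
    diversity = sum(w * (p - ensemble) ** 2 for w, p in zip(weights, preds))
    return ensemble_error, avg_member_error, diversity

preds = [48.0, 55.0, 62.0]        # hypothetical member predictions
weights = [1 / 3, 1 / 3, 1 / 3]   # convex weights summing to 1
target = 50.0                     # hypothetical true value

ens_err, avg_err, div = ambiguity_decomposition(preds, weights, target)
# The two printed values agree up to floating-point rounding.
print(ens_err, avg_err - div)
```

Under this identity, for a fixed average member error, more spread among the members means a lower ensemble error, which is one way to read "diversity harnessed to reduce error due to variance."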

In other words, you should diversify ensembles in exactly the same way as you should diversify a portfolio of financial investments according to optimal portfolio theory. See also Philip Tetlock's work on his "extremizing algorithm" for an application of similar ideas to human forecasting competitions.

The main outstanding question with respect to ensembles, to my mind, is not how to make the most use of a collection of models, but when and whether to invest computational effort into running multiple models in parallel and optimizing the relationships between their errors rather than into training a bigger model.

BrisklyBrusque t1_j0i9ldc wrote on December 16, 2022 at 9:04 PM

Thanks for the suggestion.

chaosmosis t1_j0ib3ja wrote on December 16, 2022 at 9:14 PM

No problem at all. I'm leaving ML research for at least the next couple of years, and I want my best ideas to get adopted by others. I figured out all of the above in a three-month summer internship in 2020 and nobody there cared because it couldn't immediately be used to blow things up more effectively, which was incredibly disappointing.

As far as I can tell, nobody but me and this one footnote in an obscure economics paper I've forgotten the citation of has ever noted that ensembles and financial portfolios deal with the same problem if you cast both in terms of control variates. In theory, bridging between the two by way of control variates should allow for stealing lots and lots of ideas from finance literature for ML papers. Would really like seeing someone make something of the connection someday.

chaosmosis t1_j0icgvf wrote on December 16, 2022 at 9:23 PM

As an example, imagine that Bob and Susan are estimating the height of a dinosaur and Bob makes errors that are exaggerated versions of Susan's, so if Susan underestimates its height by ten feet then Bob underestimates it by twenty, or if Susan overestimates its height by thirty feet then Bob overestimates it by forty. You can "artificially construct" a new prediction to average with Susan's predictions by taking the difference between her prediction and Bob's, flipping its sign, and adding it to her prediction. Then you conduct traditional linear averaging on the constructed prediction with Susan's prediction.
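The arithmetic of that construction can be checked with a few lines of Python. This is only a sketch of the two cases described above; the true height of 100 feet and the function name are assumptions made for illustration, and any true height gives the same errors.

```python
# Bob's errors are exaggerated versions of Susan's (error values from the example).
TRUE_HEIGHT = 100.0  # hypothetical true height in feet

def flipped_prediction(susan, bob):
    """Reflect Bob's prediction through Susan's: susan - (bob - susan)."""
    return susan - (bob - susan)

cases = [(-10.0, -20.0), (30.0, 40.0)]  # (Susan's error, Bob's error)
results = []
for susan_err, bob_err in cases:
    susan = TRUE_HEIGHT + susan_err
    bob = TRUE_HEIGHT + bob_err
    plain_avg = (susan + bob) / 2                 # ordinary averaging
    constructed = flipped_prediction(susan, bob)  # the "artificial" prediction
    corrected_avg = (susan + constructed) / 2     # average it with Susan's
    results.append((plain_avg - TRUE_HEIGHT, corrected_avg - TRUE_HEIGHT))

for plain_error, corrected_error in results:
    print(plain_error, corrected_error)
# prints:
# -15.0 -5.0
# 35.0 25.0
```

In both cases the corrected average ends up closer to the truth than Susan alone (errors of -10 and 30) and closer than the plain average of Susan and Bob.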

Visually, you can think about it as if normal averaging draws a straight line between two different models' individual outputs in R^n, then chooses some point between them, while control variates extend that line further in both directions and allow you to choose a point that's more extreme.

It's a little more complicated with more predictors and when issuing predictions in higher dimensions than in one dimension, but not by much. Intuitively, you have to avoid "overcounting" certain relationships when you're trying to build a flipped predictor. This is why the financial portfolio framework is helpful; they're already used to thinking about correlations between lots of different investments.
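For just two models, the portfolio analogy can be sketched directly. The weight that minimizes the variance of the combined error, with the two weights constrained to sum to one, has the same closed form as a two-asset minimum-variance portfolio. The sketch below reuses the two dinosaur error pairs as a toy sample (far too small for real estimation), and the helper names are hypothetical.

```python
# Minimum-variance combination of two predictors, portfolio style.
# Toy data only: two error samples per forecaster, taken from the dinosaur example.

def mean(xs):
    return sum(xs) / len(xs)

def covariance(xs, ys):
    """Sample covariance with the n - 1 denominator."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def min_variance_weights(errors_a, errors_b):
    """Weights (w_a, w_b), summing to 1, that minimize the combined error variance."""
    var_a = covariance(errors_a, errors_a)
    var_b = covariance(errors_b, errors_b)
    cov_ab = covariance(errors_a, errors_b)
    denom = var_a + var_b - 2 * cov_ab  # equals the variance of (a - b)
    if denom == 0:
        raise ValueError("errors differ only by a constant; weights are not unique")
    w_a = (var_b - cov_ab) / denom
    return w_a, 1 - w_a

susan_errors = [-10.0, 30.0]
bob_errors = [-20.0, 40.0]

w_susan, w_bob = min_variance_weights(susan_errors, bob_errors)
combined = [w_susan * s + w_bob * b for s, b in zip(susan_errors, bob_errors)]
print(w_susan, w_bob)   # 3.0 -2.0
print(combined)         # [10.0, 10.0]
```

The weight on Susan lands outside [0, 1], which is the "extend the line" picture in numbers. The leftover constant error of 10 is bias: these weights only target variance, so bias has to be handled separately.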

The tl;dr version is, you want models with errors that balance each other out.

[deleted] t1_j0itwb1 wrote on December 16, 2022 at 11:30 PM

[deleted]

curiousshortguy t1_j0gegtd wrote on December 16, 2022 at 1:28 PM

I think there's some interest in learning optimal decision trees in the community, as well as robust learning methods under different kinds of adversarial influence. They're less open problems and more areas of potential improvement though.

AdFew4357 t1_j0i4mo2 wrote on December 16, 2022 at 8:30 PM

BART (Bayesian additive regression trees). Also, in general, ensemble learning is a useful approach for advancing modeling in other areas with different kinds of data. For example, the area I'm reading about now, time series classification, has a ton of literature on time series models that use ensemble learners under the hood. Check out models like Arsenal, Proximity Forest, or other ensemble-based methods for time series classification.

WigglyHypersurface t1_j0igsnc wrote on December 16, 2022 at 9:54 PM

There is recent work on causal forests which also reinterprets forests as a kernel method. The same group also came up with local linear forests, which can help in cases where smoothness and/or extrapolation is desired.

https://arxiv.org/pdf/1510.04342 https://arxiv.org/pdf/1807.11408

zimonitrome t1_j136ic1 wrote on December 21, 2022 at 9:29 AM

iirc there is also very young research into binary trees to parallelize training on GPUs using CUDA. It could be a breakthrough since ppl claim ANNs and random forests resemble one another.

[deleted] t1_j0fnk3v wrote on December 16, 2022 at 7:56 AM