
BrisklyBrusque t1_j0hx440 wrote

Yes, lots. For example, in 2019 a paper introduced a new split rule for categorical variables that reduces computational complexity.
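For context, the standard trick in that line of work (for regression and two-class problems) is Fisher's classic result: sort the k category levels by their mean response, and the best of the k−1 ordered cuts matches the best of all possible subset splits, turning an exponential search into a linear one. A minimal sketch of that idea, with made-up data and a squared-error criterion (the specific 2019 paper isn't named above, so this is the generic version, not necessarily its exact rule):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)

# Toy regression data: one categorical feature with k levels, each present at least once
k = 8
levels = np.r_[np.arange(k), rng.integers(0, k, size=500 - k)]
y = np.linspace(0.0, 3.0, k)[levels] + rng.normal(0.0, 1.0, levels.size)

def sse_of_split(mask):
    """Sum of squared errors after splitting the samples by a boolean mask."""
    sse = 0.0
    for side in (mask, ~mask):
        if side.any():
            sse += ((y[side] - y[side].mean()) ** 2).sum()
    return sse

# Exhaustive search: every proper subset of levels defines a candidate split
best_exhaustive = min(
    sse_of_split(np.isin(levels, subset))
    for r in range(1, k)
    for subset in combinations(range(k), r)
)

# Ordered search: sort levels by mean response, then check only the k-1 ordered cuts
order = sorted(range(k), key=lambda lv: y[levels == lv].mean())
best_ordered = min(sse_of_split(np.isin(levels, order[:cut])) for cut in range(1, k))

# Fisher's result: for squared error, the cheap ordered search finds the same optimum
assert np.isclose(best_exhaustive, best_ordered)
```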

A lot of researchers are also exploring adjacent tree ensembles such as extremely randomized trees (2006) and Bayesian additive regression trees (2008). The former is very similar to random forests. There is a strong possibility other tree ensembles have yet to be discovered!

If you’re a fan of computer science / optimized code, there is a great deal of research concerning making tree models faster. The ranger library in R was introduced as an improvement on the randomForest package. There is also interest in making random forests scale up to millions of variables, to deal with genetics data.

Hummingbird is a Microsoft project that seeks to refactor common machine learning methods using tensor algebra, so those methods can take advantage of GPUs. I don’t know if they got around to random forests yet.

Random forests raise a lot of questions about the relationship between ensemble diversity and ensemble accuracy, about which there are many mysteries.


chaosmosis t1_j0i51ka wrote

> Random forests raise a lot of questions about the relationship between ensemble diversity and ensemble accuracy, about which there are many mysteries.

By way of Jensen's inequality, there's a generalization of the bias-variance decomposition of mean-squared error that holds for all convex loss functions; see the paper Generalized Negative Correlation Learning from 2021. From there, you can view linear averaging of model outputs as a special case of the method of control variates, where model diversity matters insofar as it's harnessed to reduce error due to variance. I think control variates give us a unified theoretical framework for investigating ensembles. They've got all sorts of fun generalizations, like nonlinear control variates, that are as yet completely unexplored in the machine learning literature.
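The variance half of that story is easy to see numerically: for two unbiased, unit-variance predictors whose errors have correlation rho, the average has variance (1 + rho) / 2, so the payoff from diversity is exactly the payoff from negative error correlation. A quick simulation (numbers here are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

def variance_of_average(rho):
    """Empirical variance of the mean of two unit-variance errors with correlation rho."""
    cov = [[1.0, rho], [rho, 1.0]]
    errors = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    return errors.mean(axis=1).var()

# Theory says (1 + rho) / 2: about 0.05, 0.50, and 0.95 for these three settings
v_neg, v_zero, v_pos = (variance_of_average(r) for r in (-0.9, 0.0, 0.9))
assert v_neg < v_zero < v_pos
```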

In other words, you should diversify ensembles in exactly the same way you should diversify a portfolio of financial investments according to optimal portfolio theory. See also Philip Tetlock's work on his "extremizing algorithm" for an application of similar ideas to human forecasting competitions.
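The portfolio analogy can be made literal: for unbiased ensemble members with error covariance Sigma, the variance-minimizing combination weights are the Markowitz minimum-variance weights, w proportional to Sigma^(-1) applied to the all-ones vector. A deterministic sketch with made-up covariance numbers:

```python
import numpy as np

# Illustrative error covariance for three unbiased models (not real data)
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 2.0, 0.3],
                  [0.5, 0.3, 1.0]])

# Markowitz minimum-variance weights: w proportional to Sigma^{-1} @ ones
w = np.linalg.solve(Sigma, np.ones(3))
w /= w.sum()

# Compare the error variance of correlation-aware weights vs. naive averaging
equal = np.full(3, 1.0 / 3.0)
var_equal = equal @ Sigma @ equal
var_markowitz = w @ Sigma @ w
assert var_markowitz < var_equal
```

Since equal weighting is just one feasible point in the minimization, the Markowitz weights can only tie or beat it; they tie exactly when every model's errors look the same, which is the degenerate "no diversity to exploit" case.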

The main outstanding question with respect to ensembles, to my mind, is not how to make the most use of a collection of models, but when and whether to invest computational effort into running multiple models in parallel and optimizing the relationships between their errors rather than into training a bigger model.


BrisklyBrusque t1_j0i9ldc wrote

Thanks for the suggestion.


chaosmosis t1_j0ib3ja wrote

No problem at all. I'm leaving ML research for at least the next couple years, and I want my best ideas to get adopted by others. I figured out all of the above in a three month summer internship in 2020 and nobody there cared because it couldn't immediately be used to blow things up more effectively, which was incredibly disappointing.

As far as I can tell, nobody but me and one footnote in an obscure economics paper whose citation I've forgotten has ever noted that ensembles and financial portfolios deal with the same problem if you cast both in terms of control variates. In theory, bridging the two via control variates should let ML papers steal lots and lots of ideas from the finance literature. I'd really like to see someone make something of the connection someday.


chaosmosis t1_j0icgvf wrote

As an example, imagine that Bob and Susan are estimating the height of a dinosaur, and Bob makes errors that are exaggerated versions of Susan's: if Susan underestimates its height by ten feet, Bob underestimates it by twenty; if Susan overestimates by thirty feet, Bob overestimates by forty. You can "artificially construct" a new prediction by taking the difference between Bob's prediction and Susan's, flipping its sign, and adding the result to Susan's prediction (i.e., twice Susan's prediction minus Bob's). Then you conduct traditional linear averaging of this constructed prediction with Susan's prediction.

Visually, you can think of it like this: normal averaging draws a straight line between two models' outputs in R^n and chooses some point between them, while control variates extend that line in both directions and let you choose a point that's more extreme.

It's a little more complicated with more than two predictors, or when issuing predictions in higher dimensions rather than one, but not by much. Intuitively, you have to avoid "overcounting" shared error relationships when you build a flipped predictor. This is why the financial portfolio framework is helpful: finance people are already used to thinking about correlations between lots of different investments.

The tl;dr version is, you want models with errors that balance each other out.