Submitted by BenXavier t3_10fbiz2 in MachineLearning

Hi everybody, I've been skimming this paper since yesterday and was once again impressed by the expressiveness and practicality of tree-based models.

I wondered what the current research directions in the field are and what novel ideas have been presented in recent years, beyond improving raw performance. Examples may include better explainability, online learning, splitting criteria, enhanced or customizable loss functions, adding structure or constraints, known shortcomings, and so on.

124

Comments


shellyturnwarm t1_j4w61tf wrote

Ha! I was about to link that exact paper you mention after seeing the title. Gael and his team are doing great work.

4

mickman_10 t1_j4w9jyx wrote

I know that Rich Caruana at Microsoft has been pushing interpretable tree-based models for a little while now, and there's still probably ongoing research there. For example, this paper and this project.
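A minimal usage sketch, assuming the project referred to is InterpretML and its Explainable Boosting Machine from Caruana's group (the dataset and parameters below are only illustrative):

```python
# Assumes the `interpret` package (InterpretML). EBMs fit an additive model of
# per-feature and pairwise shape functions using bagged, boosted shallow trees,
# which keeps the fitted model interpretable.
from interpret.glassbox import ExplainableBoostingClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ebm = ExplainableBoostingClassifier(interactions=10, random_state=0)
ebm.fit(X_train, y_train)
print("test accuracy:", ebm.score(X_test, y_test))

# Global explanation: one shape function per feature / pairwise interaction term.
global_explanation = ebm.explain_global()
```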

28

notdelet t1_j4wi2gy wrote

Well, there is the stuff Rudin is doing with Rashomon sets and small explainable trees. Then there is the stuff on optimal decision trees using mixed-integer programs. I'm not working in the area at the moment, but those are the things I've heard people talk about recently.

24

edunuke t1_j4xg06j wrote

I skimmed through a book about "fast and frugal decision trees" and found it interesting. I don't think it is something new per se, but I found the concept useful in terms of data efficiency and explainability.

1

SearchAtlantis t1_j4xjn4r wrote

Cynthia Rudin at Duke? Just want to clarify, because when I see Rudin I think Walter Rudin, a la Baby Rudin for analysis.

Wow, just looked her up. I know it wasn't practical for me to do a two-year MS at that point in my life, but I'm really wishing I'd gone to Duke now. Interpretable learning is one of my favorite things, and operations research was a passion in undergrad. Those INFORMS papers.

6

TheFlyingDrildo t1_j4xw5oi wrote

Susan Athey and Stefan Wager's push towards generalized random forests is a major step forward in opening up the types of estimation tasks random forests are useful for, while simultaneously providing the theory for large-sample inference.

An underlying perspective in their research (and most modern random forest theoretical research) is that random forests are effectively kernel regressors, with the forest construction adaptively and implicitly defining the kernel. The component that ends up influencing the adaptivity of the kernel the most is what defines how two child nodes are formed from a parent node.
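A minimal sketch of this kernel view using scikit-learn (not from the comment; the data and settings are illustrative). With bootstrapping disabled, the forest prediction can be reproduced exactly as a weighted average of the training targets, where the weights are read off from leaf co-membership:

```python
# Sketch: random forests as adaptive kernel regressors. With bootstrap=False,
# each tree's leaf value is the mean of the training targets in that leaf, so
# the forest prediction equals a weighted average of y with forest-defined weights.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=1.0, random_state=0)
forest = RandomForestRegressor(n_estimators=200, min_samples_leaf=5,
                               max_features=0.5,   # "mtry": random subspacing
                               bootstrap=False, random_state=0)
forest.fit(X, y)

train_leaves = forest.apply(X)      # (n_train, n_trees) leaf index per tree
x_new = X[:1]                       # a single query point
new_leaves = forest.apply(x_new)    # (1, n_trees)

# Implicit kernel weight of each training point: averaged over trees,
# 1/|leaf| if it shares the query point's leaf in that tree, else 0.
n_train, n_trees = train_leaves.shape
weights = np.zeros(n_train)
for b in range(n_trees):
    in_leaf = train_leaves[:, b] == new_leaves[0, b]
    weights[in_leaf] += 1.0 / in_leaf.sum()
weights /= n_trees

print(weights @ y, forest.predict(x_new)[0])   # identical up to floating point
```

With bootstrapping enabled, each tree's leaf values are means over in-bag samples only, so the equality above becomes approximate rather than exact.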

In the way we implement things right now, we've chosen a few techniques for computational ease: random subspacing (controlled by an mtry hyperparameter), axis-aligned splits, and standard CART splitting rules. I think there is still a lot of work to be done here. An example of an interesting direction with respect to splitting rules is the Distributional Random Forests paper.

Edit: In terms of other hyperparameters that people care about, I have a few comments. The depth of the trees in the forest should be controlled by a min_samples_leaf parameter, which controls the local vs. global trade-off in the kernel. It should pretty much always be selected in a problem-specific manner with a hyperparameter search, but generally it should be quite small. Its choice is closely related to the n_trees hyperparameter, which should always be as large as you can afford computationally. An interesting research direction, however, may be how to adaptively figure out what value of n_trees is "good enough", which there has been some work on through the analysis of the Purely Uniform Random Forests model.
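A small illustration of that tuning advice (synthetic data and scikit-learn parameter names; min_samples_leaf is searched while n_estimators is simply fixed as large as is affordable):

```python
# Sketch of the advice above: keep n_estimators large and fixed, and select
# min_samples_leaf (the local-vs-global knob of the implicit kernel) by search.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=2000, n_features=20, noise=5.0, random_state=0)

search = GridSearchCV(
    RandomForestRegressor(n_estimators=1000, random_state=0, n_jobs=-1),
    param_grid={"min_samples_leaf": [1, 2, 5, 10, 20]},
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)
print(search.best_params_)
```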

Lastly, bootstrapping, or alternatively the subsampling percentage. I believe random forests should always have the honesty property, which naturally pushes us towards subsampling for the extra flexibility in the percentage of data points in the leaves. There could be work done here to determine the appropriate percentage for the split, likely based on convergence rates in learning the tree structure vs. estimating the leaves. There is definitely a strong interaction here with the min_samples_leaf hyperparameter. However, the extra variability induced by bootstrapping (and using out-of-bag samples for honesty) may have desirable properties for the kernel learning, though I believe it is subsampling that makes the large-sample inference theory tractable within our current understanding. Another worthy area of research.
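A rough sketch of the honesty idea for a single tree (scikit-learn has no built-in honesty option, so this is done by hand and is not the grf implementation): one subsample chooses the splits, and a disjoint subsample re-estimates the leaf values.

```python
# Honest estimation by hand: tree structure from one half, leaf values from the other.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, n_features=10, noise=2.0, random_state=0)
X_split, X_est, y_split, y_est = train_test_split(X, y, test_size=0.5, random_state=0)

tree = DecisionTreeRegressor(min_samples_leaf=10, random_state=0)
tree.fit(X_split, y_split)        # splits chosen on the "splitting" half only

# Refill each leaf with the mean of the held-out "estimation" half.
est_leaves = tree.apply(X_est)
honest_value = {leaf: y_est[est_leaves == leaf].mean() for leaf in np.unique(est_leaves)}

def honest_predict(X_new, fallback=y_est.mean()):
    # Fall back to the global mean for any leaf the estimation half never reached.
    return np.array([honest_value.get(leaf, fallback) for leaf in tree.apply(X_new)])

print(honest_predict(X[:5]))
```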

12

BenXavier OP t1_j4zd9jv wrote

Hey guys, thank you for the great responses!

  • "Accurate Intelligible Models with Pairwise Interactions" seems great - as far as I can understand. That's what I've been referring to with "adding structure to models". Crazily how thin is the reference section: lot of exciting work to do!
    • - Please do correct me if I'm being too naive, but are there other approaches for "building sub-models" at the splitting point?
  • u/mickman_10, u/TheFlyingDrildo, are you also aware of any connection with Symbolic Regression or Association Rule extraction?
2