Submitted by BenXavier t3_10fbiz2 in MachineLearning

Hi everybody, I've been skimming this paper since yesterday and was once again impressed by the expressiveness and practicality of tree-based models.

I wondered what the current research directions in the field are and what novel ideas have been presented in recent years, beyond improving raw performance. Examples may include better explainability, online learning, splitting criteria, enhanced or customizable loss functions, adding structure or constraints, known shortcomings, and so on.

124

Comments


shellyturnwarm t1_j4w61tf wrote

Ha! I was about to link that exact paper you mention after seeing the title. Gael and his team are doing great work.

4

mickman_10 t1_j4w9jyx wrote

I know that Rich Caruana at Microsoft has been pushing interpretable tree-based models for a little while now, and there's still probably ongoing research there. For example, this paper and this project.
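A minimal usage sketch, assuming the project referred to is InterpretML and its Explainable Boosting Machine from Caruana's group (the dataset and parameters below are only illustrative):

```python
# Assumes the `interpret` package (InterpretML). EBMs fit an additive model of
# per-feature and pairwise shape functions using bagged, boosted shallow trees,
# which keeps the fitted model interpretable.
from interpret.glassbox import ExplainableBoostingClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ebm = ExplainableBoostingClassifier(interactions=10, random_state=0)
ebm.fit(X_train, y_train)
print("test accuracy:", ebm.score(X_test, y_test))

# Global explanation: one shape function per feature / pairwise interaction term.
global_explanation = ebm.explain_global()
```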

28

notdelet t1_j4wi2gy wrote

Well, there is the stuff Rudin is doing with Rashomon sets and small explainable trees. Then there is the stuff on optimal decision trees using mixed-integer programs. I'm not working in the area at the moment, but those are the things I've heard people talk about recently.

24

edunuke t1_j4xg06j wrote

I skimmed through a book about "fast and frugal decision trees" and found it interesting. I don't think it is something new per se, but I found the concept useful in terms of data efficiency and explainability.

1

SearchAtlantis t1_j4xjn4r wrote

Cynthia Rudin at Duke? Just want to clarify, because when I see Rudin I think Walter Rudin, a la Baby Rudin for analysis.

Wow, just looked her up. I know it wasn't practical for me to do a two-year MS at that point in my life, but I'm really wishing I'd gone to Duke now. Interpretable learning is one of my favorite things, and operations research was a passion in undergrad. Those INFORMS papers.

6

TheFlyingDrildo t1_j4xw5oi wrote

Susan Athey and Stefan Wager's push towards generalized random forests is a major step forward in opening up the types of estimation tasks random forests are useful for, while simultaneously providing the theory for large-sample inference.

An underlying perspective in their research (and most modern random forest theoretical research) is that random forests are effectively kernel regressors, with the forest construction adaptively and implicitly defining the kernel. The component that ends up influencing the adaptivity of the kernel the most is what defines how two child nodes are formed from a parent node.
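A minimal sketch of this kernel view using scikit-learn (not from the comment; the data and settings are illustrative). With bootstrapping disabled, the forest prediction can be reproduced exactly as a weighted average of the training targets, where the weights are read off from leaf co-membership:

```python
# Sketch: random forests as adaptive kernel regressors. With bootstrap=False,
# each tree's leaf value is the mean of the training targets in that leaf, so
# the forest prediction equals a weighted average of y with forest-defined weights.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=1.0, random_state=0)
forest = RandomForestRegressor(n_estimators=200, min_samples_leaf=5,
                               max_features=0.5,   # "mtry": random subspacing
                               bootstrap=False, random_state=0)
forest.fit(X, y)

train_leaves = forest.apply(X)      # (n_train, n_trees) leaf index per tree
x_new = X[:1]                       # a single query point
new_leaves = forest.apply(x_new)    # (1, n_trees)

# Implicit kernel weight of each training point: averaged over trees,
# 1/|leaf| if it shares the query point's leaf in that tree, else 0.
n_train, n_trees = train_leaves.shape
weights = np.zeros(n_train)
for b in range(n_trees):
    in_leaf = train_leaves[:, b] == new_leaves[0, b]
    weights[in_leaf] += 1.0 / in_leaf.sum()
weights /= n_trees

print(weights @ y, forest.predict(x_new)[0])   # identical up to floating point
```

With bootstrapping enabled, each tree's leaf values are means over in-bag samples only, so the equality above becomes approximate rather than exact.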

In the way we implement things right now, we've chosen a few techniques for computational ease: random subspacing (controlled by an mtry hyperparameter), axis-aligned splits, and standard CART splitting rules. I think there is still a lot of work to be done here. An example of an interesting direction with respect to splitting rules is the Distributional Random Forests paper.

Edit: In terms of other hyperparameters that people care about, I have a few comments. The depth of the trees in the forest should be controlled by a min_samples_leaf parameter, which controls the local vs. global trade-off in the kernel. It should pretty much always be selected in a problem-specific manner with a hyperparameter search, but generally it should be quite small. Its choice is closely related to the n_trees hyperparameter, which should always be as large as you can afford computationally. An interesting research direction, however, may be how to adaptively figure out what value of n_trees is "good enough", which there has been some work on through the analysis of the Purely Uniform Random Forests model.
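A small illustration of that tuning advice (synthetic data and scikit-learn parameter names; min_samples_leaf is searched while n_estimators is simply fixed as large as is affordable):

```python
# Sketch of the advice above: keep n_estimators large and fixed, and select
# min_samples_leaf (the local-vs-global knob of the implicit kernel) by search.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=2000, n_features=20, noise=5.0, random_state=0)

search = GridSearchCV(
    RandomForestRegressor(n_estimators=1000, random_state=0, n_jobs=-1),
    param_grid={"min_samples_leaf": [1, 2, 5, 10, 20]},
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)
print(search.best_params_)
```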

Lastly, bootstrapping, or alternatively the subsampling percentage. I believe random forests should always have the honesty property, which naturally pushes us towards subsampling for the extra flexibility in the percentage of data points in the leaves. There could be work done here to determine the appropriate percentage for the split, likely based on convergence rates in learning the tree structure vs. estimating the leaves. There is definitely a strong interaction here with the min_samples_leaf hyperparameter. However, the extra variability induced by bootstrapping (and using out-of-bag samples for honesty) may have desirable properties for the kernel learning, though I believe it is subsampling that makes the large-sample inference theory tractable within our current understanding. Another worthy area of research.
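A rough sketch of the honesty idea for a single tree (scikit-learn has no built-in honesty option, so this is done by hand and is not the grf implementation): one subsample chooses the splits, and a disjoint subsample re-estimates the leaf values.

```python
# Honest estimation by hand: tree structure from one half, leaf values from the other.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, n_features=10, noise=2.0, random_state=0)
X_split, X_est, y_split, y_est = train_test_split(X, y, test_size=0.5, random_state=0)

tree = DecisionTreeRegressor(min_samples_leaf=10, random_state=0)
tree.fit(X_split, y_split)        # splits chosen on the "splitting" half only

# Refill each leaf with the mean of the held-out "estimation" half.
est_leaves = tree.apply(X_est)
honest_value = {leaf: y_est[est_leaves == leaf].mean() for leaf in np.unique(est_leaves)}

def honest_predict(X_new, fallback=y_est.mean()):
    # Fall back to the global mean for any leaf the estimation half never reached.
    return np.array([honest_value.get(leaf, fallback) for leaf in tree.apply(X_new)])

print(honest_predict(X[:5]))
```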

12

BenXavier OP t1_j4zd9jv wrote

Hey guys, thank you for the great responses!

  • "Accurate Intelligible Models with Pairwise Interactions" seems great - as far as I can understand. That's what I've been referring to with "adding structure to models". Crazily how thin is the reference section: lot of exciting work to do!
    • - Please do correct me if I'm being too naive, but are there other approaches for "building sub-models" at the splitting point?
  • u/mickman_10, u/TheFlyingDrildo, are you also aware of any connection with Symbolic Regression or Association Rule extraction?
2