Submitted by Worth-Advance-1232 t3_10asgah in MachineLearning

Basically what the title says. It seems to me that Super Learners / Stacking are used infrequently both in industry and in the literature, and I was wondering why that is, especially since stacking should guarantee performance at least equal to that of its base learners. One reason that comes to mind is the data cost: the more levels the architecture has, the more data splits are needed, which reduces the training data available to each individual model and thus its performance. Another is the added complexity of building a stacked learner. Still, that doesn't seem like such a bad trade-off. Is there anything I'm totally missing here?
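
For concreteness, here is a minimal sketch of the kind of stacked setup I have in mind, using scikit-learn's StackingClassifier (the base learners and data are just placeholders):

```python
# Minimal stacking sketch (base learners and data are placeholders).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Level-0 base learners; their out-of-fold predictions become the features
# for the level-1 meta-model (here a logistic regression).
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ("svm", SVC(probability=True, random_state=0)),
]
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),
    cv=5,  # out-of-fold predictions are generated internally, so each base learner still trains on most of the data
)
stack.fit(X_train, y_train)
print("stacked accuracy:", stack.score(X_test, y_test))
```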

14

Comments

chaosmosis t1_j47d0ev wrote

In addition to being more straightforward, applying the same total amount of compute to a single model trained end to end often gives better performance than splitting that compute across multiple models. As far as I'm aware, there's no systematic way to tell which approach will be preferable in a given case; this is just a rule-of-thumb opinion.

8

Worth-Advance-1232 OP t1_j47drjb wrote

This doesn't really make much sense to me. Stacking is not a tool in itself; it's rather a modelling approach. Also, from what I can tell from the h2o docs, their Stacked Ensemble has only one level and no meta-model. So rather than using, e.g., the outputs (or probability distributions) of each model to train a new model, it only uses the base models' outputs for a given input directly to return its final prediction, doesn't it?
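
To make the distinction I have in mind concrete, here is a rough sketch in scikit-learn terms rather than h2o (the models and data are just placeholders):

```python
# Distinction being drawn: combining base-model outputs directly vs.
# training a meta-model on those outputs.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
base_models = [RandomForestClassifier(random_state=0), GradientBoostingClassifier(random_state=0)]

# Out-of-fold probability predictions of each base model (one column per model).
oof = np.column_stack([
    cross_val_predict(m, X, y, cv=5, method="predict_proba")[:, 1] for m in base_models
])

# Option A: no meta-model -- just combine (average) the base-model outputs.
avg_pred = oof.mean(axis=1) > 0.5

# Option B: meta-model -- train a new model on the base-model outputs.
meta = LogisticRegression().fit(oof, y)
meta_pred = meta.predict(oof)
```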

1

jonas__m t1_j47r617 wrote

This is one of the many strategies used in AutoGluon that enable it to outperform other AutoML tools on most datasets:
https://arxiv.org/abs/2003.06505

https://arxiv.org/abs/2207.12560

One complaint people raise concerns the latency & complexity of deploying ensemble models, but there are many easy options to deal with this:
https://auto.gluon.ai/stable/tutorials/tabular_prediction/tabular-indepth.html#accelerating-inference
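For example, a rough sketch of a couple of those options (assuming the TabularPredictor API roughly as in the docs above; exact method names may differ between AutoGluon versions):

```python
# Rough sketch of the linked inference-acceleration options.
import pandas as pd
from sklearn.datasets import make_classification
from autogluon.tabular import TabularPredictor

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
df = pd.DataFrame(X, columns=[f"f{i}" for i in range(X.shape[1])])
df["target"] = y
train_data, test_data = df.iloc[:800], df.iloc[800:]

# best_quality enables the bagged/stacked ensembles discussed in the papers above.
predictor = TabularPredictor(label="target").fit(train_data, presets="best_quality", time_limit=120)

predictor.persist_models()  # keep fitted models in memory to avoid per-call disk loads
predictor.refit_full()      # collapse each bagged model into a single refit copy trained on all the data
preds = predictor.predict(test_data.drop(columns=["target"]))
```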

7

jimmymvp t1_j4fcjly wrote

Hm, I'm not sure about that. There's the mixture-of-experts idea, which isn't exactly stacking but rather specializes multiple models to different parts of the data, so each data point gets assigned to a specific shallow model. What you need then is an assignment rule, usually a classifier, and it's been shown that this is cheaper in terms of compute at evaluation time. I'm not sure whether the idea has been abandoned by now, but Google Brain published a paper on it and there was subsequent work.
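
Roughly what I mean, as a toy sketch (real mixture-of-experts models learn the gate and experts jointly; here a simple clustering-based gate and shallow trees just stand in to illustrate the routing):

```python
# Toy hard-gated mixture of experts (all model choices are placeholders).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=3000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Gate: an assignment rule that sends each point to one of k regions of input space.
k = 4
gate = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_tr)

# Experts: one shallow model per region, each trained only on "its" points.
assign_tr = gate.predict(X_tr)
experts = [
    DecisionTreeClassifier(max_depth=3).fit(X_tr[assign_tr == j], y_tr[assign_tr == j])
    for j in range(k)
]

# At eval time each point is routed to exactly one expert, so only one
# shallow model runs per input -- cheaper than evaluating a full ensemble.
assign_te = gate.predict(X_te)
preds = np.empty(len(X_te), dtype=int)
for j in range(k):
    mask = assign_te == j
    if mask.any():
        preds[mask] = experts[j].predict(X_te[mask])
print("accuracy:", (preds == y_te).mean())
```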

2