Submitted by Worth-Advance-1232 t3_10asgah in MachineLearning

Basically what the title says. It seems to me that Super Learners / Stacking are used infrequently both in business and in the literature, so I was wondering why that is, especially since stacking should guarantee at least the same performance as the base learners it is built from. One reason that comes to mind is the curse of data: the more levels the architecture has, the more data splits are needed, reducing the available training data for each individual model and thus reducing model performance. Another issue might be the complexity of building a stacked learner. Still, that doesn't seem to be that bad of a trade-off. Is there anything I'm totally missing here?
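For concreteness, here is a minimal sketch of the kind of stacking I mean, using scikit-learn's StackingClassifier with placeholder data and base learners (just an illustration, not a tuned setup):

```python
# Minimal stacking sketch; the dataset and base learners are placeholders.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Level-0 base learners and a logistic-regression meta-model (level 1).
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ("svc", SVC(probability=True, random_state=0)),
]
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),
    cv=5,  # out-of-fold predictions of the base learners train the meta-model
)
stack.fit(X_train, y_train)
print("stacked accuracy:", stack.score(X_test, y_test))
```

Note that the cross-validation inside the stack reuses the same training data for the meta-model instead of carving off a separate holdout, which is how implementations usually soften the data-splitting cost I mentioned above.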

14

Comments


chaosmosis t1_j47d0ev wrote

In addition to being more straightforward, applying the same total amount of compute to a single model doing end-to-end learning is often better for performance than splitting that compute across multiple models. As far as I'm aware, there aren't any systematic ways to tell when each method will be preferable; this is just a rule-of-thumb opinion.

8

jimmymvp t1_j4fcjly wrote

Hm, I'm not sure about that. There's the mixture-of-experts idea, which isn't exactly stacking but rather specializes multiple models to parts of the data, so each data point gets assigned to a specific shallow model. What you need then is an assignment rule, usually a classifier, and it's been shown that this is cheaper in terms of compute at evaluation time. I'm not sure if the idea has been abandoned by now, but Google Brain published a paper on this and there were subsequent works.
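A rough toy sketch of that routing pattern (not the Google Brain architecture, just the general idea; the k-means partitioning and shallow trees are arbitrary choices for illustration):

```python
# Toy mixture-of-experts routing: a gating classifier assigns each sample
# to one shallow expert, so only that expert is evaluated at prediction time.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=3000, n_features=20, random_state=0)

# Crudely partition the input space and train one shallow expert per partition.
n_experts = 4
assignments = KMeans(n_clusters=n_experts, n_init=10, random_state=0).fit_predict(X)
experts = []
for k in range(n_experts):
    expert = DecisionTreeClassifier(max_depth=3, random_state=0)
    expert.fit(X[assignments == k], y[assignments == k])
    experts.append(expert)

# The gate learns to reproduce the assignment rule from the inputs alone.
gate = LogisticRegression(max_iter=1000).fit(X, assignments)

def predict(X_new):
    routes = gate.predict(X_new)  # which expert handles each sample
    out = np.empty(len(X_new), dtype=int)
    for k in range(n_experts):
        mask = routes == k
        if mask.any():
            out[mask] = experts[k].predict(X_new[mask])
    return out

print("toy MoE train accuracy:", (predict(X) == y).mean())
```

The compute saving at evaluation time comes from the hard routing: each point only ever touches the gate plus one shallow expert.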

2

ndemir t1_j466c3f wrote

It's just one of the tools you end up using if you're running some kind of AutoML. I just confirmed that with h2o ;) https://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html
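Roughly what that looks like (the file path and column name below are placeholders; the leaderboard typically lists StackedEnsemble_* models alongside the individual base models):

```python
# Rough H2O AutoML sketch; file path and response column are placeholders.
import h2o
from h2o.automl import H2OAutoML

h2o.init()
train = h2o.import_file("train.csv")  # placeholder dataset
y = "target"                          # placeholder response column
x = [c for c in train.columns if c != y]

aml = H2OAutoML(max_models=20, seed=1)
aml.train(x=x, y=y, training_frame=train)

# Stacked ensembles built on the trained base models show up here.
print(aml.leaderboard)
```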

2

Worth-Advance-1232 OP t1_j47drjb wrote

For me this doesn't really make all that much sense. Stacking isn't a tool in itself; it's rather a modelling approach. Also, from what I can tell in the h2o docs, it seems that their Stacked Ensemble has only one level and no meta-model. So rather than using e.g. the outputs (or probability distributions) of the base models to train a new model, it will only combine their outputs for any given input directly into its final prediction, won't it?
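To make the distinction explicit, here is a hand-rolled sketch with placeholder data (not how h2o implements it): what I'd call a meta-model is a level-1 model trained on the out-of-fold probability outputs of the base models, as opposed to simply averaging those outputs.

```python
# Hand-rolled sketch of a meta-model vs. plain blending (placeholder data).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

base_models = [
    RandomForestClassifier(n_estimators=200, random_state=0),
    GradientBoostingClassifier(random_state=0),
]

# Out-of-fold class probabilities become the meta-features, so the meta-model
# never sees predictions the base models made on their own training folds.
meta_features = np.column_stack([
    cross_val_predict(m, X, y, cv=5, method="predict_proba")[:, 1]
    for m in base_models
])

# Level-1 meta-model trained on the base models' outputs.
meta_model = LogisticRegression().fit(meta_features, y)

# Versus simple blending: average the same probabilities, no level-1 model.
blended = (meta_features.mean(axis=1) > 0.5).astype(int)

print("meta-model accuracy (in-sample):", meta_model.score(meta_features, y))
print("blend accuracy (in-sample):", (blended == y).mean())
```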

1

ndemir t1_j4820tc wrote

1

Worth-Advance-1232 OP t1_j48mdcw wrote

I'll look into it, thanks a lot! The main topic/question is still open for me, though.

1

ndemir t1_j48mkuy wrote

In the real world, stacking is used. It's usually behind the scenes -- as you have seen in this AutoML application.

2

Worth-Advance-1232 OP t1_j48mrfn wrote

I've used it in production as well, but thought that relatively little research has been published about it lately, so I kind of assumed it's similar in practice.

1