chaosmosis t1_j47d0ev wrote

In addition to being more straightforward, applying the same total amount of compute to a single model doing end-to-end learning often yields better performance than splitting that compute across multiple models. As far as I'm aware, there's no systematic way to tell in advance which approach will be preferable; this is just a rule-of-thumb opinion.

8

jimmymvp t1_j4fcjly wrote

Hm, I'm not sure about that. There's the mixture-of-experts idea, which isn't exactly stacking but rather specializes multiple models on different parts of the data, so each data point gets assigned to a specific shallow model. What you need then is an assignment rule, usually implemented as a classifier, and it's been shown that this is cheaper in terms of compute at evaluation time. I'm not sure whether the idea has been abandoned by now, but Google Brain published a paper on this (the sparsely-gated mixture-of-experts layer) and there have been subsequent works.
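
For illustration, here's a minimal sketch of that hard-routing setup in PyTorch (the class name, layer sizes, and expert count are my own, not from any particular paper): a small classifier picks one expert per data point, and only the selected expert runs at evaluation time.

```python
# Minimal sketch of hard (top-1) mixture-of-experts routing, assuming PyTorch.
# Sizes and the number of experts are illustrative, not from the paper.
import torch
import torch.nn as nn

class Top1MoE(nn.Module):
    def __init__(self, dim_in, dim_out, num_experts=4):
        super().__init__()
        # Each expert is a small ("shallow") model.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim_in, 64), nn.ReLU(), nn.Linear(64, dim_out))
            for _ in range(num_experts)
        )
        # The assignment rule: a classifier over experts.
        self.gate = nn.Linear(dim_in, num_experts)

    def forward(self, x):
        # Assign each data point to exactly one expert.
        assignment = self.gate(x).argmax(dim=-1)  # shape: (batch,)
        out = x.new_zeros(x.shape[0], self.experts[0][-1].out_features)
        # Only the selected expert runs on each point, so eval cost is
        # one expert's forward pass, not the sum over all experts.
        for i, expert in enumerate(self.experts):
            mask = assignment == i
            if mask.any():
                out[mask] = expert(x[mask])
        return out

moe = Top1MoE(dim_in=16, dim_out=8)
y = moe(torch.randn(32, 16))  # shape: (32, 8)
```

One caveat: with a hard argmax the gating classifier gets no gradient through the routing decision itself, which is why the published versions use tricks like noisy top-k gating or softmax-weighted mixtures during training.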

2