chaosmosis t1_j47d0ev wrote

In addition to being more straightforward, applying the same total amount of compute to a single model doing end-to-end learning often yields better performance than splitting that compute across multiple models. As far as I'm aware, there's no systematic way to tell in advance which approach will be preferable; this is just a rule-of-thumb opinion.

8

jimmymvp t1_j4fcjly wrote

Hm, I'm not sure about that. There's the mixture-of-experts idea, which isn't exactly stacking but rather specializes multiple models on different parts of the data, so each data point gets assigned to a specific shallow model. What you need then is an assignment rule, usually implemented as a classifier, and it's been shown that this is cheaper in terms of compute at evaluation time. I'm not sure whether the idea has been abandoned by now, but Google Brain published a paper on this (the sparsely-gated mixture-of-experts layer) and there have been subsequent works.
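
For illustration, here's a minimal sketch of that hard-routing setup in PyTorch (the class name, layer sizes, and expert count are my own, not from any particular paper): a small classifier picks one expert per data point, and only the selected expert runs at evaluation time.

```python
# Minimal sketch of hard (top-1) mixture-of-experts routing, assuming PyTorch.
# Sizes and the number of experts are illustrative, not from the paper.
import torch
import torch.nn as nn

class Top1MoE(nn.Module):
    def __init__(self, dim_in, dim_out, num_experts=4):
        super().__init__()
        # Each expert is a small ("shallow") model.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim_in, 64), nn.ReLU(), nn.Linear(64, dim_out))
            for _ in range(num_experts)
        )
        # The assignment rule: a classifier over experts.
        self.gate = nn.Linear(dim_in, num_experts)

    def forward(self, x):
        # Assign each data point to exactly one expert.
        assignment = self.gate(x).argmax(dim=-1)  # shape: (batch,)
        out = x.new_zeros(x.shape[0], self.experts[0][-1].out_features)
        # Only the selected expert runs on each point, so eval cost is
        # one expert's forward pass, not the sum over all experts.
        for i, expert in enumerate(self.experts):
            mask = assignment == i
            if mask.any():
                out[mask] = expert(x[mask])
        return out

moe = Top1MoE(dim_in=16, dim_out=8)
y = moe(torch.randn(32, 16))  # shape: (32, 8)
```

One caveat: with a hard argmax the gating classifier gets no gradient through the routing decision itself, which is why the published versions use tricks like noisy top-k gating or softmax-weighted mixtures during training.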

2