
_Arsenie_Boca_ t1_iqze3a1 wrote

As many others have mentioned, the decision boundaries from piecewise linear models actually end up quite smooth, given a sufficient number of layers.
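
A minimal sketch of that point, with numpy only: a ReLU MLP is exactly piecewise linear, but the number of linear pieces grows quickly with depth, so the function starts to look smooth. The widths, depths, grid, and kink-detection tolerance below are all arbitrary illustrative choices, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_relu_mlp(depth, width):
    """Return f: R -> R computed by a random ReLU MLP with `depth` hidden layers."""
    Ws = [rng.normal(size=(1, width))] + \
         [rng.normal(size=(width, width)) for _ in range(depth - 1)]
    bs = [rng.normal(size=width) for _ in range(depth)]
    w_out = rng.normal(size=(width, 1))

    def f(x):  # x: (n, 1)
        h = x
        for W, b in zip(Ws, bs):
            h = np.maximum(h @ W + b, 0.0)  # ReLU keeps the map piecewise linear
        return h @ w_out

    return f

# Count slope changes (kinks) of f on a dense 1D grid; more kinks = more
# linear pieces = a boundary that looks smoother. The 1e-6 tolerance is a
# heuristic for "the slope actually changed" vs. floating-point noise.
x = np.linspace(-3, 3, 20001).reshape(-1, 1)
for depth in (1, 2, 4):
    f = random_relu_mlp(depth, width=16)
    slope = np.diff(f(x).ravel()) / np.diff(x.ravel())
    kinks = int(np.sum(np.abs(np.diff(slope)) > 1e-6))
    print(f"depth={depth}: ~{kinks} linear pieces on [-3, 3]")
```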

But to get to the core of your question: why would you prefer many stupid neurons over a few smart ones? I believe there is a relatively simple explanation for why the former is better. More complex neurons would mean that the computational complexity goes up while the number of parameters stays the same. In other words, with the same compute you can train bigger models (in parameter count) if the neurons are simple. A high number of parameters is important for optimization, since the extra dimensions can help escape local minima.

I'm not sure this has been fully explained, but it is part of the reason why pruning works so well: we wouldn't need that many parameters to represent a good fit, but that fit is much easier to find in high dimensions, from where we can prune down to simpler models (often keeping only ~5% of the parameters with almost the same performance). A sketch of that pruning step follows below.
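
A hedged sketch of the pruning claim using PyTorch's built-in `torch.nn.utils.prune` utilities. The architecture, the 95% sparsity level, and the omitted training loop are illustrative stand-ins, not anything specific from the comment; in practice (e.g. lottery-ticket-style experiments) the pruned model is usually fine-tuned or retrained afterwards.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# A deliberately overparameterized toy model (hypothetical sizes).
model = nn.Sequential(
    nn.Linear(784, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 10),
)

# ... train the big model here ...

# Globally prune the 95% smallest-magnitude weights across all linear
# layers, keeping roughly 5% of the parameters.
to_prune = [(m, "weight") for m in model if isinstance(m, nn.Linear)]
prune.global_unstructured(
    to_prune, pruning_method=prune.L1Unstructured, amount=0.95
)

# Each pruned module now carries a binary `weight_mask` buffer.
kept = sum(int(m.weight_mask.sum()) for m, _ in to_prune)
total = sum(m.weight_mask.numel() for m, _ in to_prune)
print(f"kept {kept}/{total} weights ({100 * kept / total:.1f}%)")
```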

1

_Arsenie_Boca_ t1_iqzeu3o wrote

There are a few papers researching this (the effect of high dimensionality on SGD), but I can't seem to find any right now. Maybe someone can help me out :)

1