
gunshoes t1_j5r241t wrote

Technically, and I emphasize the technically, the set of functions representable by a neural network requires only one hidden layer. However, there is little guarantee that you can feasibly find the proper configuration or train the network accurately.

By adding more layers, you can reduce the training burden by spreading it across the layers. The extra dropout between layers also allows for more regularization.
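As a rough illustration of that trade-off, here is a minimal PyTorch sketch contrasting one wide hidden layer against two narrower hidden layers with dropout; the input size, widths, and dropout rate are made up for the example:

```python
import torch.nn as nn

# One wide hidden layer: relies on width alone to capture the target function.
wide_net = nn.Sequential(
    nn.Linear(32, 512),
    nn.ReLU(),
    nn.Linear(512, 1),
)

# Two narrower hidden layers with dropout in between: the representation is
# built up in stages, and the dropout layers act as extra regularization
# during training.
deep_net = nn.Sequential(
    nn.Linear(32, 128),
    nn.ReLU(),
    nn.Dropout(p=0.2),
    nn.Linear(128, 128),
    nn.ReLU(),
    nn.Dropout(p=0.2),
    nn.Linear(128, 1),
)
```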

This is the part of deep learning where it's less science and more, "eh, sounds like it works."

1

arg_max t1_j5r8qe6 wrote

What do you mean by "functions represented by a neural network"? If you are hinting in the direction of universal approximation, then yes, you can approximate any continuous function arbitrarily closely with a single hidden layer, sigmoid activation, and infinite width. But similarly, there are results showing a comparable statement for width-limited, "infinite depth" networks (the required depth is not actually infinite but depends on the function you want to approximate, and afaik it is unbounded over the space of continuous functions). In practice, we are far away from either infinite width or infinite depth, so specific configurations can matter.
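To make the width point concrete, here is a small sketch (assuming PyTorch; the target function, widths, step count, and learning rate are illustrative, not from the comment) that fits a single-hidden-layer sigmoid network of increasing width to a continuous 1D target and prints the final training error, which should shrink as width grows:

```python
import torch
import torch.nn as nn

# Target: a continuous function on [-pi, pi].
x = torch.linspace(-torch.pi, torch.pi, 512).unsqueeze(1)
y = torch.sin(3 * x)

def fit_one_hidden_layer(width, steps=3000, lr=1e-2):
    """Train a single-hidden-layer sigmoid network and return its final MSE."""
    net = nn.Sequential(nn.Linear(1, width), nn.Sigmoid(), nn.Linear(width, 1))
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(net(x), y)
        loss.backward()
        opt.step()
    return loss.item()

# Wider single hidden layers approximate the target better, in line with
# the (finite-width flavor of the) universal approximation picture.
for width in (4, 16, 64, 256):
    print(f"width={width:4d}  final MSE={fit_one_hidden_layer(width):.5f}")
```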

1