
gunshoes t1_j5r241t wrote

Technically, and I emphasize the technically, the set of functions representable by a neural network requires only one hidden layer. However, there is little guarantee that you can feasibly find the proper configuration or train the network accurately.

By adding more layers, you can reduce the training burden by spreading it across the layers. The extra dropout between layers also allows for more regularization.
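As a rough illustration of that trade-off, here is a minimal PyTorch sketch contrasting one wide hidden layer against two narrower hidden layers with dropout; the input size, widths, and dropout rate are made up for the example:

```python
import torch.nn as nn

# One wide hidden layer: relies on width alone to capture the target function.
wide_net = nn.Sequential(
    nn.Linear(32, 512),
    nn.ReLU(),
    nn.Linear(512, 1),
)

# Two narrower hidden layers with dropout in between: the representation is
# built up in stages, and the dropout layers act as extra regularization
# during training.
deep_net = nn.Sequential(
    nn.Linear(32, 128),
    nn.ReLU(),
    nn.Dropout(p=0.2),
    nn.Linear(128, 128),
    nn.ReLU(),
    nn.Dropout(p=0.2),
    nn.Linear(128, 1),
)
```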

This is the part of deep learning where it's less science and more, "eh, sounds like it works."

1

arg_max t1_j5r8qe6 wrote

What do you mean by "functions represented by a neural network"? If you are hinting in the direction of universal approximation, then yes, you can approximate any continuous function arbitrarily closely with a single hidden layer, sigmoid activation, and infinite width. But similarly, there are results showing a comparable statement for width-limited, "infinite depth" networks (the required depth is not actually infinite but depends on the function you want to approximate, and afaik it is unbounded over the space of continuous functions). In practice, we are far away from either infinite width or infinite depth, so specific configurations can matter.
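To make the width point concrete, here is a small sketch (assuming PyTorch; the target function, widths, step count, and learning rate are illustrative, not from the comment) that fits a single-hidden-layer sigmoid network of increasing width to a continuous 1D target and prints the final training error, which should shrink as width grows:

```python
import torch
import torch.nn as nn

# Target: a continuous function on [-pi, pi].
x = torch.linspace(-torch.pi, torch.pi, 512).unsqueeze(1)
y = torch.sin(3 * x)

def fit_one_hidden_layer(width, steps=3000, lr=1e-2):
    """Train a single-hidden-layer sigmoid network and return its final MSE."""
    net = nn.Sequential(nn.Linear(1, width), nn.Sigmoid(), nn.Linear(width, 1))
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(net(x), y)
        loss.backward()
        opt.step()
    return loss.item()

# Wider single hidden layers approximate the target better, in line with
# the (finite-width flavor of the) universal approximation picture.
for width in (4, 16, 64, 256):
    print(f"width={width:4d}  final MSE={fit_one_hidden_layer(width):.5f}")
```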

1