Submitted by _Redone t3_110ldrx in MachineLearning
The-Last-Lion-Turtle t1_j89jm9s wrote
The purpose of a deep network is to approximate complex nonlinear functions. With ReLU, the network is piecewise linear. Imagine slicing a space with many planes: locally it's flat, but zoomed out it has a very complex shape, similar to building a 3D model out of triangles. Each layer adds another linear deformation and another slice to the space.
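To make the piecewise-linear picture concrete, here is a rough NumPy sketch. The tiny network, its layer sizes, and the helper names `forward` and `local_affine_map` are all made up for illustration. It checks that around a given input, where the pattern of active ReLUs stays fixed, the whole network collapses to a single affine map `A @ z + c`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny network: 2 inputs -> 8 hidden -> 8 hidden -> 1 output.
sizes = [2, 8, 8, 1]
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=m) for m in sizes[1:]]


def forward(x):
    """Plain ReLU MLP forward pass (no activation on the output layer)."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(W @ h + b, 0.0)
    return weights[-1] @ h + biases[-1]


def local_affine_map(x):
    """Return (A, c) with forward(z) == A @ z + c for every z that
    shares x's pattern of active/inactive ReLUs."""
    A = np.eye(len(x))
    c = np.zeros(len(x))
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        pre = W @ h + b
        mask = (pre > 0).astype(float)  # which units are "on" at x
        A = mask[:, None] * (W @ A)     # inactive units contribute zero rows
        c = mask * (W @ c + b)
        h = np.maximum(pre, 0.0)
    return weights[-1] @ A, weights[-1] @ c + biases[-1]


x = np.array([0.3, -0.7])
A, c = local_affine_map(x)

# On this "flat triangle" the network is exactly the affine map.
print(np.allclose(forward(x), A @ x + c))
```

Moving `x` far enough flips some ReLUs on or off, which gives a different `(A, c)`. Each such region is one flat facet of the overall shape.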
Read the ResNet paper. It's a great explanation of both why depth matters for performance and how it causes issues for training. Its solution, residual connections, is central to nearly every deep learning architecture since.
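A residual connection just adds a block's input back to its output, `y = x + F(x)`. Below is a simplified sketch of that idea, assuming fully connected layers rather than the convolutions and batch norm the paper actually uses; `plain_block` and `residual_block` are made-up names. It illustrates one point from the paper: when `F` outputs zero, a residual stack is the identity, whereas a plain stack wipes out the signal.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def plain_block(x, W1, W2):
    # Two-layer block with no skip connection: y = F(x)
    return W2 @ relu(W1 @ x)


def residual_block(x, W1, W2):
    # Same two layers, but the input is added back: y = x + F(x)
    return x + W2 @ relu(W1 @ x)


rng = np.random.default_rng(0)
dim, depth = 4, 20
x = rng.normal(size=dim)

# Illustrative extreme case: the second layer of every block is all zeros,
# so F(x) = 0 everywhere.
layers = [(rng.normal(size=(dim, dim)), np.zeros((dim, dim)))
          for _ in range(depth)]

h_plain, h_res = x, x
for W1, W2 in layers:
    h_plain = plain_block(h_plain, W1, W2)
    h_res = residual_block(h_res, W1, W2)

print("plain stack output:   ", h_plain)  # signal is gone
print("residual stack output:", h_res)    # input passes through unchanged
```

This is only the zero-weight corner case, but it hints at why very deep residual stacks are easier to train: extra blocks can fall back to doing nothing instead of having to learn an identity map.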