rikkajounin t1_j4umb8q wrote on January 18, 2023 at 11:04 AM

The following paper shows that with sufficiently large width (the overparameterized regime), gradient descent and SGD reach a global minimum of the training loss in polynomial time. The required width and running time get worse with the depth of the network, but only polynomially.

A Convergence Theory for Deep Learning via Over-Parameterization
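
For a rough feel of what "overparameterized" means here, a toy sketch like the one below may help. It is not the paper's setting: it assumes a two-layer ReLU network (not a deep one), full-batch gradient descent on squared loss, random unit-norm inputs, and fixed random output weights. The function name `train_wide_relu_net` and all hyperparameters are made up for illustration.

```python
import numpy as np

def train_wide_relu_net(width, n=20, d=5, steps=200, lr=0.05, seed=0):
    """Full-batch GD on a two-layer ReLU net; returns the training loss per step.

    Toy assumptions (not from the paper): random unit-norm inputs, random
    labels, output weights fixed at +/-1, only the hidden layer is trained.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm inputs
    y = rng.normal(size=n)                          # arbitrary labels to fit

    W = rng.normal(size=(d, width))                 # hidden weights (trained)
    a = rng.choice([-1.0, 1.0], size=width)         # output weights (fixed)

    losses = []
    for _ in range(steps):
        H = X @ W                                   # pre-activations, (n, width)
        pred = np.maximum(H, 0.0) @ a / np.sqrt(width)
        err = pred - y
        losses.append(0.5 * float(np.sum(err ** 2)))
        # Gradient of the squared loss with respect to the hidden weights.
        grad = X.T @ (err[:, None] * (H > 0) * a[None, :]) / np.sqrt(width)
        W -= lr * grad
    return losses

if __name__ == "__main__":
    for width in (4, 64, 1024):
        losses = train_wide_relu_net(width)
        print(f"width={width:5d}  first loss={losses[0]:.3f}  last loss={losses[-1]:.3f}")
```

Comparing the printed losses across widths gives an informal picture of how width affects fitting the training set; it says nothing about the depth dependence, which is the part the paper's theory covers.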