Submitted by **IamTimNguyen** t3_105v7el
in **MachineLearning**

#
**zhumao**
t1_j3evw1h wrote

Took a quick glance (https://arxiv.org/abs/1910.12478 and https://proceedings.mlr.press/v139/yang21c.html): a few theorems, but where are the proofs? Also:

>This includes applications to a rigorous proof for the existence of the Neural Network Gaussian Process and Neural Tangent Kernel for a general class of architectures, the existence of infinite-width feature learning limits, and the muP parameterization enabling hyperparameter transfer from smaller to larger networks.

It is well-known that training a neural network is NP-complete, which also means locally optimal solutions are not globally optimal in general. Hence sticking a pre-trained sub-net into a bigger one may or may not perform better than training the larger network from scratch. A *proof* by application/implementation is a demonstration, or a one-shot experiment at best; it is not a proof, speaking from a mathematics POV.

#
**BossOfTheGame**
t1_j3f2goj wrote

Isn't training a neural network NP-hard? What's the polynomial-time verifier?

#
**binfin**
t1_j3f8n5x wrote

That the network gets some percentage accuracy on some testing set.

Editing for additional context… the poly-time verifier tests the network and checks that it gets at least some % accuracy. This is similar to the poly-time verifier for the traveling salesman problem, where you verify that a tour exists with at most some total travel distance. To find the shortest tour in a TSP instance, or the highest accuracy of a neural network, you just do a binary search on your cutoff.
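As a concrete toy sketch of that idea (my own construction, not from the thread or the paper, assuming a single linear layer and made-up data), the certificate check and the binary search over the accuracy cutoff might look like:

```python
import numpy as np

def verify(weights, X, y, threshold):
    """Poly-time certificate check: does this weight setting reach
    at least `threshold` accuracy on the test set? One forward pass."""
    w, b = weights
    preds = ((X @ w + b) > 0).astype(int).ravel()
    return (preds == y).mean() >= threshold

def best_accuracy(weights, X, y, tol=1e-3):
    """Binary search for the highest cutoff the certificate still passes;
    cost grows only logarithmically in 1/tol."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if verify(weights, X, y, mid):
            lo = mid
        else:
            hi = mid
    return lo

# Hypothetical toy data: 1-D points labeled by sign; w=1, b=0 separates them.
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0, 0, 1, 1])
weights = (np.array([1.0]), 0.0)
```

The names and the dataset are purely illustrative; the point is only that the check itself is a single forward pass over the test set, i.e. polynomial time in the input size.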

The complete solution on an NDTM is to nondeterministically try every bit combination of your parameters, each time checking in poly time that the model reaches some % accuracy. To find the highest % accuracy you perform a binary search over possible accuracy cutoffs (which of course grows only logarithmically with the floating-point precision of your accuracy). This whole process is poly time on an NDTM, and therefore the problem is contained in NP. You can further show that finding optimal neural networks is NP-hard by having the neural network be trained to solve 3-SAT problems, where the number of neural network parameters is polynomial in the 3-SAT N.
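Purely as an illustration of the "try every bit combination" step (a toy construction of mine, not anything from the paper): with parameters quantized to a few bits, the deterministic version of that NDTM search is just brute-force enumeration, with the same poly-time accuracy check run on each branch:

```python
import itertools
import numpy as np

def exhaustive_search(X, y, n_bits=3):
    """Deterministically enumerate every quantized (w, b) pair -- the
    branches an NDTM would guess -- and return the best accuracy found.
    Exponential in parameter bits overall, but each check is poly time."""
    levels = np.linspace(-2.0, 2.0, 2 ** n_bits)  # 2^n_bits values per parameter
    best = 0.0
    for w, b in itertools.product(levels, repeat=2):
        preds = ((X.ravel() * w + b) > 0).astype(int)
        best = max(best, float((preds == y).mean()))
    return best

# Hypothetical toy dataset: linearly separable 1-D points.
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0, 0, 1, 1])
```

With two parameters at 3 bits each this is only 64 branches; the exponential blow-up in the number of parameter bits is exactly what the nondeterminism absorbs.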

Because optimizing an NN (with respect to some testing set) is solvable in poly-time by an NDTM, and because you can reduce an NP-complete problem to optimizing a neural network, optimizing a neural network is NP-complete.

#
**BossOfTheGame**
t1_j3fhxe7 wrote

I'm convinced. Brilliantly explained. Constructing the problem with respect to a bound makes a ton of sense.

Now I suppose we just code it up and throw it at CPLEX.
