
zhumaot1_j3evw1h wrote, in reply to "[R] Greg Yang's work on a rigorous mathematical theory for neural networks" by IamTimNguyen:

Took a quick glance (https://arxiv.org/abs/1910.12478 and https://proceedings.mlr.press/v139/yang21c.html); there are a few theorems, but where are the proofs? Also:

>This includes applications to a rigorous proof for the existence of the Neural Network Gaussian Process and Neural Tangent Kernel for a general class of architectures, the existence of infinite-width feature learning limits, and the muP parameterization enabling hyperparameter transfer from smaller to larger networks.

It is well known that training a neural network is NP-complete, which also means that locally optimal solutions are not globally optimal in general. Hence, sticking a pre-trained sub-network into a bigger one may or may not perform better than training the larger network from scratch.

Proof by application/implementation amounts to a demonstration or a one-shot experiment at best, not a proof, speaking from a mathematical point of view.
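The local-vs-global point can be seen even in one dimension. A minimal sketch (my own toy example, not from Yang's papers, assuming plain gradient descent on a hand-picked quartic loss): two runs that differ only in initialization converge to different minima, and one basin is strictly worse than the other.

```python
# Toy illustration (hypothetical example): gradient descent on a non-convex
# scalar "loss" f(x) = x^4 - 3x^2 + x, which has a global minimum near
# x ≈ -1.30 and a merely local minimum near x ≈ +1.13.

def loss(x):
    return x**4 - 3 * x**2 + x

def grad(x):
    # derivative of the loss above: 4x^3 - 6x + 1
    return 4 * x**3 - 6 * x + 1

def descend(x, lr=0.01, steps=5000):
    # plain gradient descent from a given starting point
    for _ in range(steps):
        x -= lr * grad(x)
    return x

x_left = descend(-2.0)   # settles in the global-minimum basin (x ≈ -1.30)
x_right = descend(2.0)   # settles in a local minimum only (x ≈ +1.13)
print(loss(x_left), loss(x_right))  # the right basin's loss is strictly higher
```

Both runs satisfy the first-order optimality condition, yet only one is globally optimal, which is exactly why "it worked when we ran it" is not a proof.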