Submitted by **IamTimNguyen** t3_105v7el
in **MachineLearning**

Greg Yang is a mathematician and AI researcher at Microsoft Research who has spent the past several years doing highly original theoretical work on understanding large artificial neural networks. His work currently spans the following five papers:

Tensor Programs I: Wide Feedforward or Recurrent Neural Networks of Any Architecture are Gaussian Processes: https://arxiv.org/abs/1910.12478

Tensor Programs II: Neural Tangent Kernel for Any Architecture: https://arxiv.org/abs/2006.14548

Tensor Programs III: Neural Matrix Laws: https://arxiv.org/abs/2009.10685

Tensor Programs IV: Feature Learning in Infinite-Width Neural Networks: https://proceedings.mlr.press/v139/yang21c.html

Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer: https://arxiv.org/abs/2203.03466

In our whiteboard conversation, we get a sample of Greg's work, which goes under the name "Tensor Programs". To fit Tensor Programs into a conversational video, we organize its main concepts around one central, time-tested idea: taking a large-N limit. This idea appears most famously in the Law of Large Numbers and the Central Limit Theorem, which in turn play a fundamental role in the branch of mathematics known as Random Matrix Theory (RMT). We review this foundational material and then show how Tensor Programs (TP) generalizes this classical work, offering new proofs of results in RMT.
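For readers who want to see the large-N limit numerically, here is a minimal sketch of the two classical statements. It is not taken from the video: the choice of Uniform(-1, 1) draws, the seed, the sample sizes, and the helper name `standardized_sums` are all assumptions made for illustration.

```python
import math
import random
import statistics

def standardized_sums(n, trials, rng):
    """Return `trials` samples of sqrt(n) * (mean of n Uniform(-1, 1) draws)."""
    out = []
    for _ in range(trials):
        mean = sum(rng.uniform(-1.0, 1.0) for _ in range(n)) / n
        out.append(math.sqrt(n) * mean)
    return out

rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility

# Law of Large Numbers: the sample mean concentrates around the true mean, 0.
lln_errors = {
    n: abs(sum(rng.uniform(-1.0, 1.0) for _ in range(n)) / n)
    for n in (10, 1000, 100000)
}

# Central Limit Theorem: sqrt(n) * mean is approximately Gaussian with
# variance Var(Uniform(-1, 1)) = 1/3.
clt_samples = standardized_sums(n=200, trials=2000, rng=rng)
clt_variance = statistics.pvariance(clt_samples)

print("LLN errors:", lln_errors)
print("CLT variance estimate:", clt_variance, "vs 1/3")
```

The error of the sample mean shrinks roughly like 1/sqrt(n), while the rescaled quantity sqrt(n) * mean keeps a stable spread, which is the kind of behavior a large-N limit is meant to capture.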

We conclude with applications of Tensor Programs to a (rare!) rigorous theory of neural networks. These include rigorous proofs of the existence of the Neural Network Gaussian Process and the Neural Tangent Kernel for a general class of architectures, the existence of infinite-width feature learning limits, and the muP parameterization, which enables hyperparameter transfer from smaller to larger networks.
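To make the Gaussian Process claim a bit more concrete, here is a toy simulation, offered as a rough sketch rather than anything from the papers or the video. It assumes a one-hidden-layer ReLU network on a scalar input with standard Gaussian weights and a 1/sqrt(width) readout scaling; the function names, widths, and trial counts are hypothetical choices. Under these assumptions the output at x = 1 has variance 1/2 at every width, and its distribution over random initializations should look increasingly Gaussian as the width grows.

```python
import math
import random
import statistics

def random_net_output(width, rng, x=1.0):
    """Output of a randomly initialized one-hidden-layer ReLU net on scalar x.

    f(x) = (1/sqrt(width)) * sum_i v_i * relu(w_i * x), with w_i, v_i ~ N(0, 1).
    """
    total = 0.0
    for _ in range(width):
        w = rng.gauss(0.0, 1.0)
        v = rng.gauss(0.0, 1.0)
        total += v * max(0.0, w * x)
    return total / math.sqrt(width)

def excess_kurtosis(samples):
    """Sample excess kurtosis; it is 0 for an exactly Gaussian distribution."""
    mu = statistics.fmean(samples)
    m2 = statistics.fmean([(s - mu) ** 2 for s in samples])
    m4 = statistics.fmean([(s - mu) ** 4 for s in samples])
    return m4 / m2 ** 2 - 3.0

rng = random.Random(0)  # fixed seed, chosen arbitrarily
trials = 2000
stats = {}
for width in (1, 256):
    outs = [random_net_output(width, rng) for _ in range(trials)]
    stats[width] = (statistics.pvariance(outs), excess_kurtosis(outs))
    print(f"width={width}: variance={stats[width][0]:.3f}, "
          f"excess kurtosis={stats[width][1]:.3f}")
```

This only probes the distribution of a single output at initialization. The results discussed in the episode cover joint distributions over inputs, general architectures, and training dynamics, none of which this sketch attempts.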

YouTube: https://youtu.be/1aXOXHA7Jcw

Apple Podcasts: https://podcasts.apple.com/us/podcast/the-cartesian-cafe/id1637353704

Spotify: https://open.spotify.com/show/1X5asAByNhNr996ZsGGICG

AlmightySnoot1_j3d6wpo wrote:

Haven't watched yet, but does he address the criticism raised by, e.g., The Principles of Deep Learning Theory regarding the infinite-width limit?