Submitted by IamTimNguyen t3_105v7el in MachineLearning
Greg Yang is a mathematician and AI researcher at Microsoft Research who, over the past several years, has produced strikingly original theoretical work on understanding large artificial neural networks. His work currently spans the following five papers:
Tensor Programs I: Wide Feedforward or Recurrent Neural Networks of Any Architecture are Gaussian Processes: https://arxiv.org/abs/1910.12478
Tensor Programs II: Neural Tangent Kernel for Any Architecture: https://arxiv.org/abs/2006.14548
Tensor Programs III: Neural Matrix Laws: https://arxiv.org/abs/2009.10685
Tensor Programs IV: Feature Learning in Infinite-Width Neural Networks: https://proceedings.mlr.press/v139/yang21c.html
Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer: https://arxiv.org/abs/2203.03466
In our whiteboard conversation, we get a sample of Greg's work, which goes under the name "Tensor Programs". The route chosen to compress Tensor Programs into the scope of a conversational video is to place its main concepts under the umbrella of one larger, central, and time-tested idea: taking a large N limit. This idea appears most famously in the Law of Large Numbers and the Central Limit Theorem, which in turn play a fundamental role in the branch of mathematics known as Random Matrix Theory (RMT). We review this foundational material and then show how Tensor Programs (TP) generalizes it, yielding new proofs of classical RMT results.
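As a quick numerical illustration of the large-N idea (my own toy sketch, not code from the episode): a single preactivation of a wide layer is a sum of n i.i.d. terms, so the CLT makes it approximately Gaussian even when the weights themselves are not Gaussian — the seed of the Neural Network Gaussian Process picture discussed below.

```python
import numpy as np

# Toy sketch: one preactivation of the next layer, w . tanh(x), with n i.i.d.
# Rademacher (+/-1) weights scaled by 1/sqrt(n). This is a large-N sum, so by
# the CLT its distribution over weight draws is approximately Gaussian with
# mean 0 and variance mean(tanh(x)^2), despite the non-Gaussian weights.
rng = np.random.default_rng(0)
n = 10_000                         # width: the "large N"
x = rng.standard_normal(n)         # a fixed vector of previous-layer activations
samples = np.array([
    (rng.choice([-1.0, 1.0], size=n) / np.sqrt(n)) @ np.tanh(x)
    for _ in range(5_000)
])
# samples.mean() is near 0 and samples.var() is near np.mean(np.tanh(x)**2),
# matching the Gaussian that the CLT predicts.
```

With Gaussian weights the sum would be exactly Gaussian; the point of using sign weights here is the universality that the CLT provides.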
We conclude with the applications of Tensor Programs to a (rare!) rigorous theory of neural networks. These include rigorous proofs of the existence of the Neural Network Gaussian Process and the Neural Tangent Kernel for a general class of architectures, the existence of infinite-width feature-learning limits, and the muP parameterization, which enables hyperparameter transfer from smaller to larger networks.
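The NTK existence result can also be poked at numerically. For a one-hidden-layer ReLU network f(x) = v · relu(Wx)/√n in the standard NTK parameterization, the empirical kernel Θ(x, x') = Σ_θ ∂f(x)/∂θ · ∂f(x')/∂θ concentrates as the width n grows, so two independent random initializations give nearly the same kernel. A minimal sketch under these assumptions (the function name and setup are mine, not from the papers):

```python
import numpy as np

def empirical_ntk(x1, x2, W, v):
    """Empirical NTK of f(x) = v . relu(W x) / sqrt(n), via closed-form grads.

    grad wrt v_i : relu(W x)_i / sqrt(n)
    grad wrt W_ij: v_i * relu'((W x)_i) * x_j / sqrt(n)
    """
    n = v.shape[0]
    h1, h2 = W @ x1, W @ x2
    a1, a2 = np.maximum(h1, 0.0), np.maximum(h2, 0.0)   # relu(W x)
    d1, d2 = (h1 > 0).astype(float), (h2 > 0).astype(float)  # relu'(W x)
    return (a1 @ a2 + (v ** 2 * d1 * d2).sum() * (x1 @ x2)) / n

rng = np.random.default_rng(0)
d, n = 3, 200_000
x1, x2 = rng.standard_normal(d), rng.standard_normal(d)
vals = []
for seed in (1, 2):   # two independent initializations
    r = np.random.default_rng(seed)
    W = r.standard_normal((n, d)) / np.sqrt(d)   # rows have O(1) preactivations
    v = r.standard_normal(n)
    vals.append(empirical_ntk(x1, x2, W, v))
# at large width the two independently initialized kernels nearly coincide,
# reflecting the deterministic infinite-width NTK limit
```

The Tensor Programs results go much further — covering general architectures and training dynamics — but this concentration at initialization is the simplest face of the phenomenon.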
Youtube: https://youtu.be/1aXOXHA7Jcw
Apple Podcasts: https://podcasts.apple.com/us/podcast/the-cartesian-cafe/id1637353704
Spotify: https://open.spotify.com/show/1X5asAByNhNr996ZsGGICG
AlmightySnoo t1_j3d6wpo wrote
Haven't watched yet, but does he address the criticism of the infinite-width limit raised by, e.g., The Principles of Deep Learning Theory?