TheNovicePhilomath

TheNovicePhilomath t1_j25wla1 wrote

I don't think this is a standard result, or at least I haven't encountered it. After some digging, this paper seems to give a good explanation of the similarities between Nesterov's method and PID control (Section 3).

Also, the idea behind the paper linked in the Twitter thread just blew my mind. It's so obvious, yet beautiful: a Kalman filter used as an optimiser, estimating network parameters from noisy loss measurements. Great stuff.
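To make the Kalman-filter-as-optimiser idea a bit more concrete, here's a rough sketch of how I picture it. This is my own toy version, not the algorithm from the linked paper. Assumptions (all mine): the parameters follow a random walk, each minibatch loss is treated as a noisy scalar measurement whose target value is 0, the covariance is kept as a single scalar `p` (i.e. P = p * I), and the names `kalman_loss_step`, `q` and `r` are made up for illustration. The "network" is just a linear model so the whole thing fits in a few lines.

```python
import numpy as np

def kalman_loss_step(w, p, loss, grad, q=1e-2, r=1e-4, target=0.0):
    """One extended-Kalman-style update with isotropic covariance P = p * I.

    ASSUMPTION: hypothetical helper for illustration, not a library API.
    The loss is linearised around w, so its gradient plays the role of the
    measurement matrix H.
    """
    p = p + q                           # predict: random-walk state, variance grows by q
    s = p * float(grad @ grad) + r      # innovation variance: H P H^T + R
    gain = p * grad / s                 # Kalman gain: P H^T / S
    w = w + gain * (target - loss)      # correct with the innovation (target - measured loss)
    p = p * r / s                       # scalar posterior variance (isotropic approximation)
    return w, p

# Toy problem: linear regression with noise-free targets, so the minimum
# loss really is 0. The "noise" in the loss comes from minibatch sampling.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(256, 3))
y = X @ w_true

w, p = np.zeros(3), 1.0
for _ in range(500):
    idx = rng.choice(256, size=32, replace=False)
    resid = X[idx] @ w - y[idx]
    loss = 0.5 * np.mean(resid ** 2)        # noisy scalar loss measurement
    grad = X[idx].T @ resid / len(idx)      # gradient of that minibatch loss
    w, p = kalman_loss_step(w, p, loss, grad)

final_error = float(np.linalg.norm(w - w_true))
print("distance to true weights:", final_error)
```

What I find neat is that the effective step size falls out of the filter: the step is `p * loss / (p * |grad|^2 + r)`, so it adapts to the loss and gradient scale instead of being a hand-tuned learning rate. The values of `q` and `r` here are picked by hand for this toy problem.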
