Submitted by cruddybanana1102 t3_zyclre in MachineLearning
TheNovicePhilomath t1_j25wla1 wrote
I don't think this is a standard result, or at least I haven't encountered it. After some digging, this paper seems to have a good explanation of the similarities between Nesterov and PID (section 3).
Also, the idea behind the paper linked in the Twitter thread just blew my mind. So obvious, yet beautiful. A Kalman filter as an optimiser to estimate network parameters from noisy loss measurements. Great stuff.
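To make the Kalman-filter-as-optimiser idea concrete, here is a rough sketch of how I understand it, not necessarily what the linked paper actually does. Assumptions on my part: the parameters are the filter state with random-walk dynamics, the (noisy) loss is a scalar measurement whose desired value is `target = 0`, the gradient plays the role of the linearised measurement matrix, and the covariance is kept diagonal to stay cheap. The names `kalman_step`, `quadratic_loss`, `quadratic_grad`, `run`, `meas_var` and `proc_var` are all made up for illustration, and the quadratic toy loss just stands in for a network loss.

```python
import random


def kalman_step(theta, p, loss, grad, target=0.0, meas_var=1e-2, proc_var=1e-4):
    """One extended-Kalman-style update with a scalar measurement (the loss).

    theta: parameter estimates, p: diagonal of the covariance (an approximation).
    meas_var (R) and proc_var (Q) are assumed tuning knobs, not values from any paper.
    """
    # Innovation variance S = H P H^T + R, with H taken to be the gradient.
    s = sum(pi * gi * gi for pi, gi in zip(p, grad)) + meas_var
    # Kalman gain K = P H^T / S (one entry per parameter).
    k = [pi * gi / s for pi, gi in zip(p, grad)]
    # Innovation: desired loss value minus the measured (noisy) loss.
    innovation = target - loss
    new_theta = [ti + ki * innovation for ti, ki in zip(theta, k)]
    # Diagonal of (I - K H) P, then add random-walk process noise Q.
    new_p = [(1.0 - ki * gi) * pi + proc_var for pi, ki, gi in zip(p, k, grad)]
    return new_theta, new_p


def quadratic_loss(theta, optimum):
    # Toy loss standing in for a network loss (hypothetical example problem).
    return 0.5 * sum((t - o) ** 2 for t, o in zip(theta, optimum))


def quadratic_grad(theta, optimum):
    return [t - o for t, o in zip(theta, optimum)]


def run(steps=200, noise_std=0.01, seed=0):
    rng = random.Random(seed)
    optimum = [1.0, 0.5]
    theta, p = [3.0, -2.0], [1.0, 1.0]
    for _ in range(steps):
        # Only the loss measurement is noisy here; the gradient is exact.
        noisy_loss = quadratic_loss(theta, optimum) + rng.gauss(0.0, noise_std)
        theta, p = kalman_step(theta, p, noisy_loss, quadratic_grad(theta, optimum))
    return theta, p


if __name__ == "__main__":
    theta, p = run()
    print("theta:", theta, "loss:", quadratic_loss(theta, [1.0, 0.5]))
```

The gain acts like a per-parameter adaptive step size: parameters with a large remaining variance in `p` move more, and the step shrinks as the filter grows more confident. A full covariance matrix would capture correlations between parameters, but is far too large for a real network, which is why I went with the diagonal approximation here.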
cruddybanana1102 OP t1_j262t6l wrote
Ikr! It blew my mind to see new optimizers designed with ideas from optimal control! It shouldn't be surprising really, but I can't not appreciate it. Also loveeeee the Kalman filter paper!!!!! And thanks for digging out that paper for me. Haven't gone through it fully yet, but it looks promising.