TheNovicePhilomath t1_j25wla1 wrote

I don't think this is a standard result, or at least I haven't encountered it. After some digging, this paper seems to have a good explanation of the similarities between Nesterov and PID (section 3).

Also, the idea behind the paper linked in the Twitter thread just blew my mind. So obvious, yet beautiful: a Kalman filter used as an optimiser to estimate network parameters from noisy loss measurements. Great stuff.

cruddybanana1102 OP t1_j262t6l wrote

Ikr! It blew my mind to see optimal control inspiring the design of new optimizers! It shouldn't be surprising really, but I can't not appreciate it. Also loveeeee the Kalman filter paper!!!!! And thanks for digging out that paper for me. I haven't gone through it fully yet, but it looks promising.