Submitted by cruddybanana1102 t3_zyclre in MachineLearning

Saw a tweet claiming that, with some "quirky tricks", Nesterov momentum can be obtained as a special case of PID control. I did a Google search, but it returned nothing relevant.

Is this a well-known result in optimisation that I'm not aware of? Or have I just not looked hard enough? If someone can point me to relevant references, that would be great.
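A sketch of the algebra behind one standard rewriting (not necessarily the trick from the tweet), assuming the Sutskever-style form of Nesterov momentum with momentum μ, step size η, v_0 = 0 and the convention g_{-1} = 0:

```latex
\begin{aligned}
v_{t+1} &= \mu v_t - \eta \nabla f(\theta_t + \mu v_t), \qquad \theta_{t+1} = \theta_t + v_{t+1} \\
y_t &= \theta_t + \mu v_t, \qquad g_t = \nabla f(y_t) \\
d_t &= y_{t+1} - y_t = \mu v_{t+1} - \eta g_t \\
d_t - \mu d_{t-1} &= \mu (v_{t+1} - \mu v_t) - \eta g_t + \mu \eta g_{t-1}
                  = -\eta \left[ g_t + \mu (g_t - g_{t-1}) \right]
\end{aligned}
```

Reading the gradient g_t as the error signal, the bracket is a proportional term plus a first-difference (derivative-like) term, and the recursion d_t = μ d_{t-1} - η[...] sums these with geometric decay μ, like a leaky integrator. So Nesterov behaves like heavy-ball momentum fed a P+D-corrected error; the gains are all tied to μ and η, which is presumably where extra "tricks" are needed to match a textbook PID with independent gains.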


Comments


TheNovicePhilomath t1_j25wla1 wrote

I don't think this is a standard result, or at least I haven't encountered it. After some digging, I found that this paper gives a good explanation of the similarities between Nesterov momentum and PID control (Section 3).
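One way to see the Nesterov/PID similarity concretely is the rewriting of Nesterov momentum, in look-ahead coordinates y_t = θ_t + μv_t, as heavy-ball momentum driven by the corrected gradient g_t + μ(g_t - g_{t-1}), i.e. a proportional term plus a derivative-like term. The sketch below checks this numerically. It assumes a toy quadratic f(y) = 0.5(y1² + 10·y2²), η = 0.05 and μ = 0.9, all chosen for illustration; the function names are hypothetical.

```python
# Numerical check: Nesterov momentum (look-ahead form) equals heavy-ball momentum
# applied to the corrected gradient g_t + mu * (g_t - g_{t-1}).

CURVATURE = (1.0, 10.0)  # assumed toy problem: f(y) = 0.5 * sum(c_i * y_i**2)


def grad(y):
    """Gradient of the toy quadratic."""
    return [c * yi for c, yi in zip(CURVATURE, y)]


def nesterov_lookahead(theta0, eta, mu, steps):
    """Sutskever-style Nesterov; records the look-ahead points y_t = theta_t + mu * v_t."""
    theta = list(theta0)
    v = [0.0] * len(theta)
    ys = []
    for _ in range(steps):
        y = [t + mu * vi for t, vi in zip(theta, v)]
        ys.append(y)
        g = grad(y)  # gradient at the projected (look-ahead) position
        v = [mu * vi - eta * gi for vi, gi in zip(v, g)]
        theta = [t + vi for t, vi in zip(theta, v)]
    return ys


def momentum_on_corrected_grad(y0, eta, mu, steps):
    """Heavy-ball recursion d_t = mu*d_{t-1} - eta*(g_t + mu*(g_t - g_{t-1}))."""
    y = list(y0)
    d = [0.0] * len(y)
    g_prev = [0.0] * len(y)  # g_{-1} = 0 matches v_0 = 0
    ys = []
    for _ in range(steps):
        ys.append(list(y))
        g = grad(y)
        d = [mu * di - eta * (gi + mu * (gi - gp)) for di, gi, gp in zip(d, g, g_prev)]
        y = [yi + di for yi, di in zip(y, d)]
        g_prev = g
    return ys


a = nesterov_lookahead([1.0, 1.0], eta=0.05, mu=0.9, steps=50)
b = momentum_on_corrected_grad([1.0, 1.0], eta=0.05, mu=0.9, steps=50)
max_gap = max(abs(p - q) for ya, yb in zip(a, b) for p, q in zip(ya, yb))
print(f"max difference between the two trajectories: {max_gap:.2e}")
```

The two trajectories agree up to floating-point rounding, which is consistent with the algebra: the only difference is whether you track θ_t or the look-ahead point y_t.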

Also, the idea behind the paper linked in the Twitter thread just blew my mind. So obvious, yet beautiful: a Kalman filter used as an optimiser, estimating network parameters from noisy loss measurements. Great stuff.
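A toy sketch of that idea. It assumes a scalar parameter, a random-walk state model with process noise q, and an extended-Kalman-filter measurement model in which the observed loss should equal a target value of 0 (with measurement noise r). This illustrates the general idea under those assumptions and is not the algorithm from the linked paper; the function name ekf_minimize and all parameter values are hypothetical.

```python
import random


def ekf_minimize(loss, grad, theta0, steps=200, p0=1.0, q=1e-3, r=1e-4,
                 noise_std=0.0, seed=0):
    """Toy extended Kalman filter: the parameter is the hidden state, and we
    'observe' a target loss of 0 through the nonlinear model h(theta) = loss(theta)."""
    rng = random.Random(seed)
    theta, p = theta0, p0
    for _ in range(steps):
        p = p + q                                            # predict: random-walk state model
        measured = loss(theta) + rng.gauss(0.0, noise_std)   # noisy loss measurement
        h = grad(theta)                                      # Jacobian of h(theta) = loss(theta)
        s = h * p * h + r                                    # innovation variance
        k = p * h / s                                        # Kalman gain
        theta = theta + k * (0.0 - measured)                 # pull the loss toward the target 0
        p = (1.0 - k * h) * p                                # posterior variance
    return theta


target = 3.0
loss = lambda th: 0.5 * (th - target) ** 2
grad = lambda th: th - target
theta = ekf_minimize(loss, grad, theta0=0.0)
print(theta, loss(theta))
```

Because the gain k is scaled by the filter's own uncertainty p, the effective step size adapts on its own; setting noise_std > 0 simulates noisy loss readings.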


cruddybanana1102 OP t1_j262t6l wrote

Ikr! It blew my mind to see optimal-control-inspired design of new optimizers! It shouldn't really be surprising, but I can't help appreciating it. Also loveeeee the Kalman filter paper!!!!! And thanks for digging out that paper for me. Haven't gone through it fully yet, but it looks promising.


Red-Portal t1_j26qhza wrote

I don't see why one would have to go as far as a PID controller. The relationship between linear dynamical systems and momentum-based SGD algorithms is pretty straightforward. In fact, Lyapunov-function-based analysis of SGD algorithms is pretty common.
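To illustrate the linear-dynamical-system view, here is heavy-ball momentum on a one-dimensional quadratic f(x) = 0.5·λx², written as a linear map on the state (x_t, x_{t-1}). The iteration converges from every starting point exactly when the spectral radius of that 2×2 matrix is below 1 (equivalently, when a quadratic Lyapunov function exists for it). The quadratic, λ = 1, μ = 0.9 and the two step sizes are assumptions for illustration.

```python
import cmath


def heavy_ball_matrix(eta, mu, lam):
    """Iteration matrix of heavy-ball on f(x) = 0.5*lam*x^2, state z_t = (x_t, x_{t-1})."""
    return [[1.0 + mu - eta * lam, -mu],
            [1.0, 0.0]]


def spectral_radius_2x2(m):
    # Eigenvalues of a 2x2 matrix from its characteristic polynomial r^2 - tr*r + det = 0
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    return max(abs((tr + disc) / 2.0), abs((tr - disc) / 2.0))


def run_heavy_ball(eta, mu, lam, x0=1.0, steps=100):
    """Iterate x_{t+1} = x_t - eta*lam*x_t + mu*(x_t - x_{t-1}) starting from x_{-1} = x_0."""
    x_prev, x = x0, x0
    for _ in range(steps):
        x, x_prev = x - eta * lam * x + mu * (x - x_prev), x
    return x


for eta in (0.1, 4.0):
    rho = spectral_radius_2x2(heavy_ball_matrix(eta, mu=0.9, lam=1.0))
    print(f"eta={eta}: spectral radius {rho:.4f}, |x_100| = {abs(run_heavy_ball(eta, 0.9, 1.0)):.3e}")
```

With η = 0.1 the eigenvalues are 0.9 ± 0.3i (radius √0.9 ≈ 0.949) and the iterates decay; with η = 4 one eigenvalue is -1.5 and they blow up.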


bubudumbdumb t1_j28j898 wrote

TIL: Nesterov momentum is an extension of momentum that computes the decaying moving average of gradients evaluated at projected (look-ahead) positions in the search space, rather than at the current positions themselves.

I took a course on control theory, and the ingredients of Nesterov momentum look like common building blocks of linear control systems: moving averages and decay. PID control is probably the most widespread industrial application of linear control theory.
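To make the "moving average and decay" point concrete, here is a toy scalar PID update that treats the gradient as the error signal to be driven to zero. The gains kp, ki, kd, the leaky-integral decay and the function names are hypothetical choices for illustration. With kp = kd = 0, ki = η and decay = μ it reproduces classical momentum exactly, which is one sense in which momentum is already a leaky integral controller; a nonzero kd adds the derivative term a PID-style optimizer would include.

```python
def pid_optimizer(grad, theta0, kp, ki, kd, decay, steps):
    """Toy discrete PID controller on the gradient (the 'error').

    step_t = kp*g_t + ki*I_t + kd*(g_t - g_{t-1}),  with I_t = decay*I_{t-1} + g_t
    """
    theta, integral, g_prev = theta0, 0.0, 0.0
    path = [theta]
    for _ in range(steps):
        g = grad(theta)
        integral = decay * integral + g  # leaky integral = decaying sum of gradients
        step = kp * g + ki * integral + kd * (g - g_prev)
        theta -= step
        g_prev = g
        path.append(theta)
    return path


def heavy_ball(grad, theta0, eta, mu, steps):
    """Classical momentum: v <- mu*v - eta*g, theta <- theta + v."""
    theta, v = theta0, 0.0
    path = [theta]
    for _ in range(steps):
        v = mu * v - eta * grad(theta)
        theta += v
        path.append(theta)
    return path


grad = lambda th: 4.0 * (th - 1.0)  # gradient of the toy loss 2*(theta - 1)^2
pid_path = pid_optimizer(grad, 5.0, kp=0.0, ki=0.05, kd=0.0, decay=0.9, steps=60)
hb_path = heavy_ball(grad, 5.0, eta=0.05, mu=0.9, steps=60)
gap = max(abs(p - q) for p, q in zip(pid_path, hb_path))
print(f"max difference between PID (I-only) and heavy-ball paths: {gap:.2e}")
```

The match follows from v_{t+1} = -η·I_t: unrolling the momentum recursion gives exactly the decaying sum of past gradients that the integral term keeps.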
