
fromnighttilldawn t1_ir549x1 wrote

But the ADAM paper was wrong, so it's no better than cooking up an equation, which I guess is impressive, but if you know the right people then the overall contribution is very low. ADAM was literally 1 or 2 steps away from whatever Hinton was doing, and Hinton was literally the co-author's (forgot his name) supervisor or something.

−2

badabummbadabing t1_ir9bv9x wrote

The paper did have an error in its convergence proof (which was later rectified by other people). But:

  • that proof only applied to convex cost functions anyway (general convergence proofs of this kind are impossible for nonconvex problems like neural network training),
  • Adam is literally the most used optimiser for neural network training; it would be crazy to deny its significance because of a technical error in a proof for a regime that is irrelevant to this application.

Regarding "whatever Hinton was doing": Are you talking about RMSprop? Sure, it's another momentum optimizer. There are many of them.

1