zdss t1_j8osnth wrote

I've only skimmed the paper, but this is a confusing result. I can see a simpler optimizer paying off at a similar compute budget because it can run more iterations, but they claim it's also better on a per-iteration basis across the entire learning task. There's not a lot going on in this algorithm, so where is the magic coming from?

It's kind of hard to believe that, while people were experimenting with all these more complex optimizers, no one tried something this simple and saw that it had state-of-the-art results.
