Submitted by ExponentialCookie t3_1138jpp in MachineLearning
Seems interesting. A snippet from the arXiv abstract:
>Our method discovers a simple and effective optimization algorithm, Lion (EvoLved Sign Momentum). It is more memory-efficient than Adam as it only keeps track of the momentum. Different from adaptive optimizers, its update has the same magnitude for each parameter calculated through the sign operation. We compare Lion with widely used optimizers, such as Adam and Adafactor, for training a variety of models on different tasks.
Links
arXiv: https://arxiv.org/abs/2302.06675
Code Implementation: https://github.com/lucidrains/lion-pytorch
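For reference, here is a minimal sketch of the sign-momentum update the abstract describes, roughly following the pseudocode in the paper. The function name `lion_step` and the default hyperparameter values are placeholders of my own, not the authors' tuned settings; see lucidrains' repo above for a maintained implementation.

```python
# Minimal sketch of a Lion-style update: a single momentum buffer per parameter,
# and an update whose magnitude is the same for every coordinate via sign().
# Default hyperparameters here are illustrative placeholders only.
import torch

@torch.no_grad()
def lion_step(param, grad, momentum, lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=0.0):
    # Interpolate the gradient with the stored momentum, then take the sign,
    # so each parameter moves by exactly lr (times +/-1) plus weight decay.
    update = (beta1 * momentum + (1 - beta1) * grad).sign()
    param.mul_(1 - lr * weight_decay)   # decoupled weight decay
    param.add_(update, alpha=-lr)       # sign-based step
    # Only this one buffer persists between steps, hence the memory savings vs. Adam.
    momentum.mul_(beta2).add_(grad, alpha=1 - beta2)


# Toy usage on a single tensor
p, m = torch.randn(4), torch.zeros(4)
g = torch.randn(4)
lion_step(p, g, m)
print(p, m)
```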
currentscurrents t1_j8op44d wrote
Does it though? There was a reproducibility survey recently that found that many optimizers claiming better performance did not, in fact, generalize beyond the tasks tested in their own papers.
Essentially they were doing hyperparameter tuning; the hyperparameter just happened to be the optimizer design itself.