MustachedSpud
MustachedSpud t1_j8t25bb wrote
Reply to comment by ChuckSeven in [D] Lion , An Optimizer That Outperforms Adam - Symbolic Discovery of Optimization Algorithms by ExponentialCookie
Not true in any case with convolution, attention, or recurrence, which covers most modern applications. In all of these, the activation count grows with how often the weights are reused as well as with the batch size, so the activations dominate memory usage over the optimizer state unless you use a tiny batch size.
That's why checkpointing can be useful. This paper does a solid job covering memory usage: https://scholar.google.com/scholar?q=low+memory+neural+network+training+checkpoint&hl=en&as_sdt=0&as_vis=1&oi=scholart#d=gs_qabs&t=1676575377350&u=%23p%3DOLSwmmdygaoJ
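To make the checkpointing point concrete, here's a back-of-envelope sketch of the classic sqrt(n) checkpointing scheme (store one checkpoint per segment, recompute one segment at a time during backward). The counts are illustrative only; real implementations differ in constants and in what exactly they retain.

```python
import math

def peak_stored_activations(n_layers, checkpointing=False):
    """Peak number of per-layer activation tensors held at once.

    Plain backprop keeps every layer's activation alive until backward.
    sqrt(n) checkpointing keeps ~sqrt(n) checkpoints, plus one segment
    of ~sqrt(n) activations recomputed on the fly during backward.
    """
    if not checkpointing:
        return n_layers
    seg = math.isqrt(n_layers)  # segment length ~ sqrt(n)
    return seg + seg            # checkpoints + one recomputed segment

print(peak_stored_activations(100))        # 100 activations live at peak
print(peak_stored_activations(100, True))  # 20 activations live at peak
```

So for a 100-layer network you hold ~20 activations at peak instead of 100, at the cost of roughly one extra forward pass of compute.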
MustachedSpud t1_j8sacz8 wrote
Reply to comment by mfarahmand98 in [D] Lion , An Optimizer That Outperforms Adam - Symbolic Discovery of Optimization Algorithms by ExponentialCookie
They might be thinking in a different direction than me, but in most cases the majority of memory use during training comes not from the model weights or optimizer state, but from tracking all the activations of the training batch. If you think about a CNN, each filter gets used across the whole image, so you end up with many more activations than filters. So optimizer memory savings have very limited benefit.
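A quick arithmetic sketch of that CNN point, with hypothetical layer sizes: count the values stored for one conv layer's weights (plus Adam's two moment buffers) versus the activations its output occupies for a batch.

```python
def conv_param_count(in_ch, out_ch, k):
    """Weight (+ bias) count of a k x k convolution layer."""
    return out_ch * (in_ch * k * k + 1)

def conv_activation_count(out_ch, h, w, batch):
    """Output activations stored for the backward pass."""
    return batch * out_ch * h * w

# Example: a 3x3 conv, 64 -> 64 channels, on 224x224 feature maps, batch 32
params = conv_param_count(64, 64, 3)            # ~37k values
adam_state = 2 * params                         # Adam keeps two moments per weight
acts = conv_activation_count(64, 224, 224, 32)  # ~103M values

print(acts // (params + adam_state))  # activations outnumber weights + state ~900x
```

Even with Adam's extra state included, the activations for this layer are hundreds of times larger, which is why shrinking the optimizer state barely moves total memory at realistic batch sizes.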
MustachedSpud t1_j8s9pid wrote
Reply to comment by bernhard-lehner in [D] Lion , An Optimizer That Outperforms Adam - Symbolic Discovery of Optimization Algorithms by ExponentialCookie
Wait that's so much better
MustachedSpud t1_j35gn03 wrote
Reply to comment by horselover_f4t in [Discussion] If ML is based on data generated by humans, can it truly outperform humans? by groman434
Are you trolling me or something? YOU are the person I responded to. YOU brought up the vanilla version, in a response to someone else who was talking about the zero version. The zero version is most relevant here because it learns from scratch, without human knowledge.
MustachedSpud t1_j34qzvu wrote
Reply to comment by horselover_f4t in [Discussion] If ML is based on data generated by humans, can it truly outperform humans? by groman434
The person two comments up was talking about the zero version. The thread is about how AI can surpass humans, and the point is that it already can if it has a way to improve without human data.
MustachedSpud t1_j31glqv wrote
Reply to comment by horselover_f4t in [Discussion] If ML is based on data generated by humans, can it truly outperform humans? by groman434
The zero in AlphaZero means it starts with no human knowledge. They figured out that this approach eventually becomes stronger than the base AlphaGo strategy.
MustachedSpud t1_iun145l wrote
Those sound sensible to me. Only other data that I can think of using would be engagement rate, like how many comments/likes per view.
MustachedSpud t1_j8t65fh wrote
Reply to comment by ChuckSeven in [D] Lion , An Optimizer That Outperforms Adam - Symbolic Discovery of Optimization Algorithms by ExponentialCookie
Yeah, it's very configuration dependent, but larger batch sizes usually learn faster, so there's a tendency to lean into that.