altmly t1_j5uglpx wrote on January 25, 2023 at 5:29 PM

Reply to comment by RaptorDotCpp in [D] Self-Supervised Contrastive Approaches that don’t use large batch size. by shingekichan1996

I'm confused. Gradient accumulation is exactly equivalent to using a larger batch as long as the data is the same, unless you use things that depend on batch statistics, like batch norm (which you shouldn't).
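
To make that concrete, here's a minimal sketch in plain Python. It assumes the loss is a mean over independent per-sample terms (a toy 1-D linear regression with MSE, which is my choice for illustration) and that nothing in the model depends on batch statistics. The function names and data are made up for the example.

```python
def mse_grad(w, b, xs, ys):
    """Gradient of mean squared error for y_hat = w * x + b over one batch."""
    n = len(xs)
    gw = sum(2.0 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    gb = sum(2.0 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return gw, gb


def accumulated_grad(w, b, xs, ys, micro_batch_size):
    """Accumulate micro-batch gradients, weighting each by its share of the data."""
    n = len(xs)
    gw_acc, gb_acc = 0.0, 0.0
    for start in range(0, n, micro_batch_size):
        mx = xs[start:start + micro_batch_size]
        my = ys[start:start + micro_batch_size]
        gw, gb = mse_grad(w, b, mx, my)
        # Each micro-batch gradient is a mean over len(mx) samples, so scale
        # it by len(mx) / n to recover the mean over the full batch.
        scale = len(mx) / n
        gw_acc += gw * scale
        gb_acc += gb * scale
    return gw_acc, gb_acc


# Hypothetical toy data, same samples for both computations.
xs = [0.5, -1.0, 2.0, 3.5, -0.25, 1.5, 4.0, -2.0]
ys = [1.2, -0.7, 3.9, 7.1, 0.1, 2.8, 8.3, -3.6]
w, b = 0.3, -0.1

full = mse_grad(w, b, xs, ys)
accumulated = accumulated_grad(w, b, xs, ys, micro_batch_size=2)
print("full batch :", full)
print("accumulated:", accumulated)
```

The two results should agree up to floating-point rounding. The scaling step is what makes it work: each micro-batch contributes in proportion to its size, so the accumulated sum is the same mean the full batch would give.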