Submitted by shingekichan1996 t3_10ky2oh in MachineLearning
[deleted] t1_j5w9rbv wrote
Reply to comment by koolaidman123 in [D] Self-Supervised Contrastive Approaches that don’t use large batch size. by shingekichan1996
[deleted]
koolaidman123 t1_j5wbk37 wrote
That's not the same thing...
Gradient accumulation computes the loss on each micro-batch independently. That breaks in-batch negatives: the contrastive loss needs to compare inputs within one large batch against each other, but under accumulation the inputs from micro-batch 1 are never contrasted with the inputs from micro-batch 2. That's why the workaround is to offload and cache the predictions (embeddings) from every micro-batch, then compute the loss once over all of them as if they were a single batch.
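A toy sketch of the mismatch (not from the thread; the InfoNCE form with diagonal positives and the batch/chunk sizes are illustrative):

```python
import torch
import torch.nn.functional as F

def info_nce_loss(q, k, temperature=0.07):
    # q, k: (N, D) L2-normalized embeddings. Row i of q and row i of k are
    # a positive pair; every other row of k is an in-batch negative for q[i].
    logits = q @ k.t() / temperature     # (N, N) similarity matrix
    labels = torch.arange(q.size(0))     # positive index for each query
    return F.cross_entropy(logits, labels)

q = F.normalize(torch.randn(8, 128), dim=1)
k = F.normalize(torch.randn(8, 128), dim=1)

# Full batch of 8: every query is scored against 7 in-batch negatives.
full = info_nce_loss(q, k)

# "Accumulated" version: two micro-batches of 4, losses averaged. Each
# query now only sees 3 negatives; the cross-chunk pairs never appear in
# any softmax, so this is a different (easier) objective, not the same
# loss computed in pieces.
acc = (info_nce_loss(q[:4], k[:4]) + info_nce_loss(q[4:], k[4:])) / 2

print(full.item(), acc.item())  # the two values differ
```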
That's why gradient accumulation doesn't simulate large batch sizes for contrastive learning, if you're familiar with how it works.
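For what it's worth, here's a rough two-pass sketch of the cache-then-recompute trick the comment alludes to (essentially the GradCache idea; the toy linear encoder, shapes, and chunk size are placeholders, not anyone's actual code):

```python
import torch
import torch.nn.functional as F

def info_nce_loss(q, k, temperature=0.07):
    # Same in-batch-negative InfoNCE as above.
    logits = q @ k.t() / temperature
    labels = torch.arange(q.size(0))
    return F.cross_entropy(logits, labels)

encoder = torch.nn.Linear(32, 128)   # stand-in for a real encoder
opt = torch.optim.SGD(encoder.parameters(), lr=1e-3)
x1 = torch.randn(64, 32)             # "view 1" of a large batch
x2 = torch.randn(64, 32)             # "view 2" (the augmented pairs)
chunk = 16                           # micro-batch that fits in memory

# Pass 1: forward every chunk WITHOUT building graphs, cache embeddings.
with torch.no_grad():
    q = torch.cat([F.normalize(encoder(c), dim=1) for c in x1.split(chunk)])
    k = torch.cat([F.normalize(encoder(c), dim=1) for c in x2.split(chunk)])

# Full-batch loss over the cached embeddings; grab d(loss)/d(embedding).
q.requires_grad_(True)
k.requires_grad_(True)
loss = info_nce_loss(q, k)
gq, gk = torch.autograd.grad(loss, (q, k))

# Pass 2: re-forward each chunk WITH gradients, and backprop the cached
# embedding gradients through the encoder one chunk at a time.
opt.zero_grad()
for c, g in zip(x1.split(chunk), gq.split(chunk)):
    F.normalize(encoder(c), dim=1).backward(g)
for c, g in zip(x2.split(chunk), gk.split(chunk)):
    F.normalize(encoder(c), dim=1).backward(g)
opt.step()
```

The upshot: peak memory is one full set of cached embeddings plus one micro-batch's activation graph, while the loss still sees all 63 in-batch negatives per query.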