Submitted by shingekichan1996 t3_10ky2oh in MachineLearning
[deleted] t1_j5w9rbv wrote
Reply to comment by koolaidman123 in [D] Self-Supervised Contrastive Approaches that don’t use large batch size. by shingekichan1996
[deleted]
koolaidman123 t1_j5wbk37 wrote
That's not the same thing...
Gradient accumulation computes the loss on each micro-batch independently. That breaks in-batch negatives: the contrastive loss needs to compare inputs within one large batch against each other, but under accumulation the inputs from micro-batch 1 are never contrasted with the inputs from micro-batch 2. That's why the workaround is to offload and cache the predictions (embeddings) from every micro-batch, then compute the loss once over all of them as if they were a single batch.
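A toy sketch of the mismatch (not from the thread; the InfoNCE form with diagonal positives and the batch/chunk sizes are illustrative):

```python
import torch
import torch.nn.functional as F

def info_nce_loss(q, k, temperature=0.07):
    # q, k: (N, D) L2-normalized embeddings. Row i of q and row i of k are
    # a positive pair; every other row of k is an in-batch negative for q[i].
    logits = q @ k.t() / temperature     # (N, N) similarity matrix
    labels = torch.arange(q.size(0))     # positive index for each query
    return F.cross_entropy(logits, labels)

q = F.normalize(torch.randn(8, 128), dim=1)
k = F.normalize(torch.randn(8, 128), dim=1)

# Full batch of 8: every query is scored against 7 in-batch negatives.
full = info_nce_loss(q, k)

# "Accumulated" version: two micro-batches of 4, losses averaged. Each
# query now only sees 3 negatives; the cross-chunk pairs never appear in
# any softmax, so this is a different (easier) objective, not the same
# loss computed in pieces.
acc = (info_nce_loss(q[:4], k[:4]) + info_nce_loss(q[4:], k[4:])) / 2

print(full.item(), acc.item())  # the two values differ
```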
That's why gradient accumulation doesn't simulate large batch sizes for contrastive learning, if you're familiar with how it works.
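For what it's worth, here's a rough two-pass sketch of the cache-then-recompute trick the comment alludes to (essentially the GradCache idea; the toy linear encoder, shapes, and chunk size are placeholders, not anyone's actual code):

```python
import torch
import torch.nn.functional as F

def info_nce_loss(q, k, temperature=0.07):
    # Same in-batch-negative InfoNCE as above.
    logits = q @ k.t() / temperature
    labels = torch.arange(q.size(0))
    return F.cross_entropy(logits, labels)

encoder = torch.nn.Linear(32, 128)   # stand-in for a real encoder
opt = torch.optim.SGD(encoder.parameters(), lr=1e-3)
x1 = torch.randn(64, 32)             # "view 1" of a large batch
x2 = torch.randn(64, 32)             # "view 2" (the augmented pairs)
chunk = 16                           # micro-batch that fits in memory

# Pass 1: forward every chunk WITHOUT building graphs, cache embeddings.
with torch.no_grad():
    q = torch.cat([F.normalize(encoder(c), dim=1) for c in x1.split(chunk)])
    k = torch.cat([F.normalize(encoder(c), dim=1) for c in x2.split(chunk)])

# Full-batch loss over the cached embeddings; grab d(loss)/d(embedding).
q.requires_grad_(True)
k.requires_grad_(True)
loss = info_nce_loss(q, k)
gq, gk = torch.autograd.grad(loss, (q, k))

# Pass 2: re-forward each chunk WITH gradients, and backprop the cached
# embedding gradients through the encoder one chunk at a time.
opt.zero_grad()
for c, g in zip(x1.split(chunk), gq.split(chunk)):
    F.normalize(encoder(c), dim=1).backward(g)
for c, g in zip(x2.split(chunk), gk.split(chunk)):
    F.normalize(encoder(c), dim=1).backward(g)
opt.step()
```

The upshot: peak memory is one full set of cached embeddings plus one micro-batch's activation graph, while the loss still sees all 63 in-batch negatives per query.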