Submitted by shingekichan1996 t3_10ky2oh in MachineLearning
melgor89 t1_j5u6pdr wrote
Reply to comment by mgwizdala in [D] Self-Supervised Contrastive Approaches that don’t use large batch size. by shingekichan1996
As said in the topic, gradient accumulation is not a solution. However, gradient checkpointing could be (https://paperswithcode.com/method/gradient-checkpointing). It recomputes some of the feature maps during the backward pass instead of storing them in memory, so you can fit a bigger batch size, at the cost of some extra compute.
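To make the recompute-instead-of-store idea concrete, here is a minimal pure-Python sketch on a toy scalar "network" (a chain of tanh layers). Everything in it is an assumption for illustration: the names `forward_step`, `grads_full`, `grads_checkpointed`, and the `segment` parameter are hypothetical, and a real framework does this on tensors, not floats. It only shows that the gradients come out the same while fewer activations are held at once.

```python
import math

def forward_step(w, x):
    # One toy "layer": y = tanh(w * x). Hypothetical stand-in for a real layer.
    return math.tanh(w * x)

def grads_full(weights, x0):
    # Standard backprop: keep every activation from the forward pass.
    acts = [x0]
    for w in weights:
        acts.append(forward_step(w, acts[-1]))
    grad, grads = 1.0, [0.0] * len(weights)  # loss = final activation
    for i in reversed(range(len(weights))):
        dpre = grad * (1.0 - acts[i + 1] ** 2)  # derivative of tanh
        grads[i] = dpre * acts[i]
        grad = dpre * weights[i]
    return grads, len(acts)  # gradients, number of activations stored

def grads_checkpointed(weights, x0, segment=4):
    # Forward pass: store only the input of each segment (the checkpoints).
    n = len(weights)
    checkpoints = {}
    x = x0
    for i, w in enumerate(weights):
        if i % segment == 0:
            checkpoints[i] = x
        x = forward_step(w, x)
    grad, grads = 1.0, [0.0] * n
    peak = 0
    # Backward pass: recompute one segment's activations at a time.
    for start in reversed(range(0, n, segment)):
        end = min(start + segment, n)
        acts = [checkpoints[start]]
        for i in range(start, end):
            acts.append(forward_step(weights[i], acts[-1]))
        peak = max(peak, len(checkpoints) + len(acts))
        for i in reversed(range(start, end)):
            y = acts[i - start + 1]
            dpre = grad * (1.0 - y ** 2)
            grads[i] = dpre * acts[i - start]
            grad = dpre * weights[i]
    return grads, peak  # gradients, peak number of activations held

weights = [0.5 + 0.1 * i for i in range(16)]
g_full, stored_full = grads_full(weights, 0.5)
g_ckpt, stored_ckpt = grads_checkpointed(weights, 0.5, segment=4)
print(stored_full, stored_ckpt)
```

In PyTorch the same trade-off is available through `torch.utils.checkpoint.checkpoint`, which wraps a part of the model so its intermediate activations are recomputed during the backward pass.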