Paedor t1_j5ur6tx wrote

The trouble is that contrastive methods often compare elements within the same batch, instead of treating examples as independent the way almost all other ML does (batchnorm being the other exception).

As a simplified example: with a single batch of 2N elements, a contrastive loss can use all (2N)^2 = 4N^2 pairwise distances between batch elements, while with two accumulated batches of N it can only use 2*N^2 = 2N^2 pairs, so gradient accumulation halves the number of pairs contributing to the loss.
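The pair counting can be checked with a tiny sketch (batch size, embedding dim, and the dot-product similarity are all illustrative choices, not anything specific to a particular contrastive method):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 4, 8  # small batch size and embedding dim, just for illustration

# One batch of 2N embeddings: the full similarity matrix has (2N)^2 = 4N^2 entries.
z = rng.normal(size=(2 * N, d))
full_pairs = (z @ z.T).size
print(full_pairs)  # (2*N)**2 = 64

# Same 2N embeddings split into two accumulated batches of N: each batch only
# sees its own N^2 within-batch similarities, for 2*N^2 total.
a, b = z[:N], z[N:]
accum_pairs = (a @ a.T).size + (b @ b.T).size
print(accum_pairs)  # 2 * N**2 = 32
```

The N^2 cross-batch similarities between the two sub-batches (in both directions) are exactly what gradient accumulation throws away.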