
MinotaurOnLucy t1_j6fpaoj wrote

Don’t they have two different purposes? As I understand it: batch norm is used to keep activations well-scaled through deep networks, so that nonlinear activations don't saturate and kill neurons whose output distributions would otherwise have flattened out, while dropout is only meant to train the network uniformly to prevent overfitting.
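For what it's worth, the two mechanisms can be sketched in a few lines of NumPy (a toy illustration of the forward passes only — it omits BN's learnable scale/shift and running statistics, and isn't how any particular framework implements them):

```python
import numpy as np

rng = np.random.default_rng(0)

def batchnorm(x, eps=1e-5):
    # Normalize each feature to zero mean / unit variance over the batch,
    # keeping activations in a range where nonlinearities stay responsive.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

def dropout(x, p=0.5, training=True):
    # Randomly zero units during training (scaling survivors by 1/(1-p))
    # so the network can't rely on any single co-adapted unit.
    # At test time, dropout is the identity.
    if not training:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

# A batch of 64 examples with 8 shifted, scaled features:
x = rng.normal(loc=3.0, scale=2.0, size=(64, 8))
h = batchnorm(x)   # each column is now roughly zero-mean, unit-variance
h = dropout(h)     # roughly half the units zeroed, the rest doubled
```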

1

suflaj t1_j6frxhe wrote

They are both regularization techniques, so no, they have the same purpose.

−2

XecutionStyle t1_j6ggq37 wrote

BN is used to reduce covariate shift; it just happened to regularize as a side effect. Dropout as a regularizing technique didn't become big before ResNet (2014 vs. 2015).

I doubt what you're saying is true, that they're effectively the same. Try putting one after the other and see the effect. Two dropout layers or two BN layers, in contrast, have no problem co-existing.

edit: sorry, what I meant is that the variants of dropout that work with CNNs (without detrimental effects) didn't exist back then.
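One concrete way to see the friction when you put one after the other (a toy NumPy sketch assuming standard inverted dropout, not any specific framework): a BN layer placed after dropout collects its statistics from train-time activations whose variance is inflated by the dropout scaling; at test time dropout becomes the identity, so those statistics no longer match. With zero-mean, unit-variance inputs and p = 0.5, the train-time variance works out to 1/(1−p) = 2.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 0.5
x = rng.normal(size=(100_000,))  # zero-mean, unit-variance activations

# Inverted dropout at train time: keep each unit with prob 1-p,
# scale survivors by 1/(1-p) so the expected value is unchanged.
mask = rng.random(x.shape) >= p
train_out = x * mask / (1.0 - p)

test_out = x  # at test time dropout is the identity

print(test_out.var())   # ~1.0: what a following BN layer sees at test time
print(train_out.var())  # ~2.0 (= 1/(1-p)): what it saw during training
```

The expected value is preserved, but the second moment is not — which is the kind of train/test mismatch two stacked dropout layers or two stacked BN layers don't create.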

1

suflaj t1_j6hfkdj wrote

> BN is used to reduce covariate shift, it just happened to regularize.

The first part was hypothesized, but never proven. It is a popular belief, like all the other hypotheses about why BN works so well.

> Dropout as a regularizing technique didn't become big before ResNet (2014 vs. 2015).

What does "becoming big" mean? Dropout was introduced in 2012 and has been used ever since. It was never big in the sense that you would always use it.

It is certainly false that Dropout was adopted for CNNs because of ResNets or immediately after them, as the first paper demonstrating a benefit from Dropout in convolutional layers appeared in 2017: https://link.springer.com/chapter/10.1007/978-3-319-54184-6_12

> I doubt what you're saying is true, that they're effectively the same.

I never said that.

0