Submitted by V1bicycle t3_10ol7g6 in deeplearning

The ResNet paper by Kaiming He et al. does not use dropout in its models. Many models prior to ResNets, such as AlexNet and VGGNet, benefited from using dropout.

Why did the authors choose not to use dropout for ResNets? Is it because they use L2 regularization (weight decay) and batch normalization, which are forms of regularization that can substitute for dropout?
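For reference, here is a minimal sketch (in PyTorch; the `BasicBlock` name and hyperparameters are illustrative, not copied from the official code) of how a ResNet-style block and its optimizer are typically set up: batch norm after every convolution, L2 via weight decay on the optimizer, and no dropout layer anywhere.

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """Illustrative ResNet-style block: conv -> BN -> ReLU -> conv -> BN, plus identity skip."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # residual connection; no dropout anywhere in the block

block = BasicBlock(64)
# the only explicit regularizers: weight decay (L2) on the optimizer, plus BN's implicit effect
optimizer = torch.optim.SGD(block.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
```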

4

Comments


suflaj t1_j6fgjoj wrote

Dropout is less effective in CNNs, and Batch Normalization replaces it.

1

MinotaurOnLucy t1_j6fpaoj wrote

Don’t they have two different purposes? As I understand it, batch norm is used to maintain activations through deep neural networks so that nonlinear activations do not kill neurons whose distributions would otherwise have flattened out, while dropout is only meant to train the network uniformly to prevent overfitting.
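A toy sketch of the two mechanisms side by side (PyTorch, made-up data): batch norm rescales each feature to roughly zero mean and unit variance across the batch, while dropout zeroes a random subset of activations and rescales the survivors by 1/(1-p).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(32, 8) * 5 + 10           # batch of 32 samples, 8 badly-scaled features

bn = nn.BatchNorm1d(8).train()
y_bn = bn(x)
print(y_bn.mean(dim=0), y_bn.std(dim=0))   # roughly zero mean, unit std per feature

drop = nn.Dropout(p=0.5).train()
y_do = drop(x)
print((y_do == 0).float().mean())          # roughly half the activations are zeroed
print((y_do[y_do != 0] / x[y_do != 0])[:5])  # survivors are scaled by 1/(1-p) = 2
```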

1

suflaj t1_j6frxhe wrote

They are both regularization techniques, so no, they have the same purpose.

−2

XecutionStyle t1_j6ggq37 wrote

BN is used to reduce internal covariate shift; it just happens to regularize as well. Dropout as a regularizing technique didn't become big before ResNet (2014 vs. 2015).

I doubt what you're saying is true, that they're effectively the same. Try putting one right after the other and see the effect (a sketch is at the end of this comment). Two dropout layers, or two BN layers, in contrast have no problem co-existing.

edit: sorry, what I mean is that the variants of dropout that work with CNNs (without detrimental effects) didn't exist back then.
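That experiment is easy to run; here is a rough sketch (PyTorch, toy data, p=0.5 assumed). In train mode dropout rescales the surviving activations, so the BatchNorm that follows collects running statistics on a distorted distribution; at eval time dropout becomes the identity and BN's stored statistics no longer match its input.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Dropout(p=0.5), nn.BatchNorm1d(16))
x = torch.randn(256, 16)

net.train()
for _ in range(100):
    net(x)          # BN accumulates running stats on dropout-distorted activations

net.eval()          # dropout is now a no-op, but BN still uses those running stats
y = net(x)
print(y.var(dim=0).mean())   # clearly below 1.0 instead of ~1.0
```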

1

suflaj t1_j6hfkdj wrote

> BN is used to reduce covariate shift, it just happened to regularize.

The first part was hypothesized, but not proven. It is a popular belief, like all the other hypotheses about why BN works so well.

> Dropout as a regularizing technique didn't become big before ResNet (2014 vs. 2015).

What does becoming big mean? Dropout was introduced in 2012 and has been used ever since. It was never big in the sense that you would always use it.

It is certainly false that Dropout started being used in CNNs because of ResNets or immediately after them, as the first paper showing that there is a benefit to using Dropout on convolutional layers came out in 2017: https://link.springer.com/chapter/10.1007/978-3-319-54184-6_12
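As a side note on what dropout for convolutional layers typically looks like in practice (a generic PyTorch illustration, not the specific method from the linked paper), the channel-wise variant drops whole feature maps rather than individual activations.

```python
import torch
import torch.nn as nn

x = torch.randn(4, 16, 32, 32)           # N, C, H, W feature maps

elem_drop = nn.Dropout(p=0.3).train()    # zeroes individual activations
chan_drop = nn.Dropout2d(p=0.3).train()  # zeroes entire channels (feature maps)

print((elem_drop(x) == 0).float().mean())                        # ~30% of scalars zeroed
print((chan_drop(x).abs().sum(dim=(2, 3)) == 0).float().mean())  # ~30% of channels zeroed
```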

> I doubt what you're saying is true, that they're effectively the same.

I never said that.

0