Comments

Pyrite_Pro t1_j2d64e4 wrote

There is no definitive answer to this question. That's why the field of machine learning involves so much empirical experimentation. I suggest trying it and seeing whether it improves performance or not.

2

Independent_Tax5335 t1_j2d4j4a wrote

I think that if you apply batch norm after dropout during training, the batch norm statistics will not be correct at inference time, so I would do batch norm before dropout. On the other hand, batch norm has been shown to provide some regularization on its own, so it is also fine to just use batch norm alone. I would choose the approach that works best for my specific use case.
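A minimal sketch of that "norm before dropout" ordering, assuming PyTorch (the layer sizes and dropout rate are arbitrary placeholders):

```python
import torch
import torch.nn as nn

block = nn.Sequential(
    nn.Linear(256, 128),
    nn.BatchNorm1d(128),  # normalize first, so its running stats see un-dropped activations
    nn.ReLU(),
    nn.Dropout(p=0.5),    # drop activations only after normalization
)

x = torch.randn(32, 256)  # batch of 32 examples
out = block(x)            # shape: (32, 128)
```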

1

BrohammerOK t1_j2ekoyj wrote

If you do use both in the same layer, dropout should never be applied right before batch or layer norm, because the features set to 0 would skew the mean and variance calculations. As an example, it is common in CNNs to use batch norm in the conv blocks and then apply dropout after the global average pooling (before the final fc layer). Sometimes you even see dropout between conv blocks; take a look at EfficientNet by Google.
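A rough sketch of that CNN pattern, assuming PyTorch (channel counts, dropout rate, and the number of classes are arbitrary placeholders, not values from EfficientNet): batch norm lives inside the conv blocks, and dropout appears only after global average pooling, just before the classifier.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.BatchNorm2d(32),        # norm inside the conv block
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),   # global average pooling -> (N, 64, 1, 1)
    nn.Flatten(),              # -> (N, 64)
    nn.Dropout(p=0.3),         # dropout after pooling, before the final fc layer
    nn.Linear(64, 10),         # classifier head
)

x = torch.randn(8, 3, 32, 32)
logits = model(x)              # shape: (8, 10)
```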

1

hannahmontana1814 t1_j2dwds2 wrote

Yes, it makes sense to use dropout and layer normalization in the same model. But only if you want your model to overfit and perform worse than it could.

−5