Submitted by Blutorangensaft t3_11wmpoj in MachineLearning
YouAgainShmidhoobuh t1_jd2n2v5 wrote
ResNets do not tackle the vanishing gradient problem. The authors specifically note that vanishing gradients had already been largely addressed by normalized initialization and BatchNorm in particular; the skip connections target the degradation problem in very deep networks instead. So removing BatchNorm from the equation will most likely lead to vanishing gradients.
I am assuming you are doing a WGAN approach since that would explain the gradient penalty violation. In this case, use LayerNorm as indicated here: https://github.com/LynnHo/DCGAN-LSGAN-WGAN-GP-DRAGAN-Tensorflow-2/issues/3
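For reference, a minimal sketch of what that swap looks like inside a critic block, assuming a PyTorch WGAN-GP setup; the module name, channel counts, and spatial size are placeholders, not the architecture discussed in this thread:

```python
import torch
import torch.nn as nn

class CriticBlock(nn.Module):
    """Hypothetical critic block using LayerNorm instead of BatchNorm."""
    def __init__(self, in_ch, out_ch, out_spatial):
        super().__init__()
        # Strided conv halves the spatial resolution; out_spatial is the
        # post-conv height/width and must be known to configure LayerNorm.
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1)
        # LayerNorm normalizes each sample independently, so it does not mix
        # statistics across the batch the way BatchNorm does; that keeps the
        # per-sample gradient penalty well defined.
        self.norm = nn.LayerNorm([out_ch, out_spatial, out_spatial])
        self.act = nn.LeakyReLU(0.2)

    def forward(self, x):
        return self.act(self.norm(self.conv(x)))
```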
Blutorangensaft OP t1_jd7jaor wrote
Thank you for your comment. I have not worked with ResNets before, and the paper I used as a basis erroneously stated that this architecture was chosen because of vanishing gradients. Wikipedia seems to repeat the same error.
Indeed, I am working with WGAN-GP. Unfortunately, implementing LayerNorm, while enabling me to scale the depth, completely changes the training dynamics. Training both G and C with the same learning rate and a 1:1 update schedule, the critic seems to win, a behaviour I have never seen before in GANs. I suppose I will have to retune the learning rates.
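In case it helps anyone landing here later: one common way to rebalance is to give the generator and critic separate optimizers (and, if needed, a critic-to-generator update ratio other than 1:1). A rough PyTorch sketch using the usual WGAN-GP defaults, not the exact values from this thread:

```python
import torch

def make_optimizers(G, C, lr_g=1e-4, lr_c=1e-4):
    """Separate Adam optimizers so the two learning rates can be retuned
    independently; betas (0.0, 0.9) are the defaults from the WGAN-GP paper."""
    opt_g = torch.optim.Adam(G.parameters(), lr=lr_g, betas=(0.0, 0.9))
    opt_c = torch.optim.Adam(C.parameters(), lr=lr_c, betas=(0.0, 0.9))
    return opt_g, opt_c

# The WGAN-GP paper uses 5 critic updates per generator update as a starting
# point; if the critic dominates, lowering lr_c or this ratio is one thing to try.
N_CRITIC = 5
```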