Submitted by radi-cho t3_11izjc1 in MachineLearning
Delacroid t1_jb4c3xt wrote
Reply to comment by jobeta in [R] [N] Dropout Reduces Underfitting - Liu et al. by radi-cho
I don't think so. If you look at the figure and compare the angle between whole-dataset backprop and mini-batch backprop, increasing the learning rate wouldn't change that angle, only the scale of the vectors. A toy check of that claim is sketched below.
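Here's a minimal sketch (assuming PyTorch, with random vectors standing in for flattened gradients, so purely illustrative):

```python
import torch

# The cosine of the angle between the full-dataset gradient and a
# mini-batch gradient is invariant to the learning rate, which only
# rescales the update vector.
torch.manual_seed(0)
g_full = torch.randn(1000)  # stand-in for the whole-dataset gradient
g_mini = torch.randn(1000)  # stand-in for a mini-batch gradient

def cos(a, b):
    return torch.dot(a, b) / (a.norm() * b.norm())

for lr in (0.01, 0.1, 1.0):
    # Scaling by lr changes the step length, not the direction
    print(lr, cos(g_full, lr * g_mini).item())  # same value for every lr
```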
Also, dropout does not (only) introduce noise; it prevents co-adaptation of neurons. In the same way that each tree in a random forest is trained on a subset of the data (bootstrapping, I think it's called), the same happens for neurons when you use dropout.
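To make the analogy concrete, here's a tiny sketch (assuming PyTorch; the toy activations are my own, not from the paper): each forward pass zeroes a different random subset of units, so each step effectively trains a different sub-network, loosely like each tree seeing its own bootstrap sample.

```python
import torch

torch.manual_seed(0)
drop = torch.nn.Dropout(p=0.5)  # modules default to training mode, so dropout is active
activations = torch.ones(8)     # toy activations of 8 neurons

for step in range(3):
    masked = drop(activations)  # a different random subset is zeroed each step
    # Surviving units are rescaled by 1/(1-p) to keep the expected activation;
    # no unit can co-adapt with a fixed set of partners.
    print(masked)
```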
I haven't read the paper, but my intuition says that the merit of dropout in the early stages of training could be that this bootstrapping reduces the bias of the model. That's why the direction of optimization is closer to whole-dataset training.