
bloc97 t1_ivpper1 wrote

I mean, having the divergence would definitely help, since it gives us additional information about the shape of the parameter landscape with respect to the loss function. The general idea would be to prefer areas with negative divergence, while moving and searching through zero-divergence areas very quickly.

Edit: In a sense, using the gradient alone only gives us information about the shape of the loss function at a single point, while having the Laplacian gives us a larger "field of view" on the landscape.
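
To make the gradient vs. Laplacian distinction concrete, here is a rough sketch using finite differences in NumPy. Everything in it is an assumption for illustration: the toy quadratic `loss`, the helper names `gradient` and `laplacian`, and the step sizes. Note the sign convention: the Laplacian is the divergence of the gradient field ∇L, so the divergence of the descent direction -∇L is just its negative.

```python
import numpy as np

def loss(w):
    # Hypothetical toy loss: a quadratic bowl with its minimum at the origin.
    return 0.5 * (w[0] ** 2 + 3.0 * w[1] ** 2)

def gradient(f, w, h=1e-4):
    # Central-difference estimate of the gradient of f at w.
    w = np.asarray(w, dtype=float)
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = h
        g[i] = (f(w + e) - f(w - e)) / (2.0 * h)
    return g

def laplacian(f, w, h=1e-3):
    # Sum of second partial derivatives (trace of the Hessian),
    # i.e. the divergence of the gradient field of f.
    w = np.asarray(w, dtype=float)
    f0 = f(w)
    total = 0.0
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = h
        total += (f(w + e) - 2.0 * f0 + f(w - e)) / h ** 2
    return total

w = np.array([1.0, -2.0])
grad = gradient(loss, w)   # slope at this one point, approximately [1, -6]
lap = laplacian(loss, w)   # curvature summed over all axes, approximately 4
div_descent = -lap         # divergence of the descent field -grad(loss)

print("gradient:", grad)
print("laplacian:", lap)
print("divergence of -gradient:", div_descent)
```

For this bowl the divergence of the descent field is negative everywhere (the flow converges toward the minimum), whereas on a flat plateau it would be close to zero. For a real network this loop costs two extra loss evaluations per parameter, so in practice one would probably need an automatic-differentiation or stochastic trace estimate instead.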
