CPOOCPOS OP t1_ivpmo8h wrote
Reply to comment by bloc97 in [D] Is there an advantage in learning when taking the average Gradient compared to the Gradient of just one point by CPOOCPOS
>divergence
Hi bloc!! Thanks for your answer.

By taking the Laplacian, do you mean computing the Laplacian (∇·∇f) at all points and averaging? Yes, that is also possible. Not in a single go, but I can get the second derivative at all points for each parameter and add them up. How would that help? Or what is a higher-order optimisation?
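To pin down what I mean by "add them up", here is a rough sketch of how I imagine computing that Laplacian (the sum of second derivatives over all parameters) with PyTorch's double backward. The model, loss and data below are just placeholders, and the exact per-parameter loop is only feasible for tiny models:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)              # stand-in for the real network
loss_fn = nn.MSELoss()
x, y = torch.randn(32, 10), torch.randn(32, 1)

params = [p for p in model.parameters() if p.requires_grad]
loss = loss_fn(model(x), y)

# First derivatives, kept in the graph so they can be differentiated again
grads = torch.autograd.grad(loss, params, create_graph=True)

laplacian = torch.zeros(())
for p, g in zip(params, grads):
    g_flat = g.reshape(-1)
    for idx in range(p.numel()):
        # d^2 loss / d p_idx^2: one extra backward pass per parameter entry
        h_row = torch.autograd.grad(g_flat[idx], p, retain_graph=True)[0]
        laplacian = laplacian + h_row.reshape(-1)[idx]

print("Laplacian of the loss:", laplacian.item())
```

For anything realistically sized I assume one would switch to a stochastic estimate of the Hessian trace (e.g. Hutchinson-style) instead of this exact loop.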
bloc97 t1_ivpper1 wrote
I mean having the divergence would definitely help, as we would have additional information about the shape of the parameter landscape with respect to the loss function. The general idea would be to prefer areas with negative divergence, while trying to move and search through zero-divergence areas very quickly.

Edit: In a sense, using the gradient alone only gives us information about the shape of the loss function at a single point, while having the Laplacian gives us a larger "field of view" on the landscape.
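To illustrate (just a made-up toy sketch, not a tested method): in 1-D the divergence of the gradient field is simply f''(w), so one could for instance modulate the step size with its sign, something like:

```python
import torch

def f(w):
    # arbitrary bumpy 1-D "loss", purely for illustration
    return torch.sin(3.0 * w) + 0.1 * w ** 2

w = torch.tensor(2.0, requires_grad=True)
base_lr = 0.05

for _ in range(200):
    loss = f(w)
    (g,) = torch.autograd.grad(loss, w, create_graph=True)
    (h,) = torch.autograd.grad(g, w)  # in 1-D the divergence of the gradient is just f''(w)

    # Made-up rule: move fast where the divergence is ~zero or positive,
    # take small careful steps where it is negative (a region worth settling into)
    lr = base_lr * (2.0 if h.item() >= 0.0 else 0.5)
    with torch.no_grad():
        w -= lr * g

print(w.item(), f(w).item())
```

In higher dimensions f''(w) would become the Laplacian (Hessian trace) from your comment above, and this particular step-size rule is only one arbitrary way of expressing the "prefer negative divergence, rush through zero divergence" heuristic.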