**hughperman**
Reply to comment by **bluuerp** in **[D] Is there an advantage in learning when taking the average Gradient compared to the Gradient of just one point** by **CPOOCPOS**

Consider though: in a linear scheme, taking each gradient step separately gives a total update equal to the sum of the gradients. Taking the average gives the sum of the gradients divided by the number of steps, N. So you are only rescaling the step by a factor of 1/N, nothing more mathemagical.
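A quick way to check this is a toy sketch in Python. It assumes the per-point gradients are fixed numbers that don't depend on the current weight (my reading of "linear scheme" here), and the values, `lr`, `w0` and the two helper functions are all made up for illustration:

```python
# Toy check: sequential steps vs. one averaged step.
# Assumption: the gradients are fixed values that do not change as w
# moves. All numbers below are hypothetical.
grads = [0.5, -1.0, 2.0, 0.25]
lr = 0.1   # learning rate (hypothetical)
w0 = 1.0   # starting weight (hypothetical)


def sequential_steps(w, grads, lr):
    """Take one gradient step per point, one after the other."""
    for g in grads:
        w -= lr * g
    return w


def averaged_step(w, grads, lr):
    """Take a single step along the mean of the gradients."""
    return w - lr * sum(grads) / len(grads)


n = len(grads)
w_seq = sequential_steps(w0, grads, lr)             # moves by lr * sum(grads)
w_avg = averaged_step(w0, grads, lr)                # moves by lr * sum(grads) / n
w_seq_scaled = sequential_steps(w0, grads, lr / n)  # sequential, learning rate / n

print(w_seq, w_avg, w_seq_scaled)
```

Under that assumption, `w_avg` and `w_seq_scaled` agree up to floating-point rounding, so averaging behaves like the sequential updates with the learning rate divided by N. If the gradients do change as the weight moves, the two are no longer exactly equivalent.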

