[D] Is there an advantage in learning when taking the average Gradient compared to the Gradient of just one point Submitted by CPOOCPOS t3_yql3wl on November 9, 2022 at 2:52 PM in MachineLearning 35 comments 38
entropyvsenergy t1_ivqltwc wrote on November 9, 2022 at 9:58 PM Batching does this, generally, and it's a good thing for stability. Averaging over a mini-batch reduces the variance of the gradient update in inverse proportion to the batch size (roughly 1/B for B independently sampled examples). Permalink 0
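To see the variance point concretely, here is a minimal sketch, not anything from the thread itself. It assumes a made-up one-parameter regression problem (`y = 3x + noise`) with squared-error loss, and the helper names (`per_example_gradient`, `minibatch_gradient_variance`, `run_demo`) are hypothetical. It draws many random mini-batches at a fixed weight and measures how much the averaged gradient varies from batch to batch.

```python
import random
import statistics


def per_example_gradient(w, x, y):
    """Gradient of the squared error 0.5 * (w*x - y)**2 with respect to w."""
    return (w * x - y) * x


def minibatch_gradient_variance(data, w, batch_size, trials, rng):
    """Empirical variance of the mini-batch mean gradient across random batches."""
    estimates = []
    for _ in range(trials):
        # Sample the batch independently (with replacement) from the dataset.
        batch = [rng.choice(data) for _ in range(batch_size)]
        grads = [per_example_gradient(w, x, y) for x, y in batch]
        estimates.append(sum(grads) / batch_size)
    return statistics.pvariance(estimates)


def run_demo(seed=0, trials=4000):
    """Return {batch_size: variance of the averaged gradient} for a toy problem."""
    rng = random.Random(seed)
    # Hypothetical toy data: y = 3x + Gaussian noise.
    data = []
    for _ in range(1000):
        x = rng.gauss(0.0, 1.0)
        data.append((x, 3.0 * x + rng.gauss(0.0, 0.5)))
    w = 0.0  # evaluate all gradients at the same (arbitrary) weight
    return {
        b: minibatch_gradient_variance(data, w, b, trials, rng)
        for b in (1, 4, 16, 64)
    }


if __name__ == "__main__":
    for batch_size, variance in run_demo().items():
        print(f"batch_size={batch_size:3d}  variance={variance:.4f}")
```

Under these assumptions the measured variance should shrink by about a factor of 4 each time the batch size quadruples, which is the 1/B behaviour. The mean of the estimate stays the same, so a batch size of 1 still points in the right direction on average; it is just noisier per step.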