entropyvsenergy t1_ivqltwc wrote

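Batching does this, generally, and it's a good thing for stability. Averaging over a batch reduces the variance of the gradient update in inverse proportion to the batch size (assuming the examples are sampled roughly independently), so a batch of size B has about 1/B the variance of a single-example update.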
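A quick way to see the 1/B scaling is a toy simulation. This is only a sketch under stated assumptions: the per-example "gradients" here are made-up i.i.d. Gaussian draws (true gradient 1.0, unit noise variance), not gradients from a real model, and the function name `minibatch_gradient_variance` is hypothetical.

```python
import numpy as np

def minibatch_gradient_variance(batch_size, n_trials=20000, seed=0):
    """Empirical variance of a mini-batch gradient estimate.

    Assumption: each per-example gradient is an i.i.d. draw from
    N(1.0, 1.0), standing in for "true gradient plus noise".
    """
    rng = np.random.default_rng(seed)
    # One row per simulated update, one column per example in the batch.
    per_example = rng.normal(loc=1.0, scale=1.0, size=(n_trials, batch_size))
    # The mini-batch gradient is the mean of the per-example gradients.
    batch_grads = per_example.mean(axis=1)
    return batch_grads.var()

if __name__ == "__main__":
    for b in (1, 4, 16, 64):
        v = minibatch_gradient_variance(b)
        # v * b should stay close to 1.0 if the variance scales as 1/B.
        print(f"batch_size={b:3d}  variance={v:.4f}  variance*B={v * b:.3f}")
```

If the examples in a batch are correlated (for instance, drawn from the same sequence or class), the reduction will generally be smaller than 1/B.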
