ia3leonid t1_j7hgcoq wrote

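Gradients are also stored: the parameter gradients take as much memory as the weights, and the activation gradients computed during the backward pass add to that. Some optimisers need more still (Adam, for example, also tracks two running statistics for each weight).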
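To put rough numbers on this, here is a minimal sketch of the per-parameter bookkeeping. The function name `training_memory_bytes` and the `state_slots` table are made up for illustration, it assumes every value is stored at the same precision (4 bytes for fp32), and it leaves out activations entirely since those depend on batch size and architecture.

```python
def training_memory_bytes(num_params, bytes_per_value=4, optimizer="adam"):
    """Rough estimate of memory for weights, gradients and optimiser state.

    Activations are deliberately excluded: they scale with batch size
    and architecture, not just with the parameter count.
    """
    # Assumed number of extra per-parameter buffers each optimiser keeps:
    # plain SGD keeps none, SGD with momentum keeps one velocity buffer,
    # Adam keeps running estimates of the first and second moments.
    state_slots = {"sgd": 0, "momentum": 1, "adam": 2}

    weights = num_params * bytes_per_value
    gradients = num_params * bytes_per_value  # one gradient per weight
    optimizer_state = state_slots[optimizer] * num_params * bytes_per_value

    return {
        "weights": weights,
        "gradients": gradients,
        "optimizer_state": optimizer_state,
        "total": weights + gradients + optimizer_state,
    }


estimate = training_memory_bytes(1_000_000, optimizer="adam")
print(estimate["total"])  # 16000000 bytes: 4 MB weights + 4 MB grads + 8 MB Adam state
```

So under these assumptions, training with Adam needs about four times the memory of the weights alone, before counting any activations.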
