Why does my Transformer blow GPU memory?
Submitted by beautyofdeduction on February 6, 2023 at 2:12 AM in deeplearning · 12 comments
ia3leonid wrote on February 6, 2023 at 8:46 PM:

Gradients are also stored and take as much memory as the weights plus activations, or more with some optimisers (Adam, for example, also tracks statistics for each weight).
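To make that concrete, here is a rough back-of-the-envelope sketch (the function name and the 125M-parameter figure are illustrative, and it assumes plain fp32 Adam): for every weight you pay once for the parameter, once for its gradient, and twice for Adam's running statistics, before a single activation is counted.

```python
def training_memory_bytes(num_params: int, dtype_bytes: int = 4) -> dict:
    """Rough per-tensor-class memory for training with Adam in fp32.

    Activations are deliberately excluded: they scale with batch size,
    sequence length, and depth, not just with parameter count.
    """
    weights = num_params * dtype_bytes   # the parameters themselves
    grads = num_params * dtype_bytes     # one gradient per weight
    # Adam keeps two statistics per weight: the first moment (exp_avg)
    # and the second moment (exp_avg_sq).
    adam_state = 2 * num_params * dtype_bytes
    return {
        "weights": weights,
        "grads": grads,
        "adam_state": adam_state,
        "total": weights + grads + adam_state,
    }

# Hypothetical 125M-parameter Transformer in fp32:
print(training_memory_bytes(125_000_000))
# weights ~0.5 GB, grads ~0.5 GB, Adam state ~1.0 GB -> ~2 GB
# before storing any activations.
```

For Transformers specifically, the activations (which grow with batch size × sequence length) are often what actually tips the GPU over, on top of this fixed per-weight cost.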