Submitted by beautyofdeduction t3_10uuslf in deeplearning
beautyofdeduction OP t1_j7eqr8c wrote
Reply to comment by BellyDancerUrgot in Why does my Transformer blow GPU memory? by beautyofdeduction
8 Bytes * 22M = 0.176 GB?
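The arithmetic holds if 1 GB is taken as 10^9 bytes. 8 bytes per parameter corresponds to float64; float32 weights take 4 bytes each. Here is a minimal sketch of the same calculation. The function name `param_memory_gb` is just for illustration.

```python
def param_memory_gb(n_params: int, bytes_per_param: int) -> float:
    """Memory for the raw parameters only, in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9

n_params = 22_000_000
print(round(param_memory_gb(n_params, 8), 3))  # float64 weights
print(round(param_memory_gb(n_params, 4), 3))  # float32 weights
```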
BellyDancerUrgot t1_j7f0u7u wrote
Okay yeah, idk wtf I was typing. Yes, 0.176 GB for just the parameters. You still have to account for the dense representations of long sequences (and that 8 times over), plus activations and gradients, all multiplied by the number of layers. I read a formula for approximating this somewhere online. I think activations take up way more memory than the model itself.
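Here is a rough back-of-the-envelope sketch of why activations can dwarf the weights. This is not the formula mentioned above. Every factor is an assumption made for illustration:

- Adam keeps two extra values per parameter.
- One `seq_len x seq_len` attention-score matrix is stored per head, per example, per layer.
- A hypothetical `acts_per_layer` other tensors of shape `(batch, seq, d_model)` are saved per layer.

The config values in the usage lines (6 layers, 8 heads, sequence length 4096, d_model 512) are made up.

```python
def training_memory_gb(n_params, n_layers, n_heads, seq_len, d_model, batch_size,
                       bytes_per_value=4, use_adam=True, acts_per_layer=10):
    """Very rough training-memory estimate in GB (1 GB = 1e9 bytes).

    All factors are simplifying assumptions, not an exact accounting.
    """
    b = bytes_per_value
    weights = n_params * b
    grads = n_params * b                              # one gradient per parameter
    optimizer = 2 * n_params * b if use_adam else 0   # assumed Adam: two moments per parameter
    # Attention scores kept for backprop: one seq_len x seq_len matrix
    # per head, per example, per layer, so this term is quadratic in seq_len.
    attention = n_layers * batch_size * n_heads * seq_len ** 2 * b
    # Other activations: assume `acts_per_layer` tensors of shape
    # (batch_size, seq_len, d_model) are saved per layer (hypothetical factor).
    other_acts = n_layers * acts_per_layer * batch_size * seq_len * d_model * b
    parts = {
        "weights": weights,
        "gradients": grads,
        "optimizer": optimizer,
        "attention_scores": attention,
        "other_activations": other_acts,
    }
    parts["total"] = sum(parts.values())  # RHS is evaluated before "total" is added
    return {name: nbytes / 1e9 for name, nbytes in parts.items()}

# Hypothetical config, float32 values, batch size 1.
est = training_memory_gb(22_000_000, n_layers=6, n_heads=8, seq_len=4096,
                         d_model=512, batch_size=1)
for name, gb in est.items():
    print(f"{name:>18}: {gb:.3f} GB")
```

With these made-up settings, the attention-score term alone comes out far larger than the weight, gradient, and optimizer terms combined. Doubling `seq_len` quadruples it. Real frameworks may save more than this (for example, softmax outputs or dropout masks) or less (with memory-efficient attention kernels).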
The memory requirement is roughly in line with most mid-size transformer models, I think.
beautyofdeduction OP t1_j7hkq74 wrote
That context of how much memory other models use up is helpful. Thanks for taking the time to respond.