Submitted by Infamous_Age_7731 t3_107pcux in deeplearning
I am training a DL model both locally and on a VM from a private vendor. Locally I have an RTX 3080 Ti (12 GB); on the cloud, for the extra memory, I am using an Ampere A100 (80 GB).
I had the feeling that the VM GPU was a bit slow.
So I reran with the exact same hyperparameters (e.g. batch size) and again noticed that the local RTX 3080 Ti was much faster than the A100. When I measured it, it was roughly 2-3x faster (the kind of timing I did is sketched below).
Is that down to the card itself, either because it's a server GPU, or because (as I read) the 80 GB is actually 2x 40 GB connected with NVLink? Or is it standard practice for VM companies to throttle the GPU?
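For reference, a minimal PyTorch sketch of the kind of per-step timing I mean; the model, batch size, and step count below are dummy stand-ins, not my actual setup:

```python
import time
import torch
import torch.nn as nn

device = torch.device("cuda")

# Dummy stand-ins for the real model and batch (hypothetical sizes).
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10)).to(device)
optimizer = torch.optim.Adam(model.parameters())
criterion = nn.CrossEntropyLoss()
x = torch.randn(256, 1024, device=device)
y = torch.randint(0, 10, (256,), device=device)

# Warm up so lazy init and autotuning don't skew the numbers.
for _ in range(10):
    optimizer.zero_grad()
    criterion(model(x), y).backward()
    optimizer.step()

# Time a fixed number of steps; synchronize so GPU work is actually counted.
torch.cuda.synchronize()
start = time.perf_counter()
steps = 100
for _ in range(steps):
    optimizer.zero_grad()
    criterion(model(x), y).backward()
    optimizer.step()
torch.cuda.synchronize()
elapsed = time.perf_counter() - start
print(f"{steps / elapsed:.1f} steps/s")
```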
agentfuzzy999 t1_j3pbk38 wrote
I have trained locally and in the cloud on a variety of cards and server architectures. Depending on what model you are training, it could be down to a huge variety of reasons, but if you can fit the model on a 3080 you really aren't going to be taking advantage of the A100's huge memory. The higher clock speed of the 3080 might simply suit this model and parameter set better.
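If you want to rule out a clock or power cap on the VM, a quick sanity check is to compare what each machine reports. A minimal sketch, assuming PyTorch, device index 0, and the nvidia-smi CLI being on PATH:

```python
import subprocess
import torch

# Reported device specs as seen by PyTorch.
props = torch.cuda.get_device_properties(0)
print(props.name, props.total_memory // 2**20, "MiB,", props.multi_processor_count, "SMs")

# Live SM clock vs. its maximum, plus utilization and power cap, via nvidia-smi.
print(subprocess.run(
    ["nvidia-smi",
     "--query-gpu=clocks.sm,clocks.max.sm,utilization.gpu,power.draw,power.limit",
     "--format=csv"],
    capture_output=True, text=True).stdout)
```

Running this on both the local box and the VM during training makes it easy to see whether the A100's clocks or power limit are being held well below their maximums.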