qiltb
qiltb t1_j3rtytt wrote
Reply to comment by Infamous_Age_7731 in Cloud VM GPU is much slower than my local GPU by Infamous_Age_7731
That doesn't sound weird to me, though; servers probably use much slower ECC RAM...
qiltb t1_j3rtr6a wrote
Be sure to check logs (e.g. dmesg for starters). Many A100s on AWS, for example, suffer from memory corruption, which leads to severe performance degradation. Also check temps.
A single A100 (even the least capable one, the 400W 40GB variant) should be roughly on the level of a 3090 Ti.
You also need to check memory usage (if it's at the limit, like 78.9/80, there's a problem somewhere). Also don't rule out drivers.
Those are some common headaches when setting up remote GPU instances for DL...
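The checks above can be scripted into a quick first-pass health check. This is a rough sketch, not an exact procedure: the dmesg grep pattern and the 95% memory threshold are my own illustrative choices, and the nvidia-smi query fields assume a recent driver. The parsing step is shown against a hardcoded sample line so the logic is clear.

```shell
#!/bin/sh
# Sketch of a remote-GPU health check. Command names (dmesg, nvidia-smi)
# are standard; the grep pattern and threshold below are illustrative.

# 1) Kernel log: look for Xid / ECC messages that hint at GPU memory trouble.
#    (Requires root or readable kernel log on the instance.)
# dmesg | grep -iE 'xid|ecc|uncorrectable' | tail -n 5

# 2) Per-GPU temperature and memory usage:
# nvidia-smi --query-gpu=index,temperature.gpu,memory.used,memory.total \
#            --format=csv,noheader,nounits

# 3) Parse one such CSV line and flag near-limit memory usage.
#    Sample line in the assumed format: index, temp C, used MiB, total MiB
sample="0, 83, 79900, 81920"
echo "$sample" | awk -F', ' '{
  pct = $3 / $4 * 100;
  printf "GPU %s: %s C, %.1f%% mem\n", $1, $2, pct;
  if (pct > 95) print "WARN: near memory limit";  # threshold is a guess
}'
```

Running this on the sample line reports the GPU at 97.5% memory and prints the warning, which is exactly the "78.9/80" situation described above.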
qiltb t1_j3o7ull wrote
Reply to comment by soupstock123 in Building a 4x 3090 machine learning machine. Would love some feedback on my build. by soupstock123
I actually assumed you would be running 2 PSUs. For the fewest problems, buy 2x AX1600i; for a cheaper option, buy 2x AX1200i. A single PSU is actually the worst case, but yeah, you can try with a single SFL 2000.
qiltb t1_j3mmcca wrote
Reply to comment by hjups22 in Building a 4x 3090 machine learning machine. Would love some feedback on my build. by soupstock123
Sorry, I was referring explicitly to your last paragraph (that it's quick for small models).
qiltb t1_j3l9suz wrote
Reply to comment by hjups22 in Building a 4x 3090 machine learning machine. Would love some feedback on my build. by soupstock123
That also depends on the input image size, though...
qiltb t1_j3l9q8s wrote
Reply to comment by VinnyVeritas in Building a 4x 3090 machine learning machine. Would love some feedback on my build. by soupstock123
Well, even in the most basic tasks, like plain resnet100 training (classification) using NVLink, there is a huge difference.
qiltb t1_j3l9hll wrote
Reply to comment by soupstock123 in Building a 4x 3090 machine learning machine. Would love some feedback on my build. by soupstock123
Under full load, the AXi series is basically silent. But the main reason is that that PSU is not of high enough quality to actually sustain such a load (even higher-grade PSUs like the EVGA P2 series have problems with the infamous 3090 under DL workloads). Also take a look at my long comment on this post.
qiltb t1_j3kjvki wrote
Reply to comment by rikonaka in Building a 4x 3090 machine learning machine. Would love some feedback on my build. by soupstock123
It actually works very well with an ADD2PSU connector (I used something like 5 PSUs for one 14x3090 rig). He should actually think more about getting a 1600W HIGH QUALITY PSU.
The Corsair RM series IS NOT SUITABLE for the workload you are looking at. Use preferably the AXi series, or HXi if you really want to cheap out. We are talking about really abusing those PSUs. The AX1600i is still unmatched for this use case.
qiltb t1_j3uaop0 wrote
Reply to comment by Infamous_Age_7731 in Cloud VM GPU is much slower than my local GPU by Infamous_Age_7731
Sorry, I meant the temperatures of the GPU, CPU, etc.