Dexamph

Dexamph t1_izd1dy7 wrote

This is deadass wrong as that Puget statement was in the context of system memory, nothing to do with pooling:

> How much RAM does machine learning and AI need?

>The first rule of thumb is to have at least double the amount of CPU memory as there is total GPU memory in the system. For example, a system with 2x GeForce RTX 3090 GPUs would have 48GB of total VRAM – so the system should be configured with 128GB (96GB would be double, but 128GB is usually the closest configurable amount).
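The arithmetic in that quote can be sketched in a few lines. This is only an illustration of the rule of thumb as quoted: the function name and the list of configurable RAM sizes are assumptions made up for the example, and the sizes actually on offer depend on the platform.

```python
# Hypothetical list of configurable system RAM sizes in GB (an assumption;
# real options depend on the motherboard and DIMM choices).
CONFIGURABLE_RAM_GB = [16, 32, 64, 128, 256, 512, 1024]


def recommend_system_ram(gpu_vram_gb):
    """Apply the quoted rule of thumb: system RAM >= 2x total GPU memory.

    Returns (total VRAM, minimum RAM by the rule, next configurable size).
    """
    total_vram = sum(gpu_vram_gb)
    minimum_ram = 2 * total_vram
    # Round up to the first configurable size that satisfies the rule.
    for size in CONFIGURABLE_RAM_GB:
        if size >= minimum_ram:
            return total_vram, minimum_ram, size
    raise ValueError("no configurable RAM size is large enough")


# Two 24GB RTX 3090s, as in the quoted example.
print(recommend_system_ram([24, 24]))  # (48, 96, 128)
```

Note that this is about system RAM sizing only; nothing in it says anything about pooling VRAM across GPUs.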

1

Dexamph t1_izd0gyf wrote

Technically they all can because it relies on software, it's just that NVLink will reduce the performance penalty going between GPUs. There is no free lunch here so you damn well better know what you're doing to not get stung like this guy by speculative bullshit pushed by people who never actually had to make it work.

With that out of the way, it doesn't get any better than ex-mining 3090s that start at ~$600. Don't bother with anything older because if your problem requires model parallelisation, then your time and effort is probably worth more than the pittance you save in trying to get some old 2080Tis or 2070 Supers to keep up.

1

Dexamph t1_iyoebd1 wrote

You certainly can if you put the time and effort into model parallelisation, just not in the seamless way that I and many others were expecting, where you get a single big memory pool and can run larger models that wouldn’t fit on one GPU with no code changes or debugging. Notice how most published benchmarks with NVLink have only tested data parallel model training because it’s really straightforward?
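A rough way to see why the two approaches differ, as a sketch only: data parallel training keeps a full copy of the model on every GPU, so adding cards does not help a model that is too big for one of them, while model parallelism splits the model itself. The function names and the 40GB model are hypothetical, and the estimate ignores activations, gradients and optimizer state and assumes a perfectly even split.

```python
def per_gpu_weights_gb(model_gb, num_gpus, strategy):
    """Weight memory each GPU must hold (weights only, a simplification)."""
    if strategy == "data_parallel":
        # Every GPU holds a full replica; only the batch is split.
        return model_gb
    if strategy == "model_parallel":
        # The model is divided across GPUs; assumes a perfectly even split.
        return model_gb / num_gpus
    raise ValueError(f"unknown strategy: {strategy}")


def fits(model_gb, vram_gb, num_gpus, strategy):
    """True if the per-GPU share of the weights fits in one GPU's VRAM."""
    return per_gpu_weights_gb(model_gb, num_gpus, strategy) <= vram_gb


# Hypothetical 40GB model on two 24GB cards.
print(fits(40, 24, 2, "data_parallel"))   # False
print(fits(40, 24, 2, "model_parallel"))  # True
```

The second case only works because the split has been implemented in the training code, which is the part NVLink does not do for you.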

3

Dexamph t1_iyocn1i wrote

I doubt the Torrent’s fans will do much if the blower isn’t enough because they were designed around a front to back air flow pathway with much, much higher static pressure to force air through the heatsink. We run V100s in Dell R740s on the local cluster and here’s how they sound to get the GPUs their needed airflow. So you might want to factor in the cost of custom loop water cooling into the A100 cost figure if things go south. And the spare rig as well so the true cost difference vs RTX 6000 Ada isn’t so close anymore.

I don’t know how the RTX 6000 Ada will really perform vs the A100 either because I haven’t seen the FP8 Transformer engine in action. Maybe it’ll skirt the halved memory bandwidth and land close to the A100, but the A100 delivers its performance today using today’s code.

3

Dexamph t1_iyo1ryt wrote

Looked into this last night and yeah, NVLink works the way you described, and the marketing is misleading: no contiguous memory pool, just a faster interconnect, so maybe model parallelisation scales a bit better but you still have to implement it. Also saw an example where some PyTorch GPT2 models scaled horrifically in training with multiple PCIe V100s and 3090s that didn’t have NVLink, so that’s a caveat with dual 4090s not having NVLink.

The RTX 6000 Ada lets you skip model sharding so that’s factored into the price. You lose the extra GPU so you have less throughput though.

You might be able to get away with leaving the 4090s at the stock 450W power limit since it seems the 3090/3090Ti transient spikes have been fixed.

I’m a bit skeptical about the refurb A100, like how would warranty work if it died one day? Did you consider how you’d cool it, since it seems you have a standard desktop case while they were designed for rack mount servers with screaming loud fans, hence the passive heatsink? Or are you putting thoughts and prayers into the little blower fan kits on eBay for e-wasted Teslas being up to the task of cooling it?

3

Dexamph t1_ix7onhf wrote

Reply to comment by Star-Bandit in GPU QUESTION by Nerveregenerator

I think you way overestimated K80 performance when my 4GB GTX 960 back in the day could trade blows with a K40, which was a bit more than half a K80. In a straight memory bandwidth fight, like Transformer model training, the 1080Ti is going to win hands down even if you have perfect scaling across both GPUs on the K80 and that's assuming it doesn't get hamstrung by the ancient Kepler architecture in any way at all.
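For a rough check of the bandwidth point, peak memory bandwidth can be estimated from bus width and per-pin data rate. The figures below are the commonly published specs as I recall them (352-bit at 11 Gbps for the 1080Ti, 384-bit at roughly 5 Gbps per GPU on the K80), so treat them as assumptions and verify against the spec sheets; the helper name is made up for the example.

```python
def peak_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Theoretical peak memory bandwidth in GB/s: bytes per transfer x rate."""
    return bus_width_bits / 8 * data_rate_gbps


# Assumed published specs, not measured numbers.
gtx_1080ti = peak_bandwidth_gbs(352, 11)     # single GPU
k80_per_gpu = peak_bandwidth_gbs(384, 5)     # one of the two GPUs on the board
k80_perfect_scaling = 2 * k80_per_gpu        # best case, both GPUs combined

print(gtx_1080ti, k80_per_gpu, k80_perfect_scaling)  # 484.0 240.0 480.0
```

So even the perfect-scaling best case for the K80 lands slightly under the single 1080Ti on paper, before any Kepler-era limitations come into it.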

2