TheButteryNoodle OP t1_iyo7url wrote

Right. Model parallelization was one of my concerns with any dual-GPU setup, as it can be a hassle and isn't suitable for every model or use case.

As for the A100, the plan was to buy a card that still has an active Nvidia manufacturer warranty (although that may be tough to find at that price point). If I could buy an extended warranty, whether from Nvidia or a reputable third party, I would definitely look into it. In general, if I went the A100 route, I would buy some level of protection, even if it cost a bit more.

As for cooling, you're right, that's another pain point to consider. My current case is a Fractal Design Torrent, with two 180 mm fans in the front, three 140 mm fans at the bottom, and a 120 mm exhaust fan at the back. I'd hope these fans, together with a blower fan setup to start with, would provide sufficient airflow. If they don't, I would likely move to custom water cooling again.

What I'm not sure about, though, is how close the RTX 6000 Ada's performance comes to the A100's. If the difference isn't huge for FP16 and FP32, it would likely make sense to lean toward the 6000. There is also the 6000's FP8 performance to consider, with CUDA 12 right around the corner.

Dexamph t1_iyocn1i wrote

I doubt the Torrent's fans will do much if the blower isn't enough, because those cards were designed around a front-to-back airflow path with much, much higher static pressure forcing air through the heatsink. We run V100s in Dell R740s on our local cluster, and here's how they sound when supplying the airflow those GPUs need. So you might want to factor the cost of a custom water-cooling loop into the A100 figure in case things go south, along with the spare rig, at which point the true cost comparison with the RTX 6000 Ada isn't so close anymore.

I don't know how the RTX 6000 Ada will really perform against the A100 either, because I haven't seen the FP8 Transformer Engine in action. Maybe it'll get around its halved memory bandwidth and land close to the A100, but the A100 delivers its performance today, with today's code.
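For a rough sense of what "halved memory bandwidth" means, here is a minimal Python sketch. It assumes NVIDIA's published peak memory bandwidth figures (960 GB/s for the RTX 6000 Ada; 1,555, 1,935, and 2,039 GB/s for the A100 40GB, 80GB PCIe, and 80GB SXM), since the thread doesn't say which A100 variant is meant, and the helper names are made up for illustration. For a purely memory-bound step, the time to stream a given number of bytes is bounded below by bytes divided by bandwidth.

```python
# Assumed peak memory bandwidth in GB/s, taken from NVIDIA's published specs.
# The thread does not say which A100 variant is meant, so all three are listed.
BANDWIDTH_GBPS = {
    "A100 40GB": 1555,
    "A100 80GB PCIe": 1935,
    "A100 80GB SXM": 2039,
    "RTX 6000 Ada": 960,
}


def bandwidth_ratio(card: str, reference: str) -> float:
    """Peak memory bandwidth of `card` as a fraction of `reference`."""
    return BANDWIDTH_GBPS[card] / BANDWIDTH_GBPS[reference]


def memory_bound_time_ms(bytes_moved: float, card: str) -> float:
    """Lower bound (in ms) on a purely memory-bound pass: bytes / bandwidth."""
    return bytes_moved / (BANDWIDTH_GBPS[card] * 1e9) * 1e3


if __name__ == "__main__":
    for ref in ("A100 40GB", "A100 80GB PCIe", "A100 80GB SXM"):
        print(f"RTX 6000 Ada vs {ref}: {bandwidth_ratio('RTX 6000 Ada', ref):.2f}")
    # Hypothetical workload: streaming 10 GB of weights once.
    for card in ("RTX 6000 Ada", "A100 80GB PCIe"):
        print(f"{card}: >= {memory_bound_time_ms(10e9, card):.1f} ms")
```

For example, streaming a hypothetical 10 GB of weights takes at least about 10.4 ms on the RTX 6000 Ada versus about 5.2 ms on an A100 80GB PCIe. Against the 40GB A100 the ratio is closer to 0.62, so "halved" applies mainly to the 80GB cards.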

TheButteryNoodle OP t1_iyy01wy wrote

Good point. I guess I'll just have to wait and see how the 6000 performs. That said, I'll most likely just go with the 4090s. Thanks again for the insight!