Submitted by TheButteryNoodle t3_zau0uc in deeplearning

Hey everyone. I'm building a new workstation for both personal and professional use, and I need some help weighing the pros and cons of the GPUs I'm considering, as well as any general advice/recommendations.

Most of my professional work falls within NLP and GNN models; however, I do occasionally dabble in image classifiers and Stable Diffusion as a hobby. The GPUs I'm currently considering are an RTX 6000 Ada, a used/refurbished A100 80GB (PCIe rather than SXM4), or dual 4090s with a power limit (I have a 1300W PSU).

With the RTX 6000 Ada having 48GB of VRAM, it would definitely be nice to load a whole new range of models that I couldn't otherwise (without AWS or model parallelism). But it's harder to justify its expected cost of $7,378-$8,210 when, for an additional $2-3k, you could get a used/refurbished A100 80GB from eBay that provides almost double the VRAM and would likely outperform the new RTX 6000 Ada by a sizeable amount in FP16 and FP32 workloads.
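For context, here's the back-of-envelope VRAM math I'm working from (very rough sketch; assumes FP16 weights and plain Adam, and ignores activations entirely):

```python
# Rough VRAM rule of thumb, purely illustrative.
# Inference: weights alone take params * bytes_per_param.
# Training with Adam: roughly 4x the weights (weights + gradients
# + two optimizer moment buffers), before counting activations.
def min_vram_gb(params_billions: float, bytes_per_param: int = 2,
                training: bool = False) -> float:
    weights_gb = params_billions * bytes_per_param  # 1e9 params * 2 bytes ~= 2 GB
    return weights_gb * 4 if training else weights_gb

print(min_vram_gb(7))                  # ~14 GB just to load a 7B model in FP16
print(min_vram_gb(7, training=True))   # ~56 GB to fine-tune it with Adam
```

By that math, 24GB caps me at roughly a 10-12B-parameter model for FP16 inference before overhead, which is exactly why the 48GB and 80GB cards are tempting.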

However, you could also just get two RTX 4090s for ~$4k, which would likely outperform the RTX 6000 Ada and be comparable to the A100 80GB in FP16 and FP32 workloads. The catch is that I would need to switch to a custom water-cooling setup, as my current case won't fit two 4090s with their massive heatsinks (I'm assuming that change would cost in the range of $1.5-2k). Furthermore, I would likely need to power-limit the GPUs on a 1300W PSU. The VRAM, at 24GB per card, would likely cover all of my professional use cases, but it would prevent me from loading larger models without resorting to model parallelism, which can be painful.

I also like to play games casually. While this is not a major factor, it would be nice not to have to maintain two different rigs, since the A100 can't really run games.

So, with all that being said: does it make sense to go for two 4090s at ~$4k, plus a water-cooling setup at ~$1.5k, for ~$5.5k total? Or go for an RTX 6000 Ada at ~$7.5-8k, which would likely have less compute than two 4090s but make it easier to load larger models to experiment with? Or just go for the endgame with an A100 80GB at ~$10k, but have a separate rig to maintain for games?

I do use AWS for work model training as well, but given the recent AWS bills, my company has offered to pay a portion of the cost of a new workstation. I will still be paying most of it, but I want to use the opportunity for a personal PC upgrade too. Any model training on AWS that isn't for work would obviously be billed to me (hence the interest in a card with more VRAM).

What do you all think makes the most sense here?

18

Comments


computing_professor t1_iynllw4 wrote

I'm far from an expert, but remember that the 4090s are powerful yet won't pool memory. I'm actually looking into a lighter setup than yours: either an A6000 or, more likely, 2x 3090s with NVLink so I can get access to 48GB of VRAM. While the 4090 is much faster, you won't have access to as much VRAM. But if you can make do with 24GB and/or can parallelize your model, 2x 4090s would be awesome.

edit: Just re-read your post and saw that I'd missed your mention of parallelizing. Still, if you can manage it, 2x 4090s seem incredibly fast. I would do that if it were me, but I don't care much about computer vision.

4

TheButteryNoodle OP t1_iynr0g4 wrote

Hey there! Thanks for the response! I'm a bit of a novice when it comes to how NVLink works, but wouldn't you still need model parallelization to fit a model over 24GB on 2x 3090s connected via NVLink? I thought they would still show up as two separate devices, just like 2x 4090s; the benefit being that the NVLink bridge connects the two GPUs directly instead of going over PCIe. Not too knowledgeable about this, so please feel free to correct me if I'm wrong!

1

Dexamph t1_iyo1ryt wrote

Looked into this last night, and yeah, NVLink works the way you described; the marketing is misleading. There's no contiguous memory pool, just a faster interconnect, so model parallelisation might scale a bit better, but you still have to implement it yourself. I also saw an example where some PyTorch GPT-2 models scaled horrifically in training across multiple PCIe V100s and 3090s without NVLink, so that's a caveat for dual 4090s, which don't support NVLink at all.
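If you want to see it for yourself, here's a quick sanity check in PyTorch (just a sketch; assumes a 2-GPU box):

```python
import torch

# Even with an NVLink bridge installed, PyTorch still enumerates two
# separate devices; NVLink only enables/accelerates peer-to-peer copies.
print(torch.cuda.device_count())                # 2, not one big 48GB device
print(torch.cuda.can_device_access_peer(0, 1))  # True if P2P between GPUs 0 and 1 works
```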

The RTX 6000 Ada lets you skip model sharding so that’s factored into the price. You lose the extra GPU so you have less throughput though.

You might be able to get away with leaving the 4090s at the stock 450W power limit since it seems the 3090/3090Ti transient spikes have been fixed.
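If you do end up wanting to cap them, it's a one-liner per card, `sudo nvidia-smi -i <idx> -pl 350`, or the same thing via NVML from Python (sketch; assumes the nvidia-ml-py package, setting limits needs root, and 350 W is just an example value):

```python
import pynvml  # pip install nvidia-ml-py

# Cap every GPU's board power limit via NVML (values are in milliwatts).
pynvml.nvmlInit()
for idx in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(idx)
    watts = pynvml.nvmlDeviceGetPowerManagementLimit(handle) / 1000
    print(f"GPU {idx}: current limit {watts:.0f} W")
    pynvml.nvmlDeviceSetPowerManagementLimit(handle, 350_000)  # example: 350 W
pynvml.nvmlShutdown()
```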

I'm a bit skeptical about the refurb A100: how would warranty work if it died one day? And did you consider how you'd cool it? It sounds like you have a standard desktop case, while A100s were designed for rack-mount servers with screaming-loud fans, hence the passive heatsink. Are you putting thoughts and prayers into the little blower-fan kits on eBay for e-wasted Teslas being up to the task?

3

TheButteryNoodle OP t1_iyo7url wrote

Right. Model parallelization was one of my concerns with any type of dual GPU setup as it can be a hassle at times and isn't always suitable for all models/use cases.

As for the A100, the general plan was to purchase a card that still has Nvidia's manufacturer warranty active (albeit that may be tough at that price point). If there's any kind of extended warranty I could purchase, whether from Nvidia or a reputable third party, I'd definitely look into it. In general, if the A100 were the route I took, there would be some level of protection purchased, even if it costs a little more.

As for the cooling, you're right: that is another pain point to consider. My current case is a Fractal Design Torrent, with two 180mm fans in the front, three 140mm fans at the bottom, and a 120mm exhaust fan at the back. I would hope that these fans, alongside an initial blower-fan setup, would provide sufficient airflow. If they don't, I would likely move to custom water cooling.

What I'm not sure about, though, is how close the RTX 6000 Ada's performance comes to an A100's. If the difference isn't ridiculous for FP16 and FP32, then it would likely make sense to lean toward the 6000. There's also the 6000's FP8 performance, with CUDA 12 right around the corner.

2

Dexamph t1_iyocn1i wrote

I doubt the Torrent's fans will do much if the blower isn't enough, because they were designed around a front-to-back airflow path; a passive A100 heatsink needs much, much higher static pressure to force air through it. We run V100s in Dell R740s on the local cluster, and the fans have to scream to get the GPUs their needed airflow. So you might want to factor the cost of a custom water-cooling loop into the A100 figure in case things go south, plus the spare gaming rig, at which point the true cost difference vs. the RTX 6000 Ada isn't so close anymore.

I don't know how the RTX 6000 Ada will really perform vs. the A100 either, because I haven't seen the FP8 Transformer Engine in action. Maybe it'll skirt around its halved memory bandwidth and land close to the A100, but the A100 delivers its performance today, using today's code.

3

TheButteryNoodle OP t1_iyy01wy wrote

Good point. I guess I'll just have to wait and see what the 6000's performance looks like. I think the decision is likely going to be the 4090s, though. Thanks again for the insight!

1

computing_professor t1_iyo97p0 wrote

So this means you can't access 48GB of VRAM with a pair of 3090s and NVLink in TF and PyTorch? I could have sworn I'd seen that it's possible. Not a deal breaker for me, but a bummer to be sure. I'll likely end up with an A6000 instead, then, which isn't as fast but has that sweet VRAM.

2

Dexamph t1_iyoebd1 wrote

You certainly can if you put the time and effort into model parallelisation; you just don't get the seamless single big memory pool that I and many others were expecting, where larger models that won't fit on one GPU run with no code changes or debugging. Notice how most published NVLink benchmarks only test data-parallel training, because that's the straightforward case?
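To make it concrete, here's roughly what the "implement it yourself" part looks like in PyTorch; a minimal sketch, nothing like a real pipeline-parallel setup:

```python
import torch
import torch.nn as nn

# Hand-rolled model parallelism: half the network on each GPU, with the
# hop between them written out explicitly in forward(). NVLink doesn't
# remove this code; it only makes the inter-GPU copy cheaper.
class TwoGPUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.part1 = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU()).to("cuda:0")
        self.part2 = nn.Linear(4096, 10).to("cuda:1")

    def forward(self, x):
        x = self.part1(x.to("cuda:0"))
        return self.part2(x.to("cuda:1"))   # explicit GPU 0 -> GPU 1 transfer

model = TwoGPUNet()
out = model(torch.randn(8, 4096))           # labels/loss must live on cuda:1 too
```

And note that this naive version leaves one GPU idle at any given moment, which is part of why the scaling numbers can look so ugly.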

3

computing_professor t1_iyokex9 wrote

Huh. If it requires parallelization, then why is the 3090 singled out as the one consumer GeForce card capable of memory pooling? It just seems weird. What exactly is memory pooling, then, that the 3090 is capable of? I'm clearly confused.

edit: I did find this from Puget that says

> For example, a system with 2x GeForce RTX 3090 GPUs would have 48GB of total VRAM

So it's possible to pool memory with a pair of 3090s. But I'm not sure how it's done in practice.

0

DingWrong t1_iyq0nr0 wrote

Big models get sharded, and a chunk is loaded onto each GPU. There are a lot of frameworks ready to do this, since the big NLP models can't fit on a single GPU. Alpa even shards a model across different machines.
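With Hugging Face, for example, it's basically one argument (sketch; assumes transformers + accelerate installed, and the checkpoint name is just an example):

```python
from transformers import AutoModelForCausalLM

# device_map="auto" shards the layers across all visible GPUs (spilling
# to CPU/disk if it has to) and inserts the transfers for you. Still two
# separate devices underneath; the framework just hides the plumbing.
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B",   # example model; swap in whatever you're loading
    device_map="auto",
    torch_dtype="auto",
)
print(model.hf_device_map)   # shows which layers landed on which device
```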

3

computing_professor t1_iyqaku8 wrote

Thanks. So it really isn't the same as how the Quadro cards share VRAM. That's really confusing.

1

Dexamph t1_izd1dy7 wrote

This is deadass wrong; that Puget statement was in the context of system memory, nothing to do with pooling:

> How much RAM does machine learning and AI need?
>
> The first rule of thumb is to have at least double the amount of CPU memory as there is total GPU memory in the system. For example, a system with 2x GeForce RTX 3090 GPUs would have 48GB of total VRAM – so the system should be configured with 128GB (96GB would be double, but 128GB is usually the closest configurable amount).

1

LetMeGuessYourAlts t1_iyruft9 wrote

Do you know: are there any Nvidia GPUs at a decent price/performance point that can pool memory? Every avenue I've looked down seems to suggest that nothing a hobbyist could afford offers a large amount of memory without resorting to old workstation GPUs with relatively slow processors. Is the best bet a single 3090 if memory is the priority?

1

Dexamph t1_izd0gyf wrote

Technically they all can, because it relies on software; NVLink just reduces the performance penalty of going between GPUs. There is no free lunch here, so you'd damn well better know what you're doing so you don't get stung, like this guy, by speculative bullshit pushed by people who never actually had to make it work.

With that out of the way, it doesn't get any better than ex-mining 3090s that start at ~$600. Don't bother with anything older: if your problem requires model parallelisation, then your time and effort are probably worth more than the pittance you'd save trying to get some old 2080 Tis or 2070 Supers to keep up.
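And if you want to know what your actual interconnect is worth before betting on it, here's a crude micro-benchmark (sketch; assumes 2 GPUs and enough free VRAM for a ~1 GiB tensor):

```python
import torch

# Time a GPU 0 -> GPU 1 copy to estimate effective inter-GPU bandwidth;
# NVLink/P2P vs plain PCIe shows up directly in this number.
assert torch.cuda.device_count() >= 2
x = torch.empty(1024, 1024, 256, device="cuda:0")  # 2^28 floats = 1 GiB
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
torch.cuda.synchronize("cuda:0")
start.record()
y = x.to("cuda:1")
end.record()
torch.cuda.synchronize("cuda:0")
torch.cuda.synchronize("cuda:1")
seconds = start.elapsed_time(end) / 1000   # elapsed_time() returns milliseconds
print(f"{x.numel() * 4 / seconds / 1e9:.1f} GB/s effective")
```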

1

computing_professor t1_iynwyu2 wrote

I think 2x 3090s will pool memory with NVLink, but not be treated as a single card. I think it depends on the software you're using. I'm pretty sure PyTorch and TensorFlow can take advantage of memory pooling, but the 3090 is the last GeForce card that allows it. I hope somebody comes into this thread with examples of how to use it, because I can't seem to find any online.

1

suflaj t1_iyodcdi wrote

2x 4090s are the most cost-efficient option if you have model parallelism working, for CV. For other tasks, or for vision transformers, they're probably a bad choice because of the low inter-GPU bandwidth.

The RTX A6000 will be better for deployment. If you're only planning on training your own stuff, this is a non-factor. Note that it has similar or even lower memory bandwidth than a 4090, so there is little benefit besides power consumption, non-FP32 performance, and a bigger chunk of RAM.

So honestly, it comes down to whether you want a local or a cloud setup. Personally, I'd go for 1x 4090 and do the rest on cloud compute. If there is something you can't run on one 4090, cloud A100 compute will be both more money- and time-efficient.

3

TheButteryNoodle OP t1_iyy15ut wrote

Good points. I'd have to agree with you that the 4090s definitely do seem to be the most cost-efficient.

1

ShinyBike t1_j26jemn wrote

Having owned a 4090 and used many A100s, I can safely say that the 4090 is far faster than an A100.

1

suflaj t1_j2841op wrote

You must've had some poorly optimized models then: even the 40GB A100 is roughly 2.0-2.1x faster than a 3090, while a 4090 is at most 1.9x, and on average about 1.5x, faster than a 3090, according to various DL benchmarks.

1

mosalreddit t1_izr087o wrote

What mobo and case do you have to put 2 4090s in?

1

TheButteryNoodle OP t1_izskz4u wrote

Haven't purchased the motherboard yet, but the case would be a Fractal Design Torrent. To get two 4090s to fit, you'd need to go custom liquid cooling to get rid of their massive heatsinks.

2

mosalreddit t1_izsyu2s wrote

Looking forward to seeing it when it's done. Please do share pictures!

1

TheButteryNoodle OP t1_izt1ogh wrote

Will do!

1

computing_professor t1_izzi31s wrote

I am also interested! I'm going in circles trying to decide, and I think 2x 4090s would be best for me, too. Though I'm more likely to have it built at Micro Center to save myself the stress.

1

TheButteryNoodle OP t1_izzlzck wrote

Best of luck! 4090s at MSRP have been a challenge to find. Hopefully, supply will get better in Q1 2023.

1

computing_professor t1_izznxpa wrote

I may do better going through a vendor, honestly. System76 doesn't do dual 4090s, but I think Exxact does.

1

mosalreddit t1_izwf74u wrote

What do you think of the MSI Suprim or Gigabyte WaterForce in terms of performance/quality? Both are 2-slot GPUs with liquid cooling.

1

TheButteryNoodle OP t1_izzldtv wrote

I think their performance is good! My concern would be finding places for all the radiators so that the AIO pumps can do their job effectively. It may also cramp the case if you decide to water-cool your CPU with another AIO.

2