
suflaj t1_iyodcdi wrote

2x 4090 is the most cost-efficient option if you can use model parallelism for CV. For other tasks, or for vision transformers, it's probably a bad choice because of the low inter-GPU bandwidth (the 4090 has no NVLink, so everything goes over PCIe).
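For context, by "model parallelism" I mean splitting the layers of one model across the two cards; a minimal PyTorch sketch is below (the layer split, sizes, and class name are made up for illustration, not any particular model). The device-to-device copy in the middle is exactly where the missing NVLink hurts.

```python
# Minimal sketch of naive model parallelism across two GPUs in PyTorch.
# Layer split and tensor sizes are placeholders for illustration only.
import torch
import torch.nn as nn

class TwoGPUNet(nn.Module):
    def __init__(self):
        super().__init__()
        # First half of the network lives on GPU 0
        self.stage1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        ).to("cuda:0")
        # Second half lives on GPU 1
        self.stage2 = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64, 10),
        ).to("cuda:1")

    def forward(self, x):
        x = self.stage1(x.to("cuda:0"))
        # Activations cross PCIe here -- without NVLink this transfer is the
        # bottleneck if the intermediate tensors are large or frequent.
        x = x.to("cuda:1")
        return self.stage2(x)

model = TwoGPUNet()
out = model(torch.randn(8, 3, 224, 224))
# The targets must live on the same device as the output (cuda:1)
loss = nn.functional.cross_entropy(out, torch.randint(0, 10, (8,), device="cuda:1"))
loss.backward()
```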

The RTX A6000 will be better for deployment; if you're only planning on training your stuff, that's a non-factor. Note that its memory bandwidth is similar to, or even lower than, a 4090's, so there are few benefits besides power consumption, non-FP32 performance, and a bigger chunk of VRAM.

So honestly it comes down to whether you want a local or a cloud setup. Personally, I'd go for 1x 4090 and spend the rest on cloud compute. If there is something you can't run on a single 4090, A100 compute in the cloud will be both more cost- and time-efficient.

3

TheButteryNoodle OP t1_iyy15ut wrote

Good points. I'd have to agree with you that the 4090s definitely do seem to be the most cost-efficient.

1

ShinyBike t1_j26jemn wrote

Having owned a 4090 and used many A100s, I can safely say the 4090 is far faster than an A100.

1

suflaj t1_j2841op wrote

You must've had some poorly optimized models then; according to various DL benchmarks, even the 40 GB A100 is roughly 2.0-2.1x faster than a 3090, while a 4090 is at most 1.9x and on average about 1.5x faster than a 3090.
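For what it's worth, speedup figures like these usually come from timing a fixed training step on each card and comparing throughput. A rough sketch of that kind of measurement (ResNet-50 and batch size 64 are arbitrary placeholders, not a specific benchmark suite):

```python
# Rough sketch of measuring per-GPU training throughput for comparison.
import time
import torch
import torchvision

model = torchvision.models.resnet50().cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data = torch.randn(64, 3, 224, 224, device="cuda")
target = torch.randint(0, 1000, (64,), device="cuda")

def train_step():
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(data), target)
    loss.backward()
    optimizer.step()

# Warm-up iterations so cuDNN autotuning doesn't skew the timing
for _ in range(10):
    train_step()
torch.cuda.synchronize()

start = time.perf_counter()
for _ in range(100):
    train_step()
torch.cuda.synchronize()  # wait for queued kernels before stopping the clock
elapsed = time.perf_counter() - start
print(f"{100 * 64 / elapsed:.1f} images/sec")
```

Run the same script on each GPU and the ratio of images/sec gives the kind of "x times faster" number quoted above; whether a given model is well optimized for a given card shows up directly in that ratio.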

1