Submitted by TheButteryNoodle t3_zau0uc in deeplearning
suflaj t1_iyodcdi wrote
2x 4090 is the most cost-efficient setup if you can use model parallelism, which works for CV. For other tasks, or for vision transformers, it's probably a bad choice because of the low inter-GPU bandwidth (4090s have no NVLink, so cross-GPU traffic goes over PCIe).
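For reference, here's a minimal sketch of what naive model parallelism looks like in PyTorch (the layer split and sizes are purely illustrative, assuming two GPUs visible as cuda:0 and cuda:1):

```python
# Sketch of naive model parallelism: split a network across two GPUs.
# The architecture and split point here are hypothetical, for illustration only.
import torch
import torch.nn as nn

class SplitModel(nn.Module):
    def __init__(self):
        super().__init__()
        # First half of the network lives on GPU 0...
        self.stage1 = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
        ).to("cuda:0")
        # ...second half on GPU 1.
        self.stage2 = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 10),
        ).to("cuda:1")

    def forward(self, x):
        x = self.stage1(x.to("cuda:0"))
        # Activations cross PCIe here on every forward/backward pass;
        # this transfer is the bottleneck without NVLink.
        return self.stage2(x.to("cuda:1"))

model = SplitModel()
out = model(torch.randn(8, 3, 224, 224))
print(out.shape)  # torch.Size([8, 10])
```

Every step ships activations between the cards, which is why inter-GPU bandwidth matters so much more for activation-heavy models like transformers.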
The RTX A6000 will be better for deployment. If you're only planning on training, this is a non-factor. Note that it has similar, even slightly lower, memory bandwidth than a 4090 (768 GB/s vs. roughly 1 TB/s), so there are few benefits besides lower power consumption, performance at non-FP32 precisions, and a bigger chunk of VRAM (48 GB vs. 24 GB).
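If you want to sanity-check memory bandwidth on whatever card you end up with, here's a rough micro-benchmark (a sketch only; it measures effective read+write throughput and will land below the spec number):

```python
# Rough device-memory bandwidth check, assuming a CUDA build of PyTorch.
import torch, time

x = torch.empty(1024**3 // 4, device="cuda")  # 1 GiB of fp32
torch.cuda.synchronize()
t0 = time.perf_counter()
for _ in range(100):
    y = x.clone()  # one 1 GiB read + one 1 GiB write per iteration
torch.cuda.synchronize()
dt = time.perf_counter() - t0
print(f"~{100 * 2 / dt:.0f} GiB/s effective")  # 2 GiB moved per clone
```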
So honestly it comes down to whether you want a local or a cloud setup. Personally, I'd go for 1x 4090 and spend the rest on cloud compute. If there's something you can't run on a single 4090, A100 compute in the cloud will be both more cost- and time-efficient.
TheButteryNoodle OP t1_iyy15ut wrote
Good points. I'd have to agree with you that the 4090s definitely do seem to be the most cost-efficient.
ShinyBike t1_j26jemn wrote
Having owned a 4090 and used many A100s, I can safely say the 4090 is far faster than an A100.
suflaj t1_j2841op wrote
You must've had some poorly optimized models then: even the 40 GB A100 is roughly 2.0-2.1x faster than a 3090, while a 4090 is at most 1.9x, and on average about 1.5x, faster than a 3090 according to various DL benchmarks.
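Taking those 3090-relative multipliers at face value, the implied A100-vs-4090 gap works out as follows:

```python
# Implied A100/4090 ratio from the 3090-relative multipliers quoted above.
a100_vs_3090 = (2.0, 2.1)
rtx4090_vs_3090 = (1.5, 1.9)  # (average, best case)

best_case_for_4090 = a100_vs_3090[0] / rtx4090_vs_3090[1]  # ~1.05x
typical = a100_vs_3090[1] / rtx4090_vs_3090[0]             # ~1.4x
print(f"A100 is ~{best_case_for_4090:.2f}x to ~{typical:.2f}x a 4090")
```

In other words, by these numbers an A100 should land somewhere between roughly even with a 4090 and about 1.4x faster, not slower.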