Submitted by Outrageous_Room_3167 t3_zpw2ew in deeplearning
Hey, new to infrastructure builds here. We're a small start-up (https://www.axibo.com/) curious about the biggest 3090 deep learning rigs people have built.
How do we scale past one machine? My guess is a very fast direct connection between the machines; is this feasible with 3090s?
The cost of these has dropped dramatically, and on a per-unit basis they're almost as good as an A100.
sigmoid_amidst_relus t1_j0wsqyz wrote
3090 is not as good as an A100 in terms of pure performance.
It's much better than an A100 in perf/$, though.
A single consumer-grade deep learning node won't scale past 3x 3090s without diminishing returns, unless everything you work with is a dataset that fits in memory or you have a great storage solution. Top-end prosumer and server-grade platforms will do fine with up to 4-6x in a non-rack-mounted setting, but not without custom cooling.

The problem isn't just how well you can feed the GPUs: 3090s simply aren't designed to run at the node densities that server cards are. That's why companies are happy to pay a pretty penny for A100s and other server-grade cards (even if we ignore certification requirements and Nvidia mandates): the infrastructure and running costs of a good-quality server facility, plus the money lost to potential downtime, far outweigh the GPU costs.
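For the single-node case, here's a minimal sketch of how you'd keep several 3090s busy with PyTorch DistributedDataParallel; the model, dataset, and batch sizes are placeholders, not recommendations:

```python
# Minimal single-node DDP sketch; launch with:
#   torchrun --nproc_per_node=3 train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

def main():
    dist.init_process_group("nccl")  # NCCL backend for GPU collectives
    rank = dist.get_rank()
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(512, 10).cuda(local_rank)  # placeholder model
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

    # Placeholder dataset; DistributedSampler shards it across processes
    data = TensorDataset(torch.randn(10000, 512), torch.randint(0, 10, (10000,)))
    sampler = DistributedSampler(data)
    loader = DataLoader(data, batch_size=256, sampler=sampler,
                        num_workers=4, pin_memory=True)

    for epoch in range(3):
        sampler.set_epoch(epoch)
        for x, y in loader:
            x = x.cuda(local_rank, non_blocking=True)
            y = y.cuda(local_rank, non_blocking=True)
            loss = torch.nn.functional.cross_entropy(model(x), y)
            opt.zero_grad()
            loss.backward()  # DDP all-reduces gradients across GPUs here
            opt.step()
        if rank == 0:
            print(f"epoch {epoch} loss {loss.item():.4f}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Note the `num_workers` and `pin_memory` settings: with 3+ GPUs the data pipeline is usually what bottlenecks first, which is the "feed the GPUs" problem above.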
Connecting multi-node setups is done through high-bandwidth interconnects, like Mellanox InfiniBand gear.
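As a sketch of what that looks like in practice: the same DDP script scales across nodes just by changing the launcher, and NCCL uses InfiniBand automatically when it's present. Hostnames, port, and node counts below are placeholders:

```python
# Same train.py as above; only the launcher changes for multi-node.
# On node 0 (rendezvous host; hostname/port are placeholders):
#   torchrun --nnodes=2 --nproc_per_node=3 --node_rank=0 \
#            --master_addr=node0.example.com --master_port=29500 train.py
# On node 1:
#   torchrun --nnodes=2 --nproc_per_node=3 --node_rank=1 \
#            --master_addr=node0.example.com --master_port=29500 train.py
import torch
import torch.distributed as dist

# Quick sanity check before launching a multi-node job
print("NCCL available:", dist.is_nccl_available())
print("CUDA devices:", torch.cuda.device_count())

# Useful NCCL env vars for interconnect debugging (set before launch):
#   NCCL_DEBUG=INFO    -- logs which transport (InfiniBand vs sockets) is used
#   NCCL_IB_DISABLE=1  -- force TCP sockets, to compare throughput
```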
Most mining farms don't run GPUs at full PCIe x16 because mining barely uses PCIe bandwidth; training does, so you're not going to scale the way they do.
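If you're eyeing mining-style hardware, it's easy to sanity-check the actual host-to-device bandwidth yourself; a rough sketch (sizes and iteration counts are arbitrary):

```python
# Rough host->device PCIe bandwidth check
import time
import torch

size_mb = 1024
iters = 10
x = torch.empty(size_mb * 1024 * 1024, dtype=torch.uint8).pin_memory()
dst = torch.empty_like(x, device="cuda:0")

dst.copy_(x)  # warm-up transfer so setup cost isn't timed
torch.cuda.synchronize()

t0 = time.perf_counter()
for _ in range(iters):
    dst.copy_(x, non_blocking=True)  # async copy from pinned host memory
torch.cuda.synchronize()
dt = time.perf_counter() - t0

print(f"~{size_mb * iters / dt / 1024:.1f} GiB/s host->device")
# PCIe 3.0 x16 lands around ~12-13 GiB/s in practice; an x1 riser
# (common in mining rigs) is an order of magnitude slower.
```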
You can certainly scale to a 64x GPU "farm", but it's going to be a pain with consumer-grade hardware only, especially in terms of interconnects, and it will be terribly space- and cooling-inefficient.