Submitted by soupstock123 t3_106zlpz in deeplearning
Hi there, I'm building a machine for deep learning and ML work, and I wanted some critique on my build. The target is 4 3090s, but right now I'm just trying to decide on the CPU and motherboard. There are a few other options I considered, and these were my thoughts on each. Let me know if there's some flaw in my thinking:
- AMD Threadripper 3:
  - expensive chip and mobo
  - already end of life, and prices still haven't come down much on these lol
  - 64 PCIe4 lanes, so definitely enough lanes
- Intel i9-10980XE and an X299 motherboard:
  - 48 PCIe3 lanes, enough for 4 GPUs
  - kinda old, and a slight premium for the X299 chipset
In the end, I decided to do this build: https://ca.pcpartpicker.com/list/Vmyvtn
https://ca.pcpartpicker.com/list/vGkhwc
~~- AMD Ryzen 9 7950X:
  - 16 PCIe5 lanes
  - with 4 GPUs, that's 4 PCIe5 lanes per GPU~~
I'm wondering what your opinion on my build is: yes, there are only 16 lanes, but they're PCIe5, and 4 lanes of PCIe5 equal the bandwidth of 8 lanes of PCIe4, so in theory it should be fine, right? For the case, I'm planning on just using a mining rig frame and mounting everything there for now. Future plans would be to waterblock everything and move into a nice case.
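One wrinkle worth flagging in that lane math: the 3090 is a PCIe 4.0 card, so in a PCIe5 x4 slot it negotiates PCIe4 x4 rather than getting PCIe5 bandwidth. A back-of-the-envelope sketch (per-lane throughputs are approximate and ignore protocol overhead):

```python
# Rough per-lane throughput, ignoring encoding/protocol overhead.
PCIE4_GBPS_PER_LANE = 2.0
PCIE5_GBPS_PER_LANE = 4.0

lanes_per_gpu = 16 // 4  # 16 CPU lanes split across 4 GPUs = x4 each

# What an x4 slot could do if the card spoke PCIe5:
print(lanes_per_gpu * PCIE5_GBPS_PER_LANE)  # ~16 GB/s
# What a 3090 (a PCIe 4.0 device) actually negotiates at x4:
print(lanes_per_gpu * PCIE4_GBPS_PER_LANE)  # ~8 GB/s
```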
Edit: After reviewing some of the comments, I've decided to get a Threadripper 3960X and an ASRock TRX40 Creator for the mobo.
Also, a question about RAM speed: the ASRock Creator supports DDR4 speeds up to 4666, but is there a need to go that high? I'm planning to go to 128GB of RAM, and higher speeds are definitely more expensive. Is there a cost/perf sweet spot, or does RAM speed not even matter for deep learning?
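One quick check on whether host RAM is anywhere near the bottleneck: time pinned host-to-GPU copies, since those go over PCIe, which usually saturates well below what even modest DDR4 can feed. A minimal sketch, assuming PyTorch with CUDA (the size and loop count are arbitrary):

```python
import time
import torch

size_mb = 1024
x = torch.empty(size_mb * 1024 * 1024, dtype=torch.uint8).pin_memory()
gpu = torch.device("cuda:0")

torch.cuda.synchronize()
t0 = time.perf_counter()
for _ in range(10):
    x.to(gpu, non_blocking=True)  # pinned host -> device copy over PCIe
torch.cuda.synchronize()
dt = time.perf_counter() - t0
print(f"host->device: {10 * size_mb / 1024 / dt:.1f} GB/s")
```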
Some things I learned: check the bifurcation/splitting of lanes on the PCIe slots on the mobo; even if the processor has enough lanes, the mobo might not split them ideally.
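Once the cards are in, you can verify what link each slot actually negotiated. A small sketch, assuming the pynvml bindings are installed:

```python
import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    gen = pynvml.nvmlDeviceGetCurrPcieLinkGeneration(handle)
    width = pynvml.nvmlDeviceGetCurrPcieLinkWidth(handle)
    print(f"GPU {i}: PCIe gen {gen} x{width}")
pynvml.nvmlShutdown()
```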
hjups22 t1_j3k2kei wrote
What is the intended use case for the GPUs? I presume you intend to train networks, but which kind and at what scale? Many small models, or one big model at a time?
Or if you are doing inference, what types of models do you intend to run?
The configuration you suggested is really only good for training / inferencing many small models in parallel, and will not be performant for anything that uses more than 2 GPUs via NVLink.
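If "many small models in parallel" is the workload, the usual pattern is just one independent process per GPU, which this build handles fine. A minimal sketch (train.py is a hypothetical stand-in for your own training script):

```python
import os
import subprocess

# Pin one independent training job to each of the 4 GPUs.
procs = []
for gpu in range(4):
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
    procs.append(subprocess.Popen(["python", "train.py", "--run-id", str(gpu)], env=env))
for p in procs:
    p.wait()
```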
Also, don't forget about system RAM... depending on the models, you may need ~1.5x the total VRAM capacity in system RAM, and DeepSpeed requires a lot more than that (upwards of 4x). I would probably go with at least 128GB for the setup you described.
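For concreteness, here's that rule of thumb applied to four 3090s (24 GB each); the multipliers are the rough heuristics above, not measurements:

```python
vram_total_gb = 4 * 24      # four 3090s
print(1.5 * vram_total_gb)  # ~144 GB by the ~1.5x heuristic
print(4.0 * vram_total_gb)  # ~384 GB if DeepSpeed offloading is in play
```

So 128GB is really the floor for this build, not a comfortable ceiling.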