suflaj t1_iyh3csl wrote

Again, for the 1000th time, NVLink is not necessary for multi-GPU training.

You will not need 64 lanes for 4 GPUs because a 4090 doesn't have enough bandwidth to fill them up. 32 PCIe 4.0 lanes or 16 PCIe 5.0 lanes will be enough. This only just requires a Threadripper, since 4090s are still PCIe 4.0.
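The lane math works out roughly like this (the ~2 GB/s per PCIe 4.0 lane and ~4 GB/s per PCIe 5.0 lane figures are approximate, and splitting the lanes evenly across cards is my assumption):

```python
# Approximate one-directional throughput per lane, in GB/s.
# These are rounded figures, not exact spec numbers.
GBPS_PER_LANE = {"pcie4": 2.0, "pcie5": 4.0}

def per_gpu_bandwidth(gen: str, lanes_per_gpu: int) -> float:
    """Approximate one-directional bandwidth per GPU in GB/s."""
    return GBPS_PER_LANE[gen] * lanes_per_gpu

# 4 GPUs sharing 32 PCIe 4.0 lanes -> x8 each, ~16 GB/s per card
print(per_gpu_bandwidth("pcie4", 8))   # -> 16.0
# 4 GPUs sharing 16 PCIe 5.0 lanes -> x4 each, same ~16 GB/s per card
print(per_gpu_bandwidth("pcie5", 4))   # -> 16.0
```

Either way each card gets about the same effective link, which is why the extra lanes of a 64-lane platform buy you nothing here.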

Your bigger issue is cooling. To have 4 GPUs you will need to water cool them with at least 2 radiators and you will need an especially large case to fit them.

But even if you do sort out the cooling, there is no way in hell you will find a consumer power supply that can run those cards simultaneously, meaning you will need to spend several thousand dollars on an industrial-grade power supply for your server.
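A rough sizing sketch shows why (the wattages and headroom factor below are my assumptions, not figures from this thread):

```python
# Back-of-envelope PSU sizing for a 4x 4090 box. All numbers are
# assumptions for illustration, not measured values.
GPU_TDP_W = 450        # stock RTX 4090 board power
CPU_TDP_W = 350        # high-end Threadripper-class CPU
REST_W = 150           # motherboard, RAM, storage, fans, pumps
HEADROOM = 1.3         # ~30% margin for transient spikes

def psu_watts(n_gpus: int) -> int:
    """Recommended PSU capacity in watts for n_gpus cards."""
    return round((n_gpus * GPU_TDP_W + CPU_TDP_W + REST_W) * HEADROOM)

print(psu_watts(4))  # -> 2990
```

Nearly 3 kW sustained is well past what typical consumer ATX units deliver, before even considering the wall circuit feeding it.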

Overall it would be best to get a single or dual GPU setup and spend the rest of the money on A100 compute when you actually need it.

normie1990 OP t1_iyh422n wrote

Sorry if this has been asked a lot, I'm new to this sub.

As for the case, I'm going for the Corsair 680X; it has room for a 360mm and a 240mm radiator. I'm not sure if I should put a radiator on the bottom as well? If yes, then an additional 240mm.

Ataru074 t1_iyhp40v wrote

As someone who actually built a system like that with the 3000 series... yes, it can run Crysis at 640×480 on minimum settings.

You are looking at a five-figure system when all is said and done, and it will be worth half that in a year or so.

That’s the equivalent of 400+ hours of training on the most expensive A100 cloud solution you can buy.

And that's just for the bare metal. Add having to supply about 2.5 kW to keep such a system running, and 400 hours is a whole lot of time.
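The comparison can be sketched in a few lines (the hourly cloud rate and electricity price are my assumptions, chosen only to match the 400-hour figure above):

```python
# Back-of-envelope local-vs-cloud comparison. Rates are assumptions
# for illustration, not quotes from any provider.
SYSTEM_COST = 10_000       # "five figures" build, USD
CLOUD_RATE = 25.0          # multi-A100 instance, USD/hour (assumed)
POWER_KW = 2.5             # draw of the local system under load
KWH_PRICE = 0.15           # USD per kWh (assumed)

# Hours of cloud compute the build price would buy
cloud_hours = SYSTEM_COST / CLOUD_RATE
print(cloud_hours)                          # -> 400.0

# Electricity cost of running the local box for those same hours
print(POWER_KW * cloud_hours * KWH_PRICE)   # -> 150.0
```

Note the electricity is a minor line item next to the hardware itself; the real question is whether you will actually keep the box busy.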

I never used my system for that much training, but hey… I can run Crysis.

Dmytro_P t1_iyjw1xh wrote

400 hours is less than 3 weeks of training. If you plan to keep the system loaded for at least half a year, building your own may be quite a bit cheaper.

I built a similar 3000-series system as well (with the power limit reduced to around 300W per GPU; the performance impact is not that large). Renting for the time it was used would have cost me significantly more.
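The power-limit trade-off is easy to quantify (the 350W stock figure below is my assumption for a 3090-class card, and the GPU count is illustrative):

```python
# Rough savings from capping GPU power, as described above.
# Stock wattage and GPU count are assumptions for illustration.
STOCK_W = 350    # assumed stock board power of a 3090-class card
LIMIT_W = 300    # the reduced limit mentioned above
N_GPUS = 4       # illustrative count

print((STOCK_W - LIMIT_W) * N_GPUS)  # -> 200 W less draw overall
# On Linux the cap itself is set per GPU with, e.g.:
#   sudo nvidia-smi -i 0 -pl 300
```

Training throughput usually falls off much more slowly than power in this range, which is why the perceived performance impact is small.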

normie1990 OP t1_iyi2cpn wrote

It will also be my main workstation for coding, playing games, etc, I just want it to do AI as well :)

Ataru074 t1_iyi2vrv wrote

You can get away with waaaaaaay less power than that.

normie1990 OP t1_iyi6fix wrote

Yes, I think I will go with a Ryzen 9 platform with a single 4090. It's not very expandable (no adding a ton of RAM or multiple GPUs later), but it should be good enough for training Detectron2 and YOLO... I think. And it costs way less than a Threadripper platform.

suflaj t1_iyh4h4x wrote

No. The reason you put the radiator on top is so air doesn't collect in the water block. Air in the water block means no cooling, since air barely conducts heat. Therefore you'd need a case big enough to mount both radiators on top, or you'd have to keep one of them outside the case.

duschendestroyer t1_iyhi2tn wrote

You just need one radiator or reservoir that's higher than the pump and blocks. This is really a non-issue with custom loops.

suflaj t1_iyhi4z6 wrote

One radiator cannot handle four 4090s unless it's at least twice the size of an ordinary one.

duschendestroyer t1_iyhj2sz wrote

Sure, you want as many as you can fit, but only one needs to be mounted high.

normie1990 OP t1_iyhja0s wrote

He meant that just one of the radiators needs to be higher
