Viewing a single comment thread. View all comments

ThomasBudd93 t1_j7u0ttw wrote

Someone else directed me here:

https://discuss.pytorch.org/t/ddp-training-on-rtx-4090-ada-cu118/168366/12

I didn't read the whole thread and there is more in the NVIDIA forum (links can be found in the pytorch forum link from above). At least a few weeks ago it looked like the multi-GPU training for the RTX 4090s doesn't work fully where it does for the RTX 6000 Ada. Not sure if this is intended or just a bug. I called a company here in Germany and they even stopped selling multi RTX 4090 deep learning computers because of this. I asked them about the multi-GPU benchmark I saw from lambda labs and they replied that they reproduced that but saw that the training only resulted in nans. This is all I know. If you find out more, could you share it here? :) Thanks!

2

one_eyed_sphinx OP t1_j7yr8st wrote

some of the people seem to connect it to the AMD processors and motherboards. do you think it's the reason?
nvidia is known to downgrade thier gaming GPU so people will buy the proffessional ones.

1