
Final-Rush759 t1_jb5bxf5 wrote

Only 2× more than the 3060? Maybe you are power limited or CPU bottlenecked when using both GPUs, or PCIe bandwidth limited.

1

incrediblediy t1_jb5dzqa wrote

This was when they were running individually on a full x16 PCIe 4.0 slot, and it is roughly what the FP32 TFLOPS ratio would predict (35.58 / 12.74 ≈ 2.8×, specs below). (i.e. I compared times when I had only the 3060 vs. the 3090 in the same slot, running the model on a single GPU each time.)
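For reference, a minimal sketch of that kind of one-card-at-a-time comparison (layer and batch sizes are placeholders, and it assumes both cards are visible as cuda:0 and cuda:1):

```python
import time
import torch

def avg_step_time(device, n_iters=50):
    """Average forward+backward time for a dummy layer on one GPU."""
    device = torch.device(device)
    model = torch.nn.Linear(1024, 1024).to(device)
    x = torch.randn(256, 1024, device=device)
    torch.cuda.synchronize(device)
    start = time.perf_counter()
    for _ in range(n_iters):
        model(x).sum().backward()
    # CUDA kernel launches are asynchronous; sync before stopping the clock.
    torch.cuda.synchronize(device)
    return (time.perf_counter() - start) / n_iters

print("cuda:0 (e.g. 3090):", avg_step_time("cuda:0"))
print("cuda:1 (e.g. 3060):", avg_step_time("cuda:1"))
```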

I don't do much training on the 3060 now; it is mostly just driving the monitors, etc.

I have changed the batch sizes to suit the 24 GB of VRAM anyway, as I am working with CV data. It could be a bit different with other types of models.

3060 = FP32 (float) 12.74 TFLOPS (https://www.techpowerup.com/gpu-specs/geforce-rtx-3060.c3682)
3090 = FP32 (float) 35.58 TFLOPS (https://www.techpowerup.com/gpu-specs/geforce-rtx-3090.c3622)

I must say the 3060 is a wonderful card, and it helped me a lot until I found this ex-mining 3090. Really worth the price with 12 GB of VRAM.

1

Final-Rush759 t1_jb5f7eu wrote

I used mixed-precision training, so the math should have been largely FP16. But you can feed inputs as float32; PyTorch AMP will autocast them to FP16 where appropriate. I only get about a 2× speedup with the 3090.
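A minimal sketch of that AMP pattern (placeholder model and data; assumes a CUDA GPU): the inputs stay float32, and autocast handles the FP16 casting for eligible ops.

```python
import torch

device = torch.device("cuda")
model = torch.nn.Linear(512, 10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()

for step in range(100):
    # float32 inputs are fine; autocast casts eligible ops (matmul, conv) to FP16.
    inputs = torch.randn(64, 512, device=device)
    targets = torch.randint(0, 10, (64,), device=device)

    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(inputs), targets)
    # GradScaler scales the loss to avoid FP16 gradient underflow.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```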

1