Submitted by DevarshTare t3_11725n6 in MachineLearning
ggf31416 t1_j99y9e1 wrote
https://timdettmers.com/2023/01/30/which-gpu-for-deep-learning/
https://lambdalabs.com/gpu-benchmarks
How much VRAM you need will depend mostly on the number of parameters of the model, with some extra for the data. At FP32 precision each parameter needs 4 bytes, at FP16 or BF16 2 bytes, and at FP8 or INT8 only one byte. Almost all models can be run at FP16 without noticeable accuracy loss; FP8 sometimes works and sometimes doesn't, depending on the model.
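That back-of-the-envelope estimate is easy to script. A minimal sketch (the function name and the 1B-parameter example are made up for illustration; this counts weights only, ignoring activations, optimizer state, and framework overhead):

```python
def vram_estimate_gb(num_params: float, bytes_per_param: int) -> float:
    """Rough VRAM needed just to hold the weights, in GiB."""
    return num_params * bytes_per_param / 1024**3

# Hypothetical 1-billion-parameter model:
print(vram_estimate_gb(1e9, 4))  # FP32: ~3.7 GiB
print(vram_estimate_gb(1e9, 2))  # FP16/BF16: ~1.9 GiB
print(vram_estimate_gb(1e9, 1))  # FP8/INT8: ~0.9 GiB
```

Training needs considerably more than this (gradients and optimizer state multiply the per-parameter cost), so treat it as a lower bound.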
DevarshTare OP t1_j9a7gap wrote
Appreciate it! This gave me a better picture. I was stuck between the 3060 Ti and the 3070; in this case the 3060 Ti is the logical option. I'll be using Colab for training, and can probably optimise the model to run in 8 GB, if I'm not wrong?
ggf31416 t1_j9a8p88 wrote
The 3070 and 3060 Ti both have 8GB, and while the 3070 will be a bit faster, most people will agree that the difference is not worth the price if you have a tight budget.
For training, the extra 4GB on the plain 3060 is quite useful, but for inference only you can run most small and medium models (such as Stable Diffusion) in 8GB, and the 3060 Ti will be faster.
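Fitting a model into 8GB for inference often comes down to the FP16 trick mentioned above. A minimal PyTorch sketch (the toy `nn.Sequential` model is a hypothetical stand-in for a real network; `.half()` is the standard conversion):

```python
import torch.nn as nn

# Toy model standing in for a real network (hypothetical layer sizes).
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10))

def param_bytes(m: nn.Module) -> int:
    """Total bytes occupied by the model's parameters."""
    return sum(p.numel() * p.element_size() for p in m.parameters())

fp32_bytes = param_bytes(model)
model = model.half()  # convert all weights to FP16 for inference
fp16_bytes = param_bytes(model)
# fp16_bytes is exactly half of fp32_bytes
```

For inference you'd then move the model to the GPU and run inputs that have also been cast to FP16; the weight memory is halved, which is usually what decides whether a model fits in 8GB at all.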
TruthAndDiscipline t1_j9avex4 wrote
I'm using a 3060 (no ti) with 12GB VRAM and train locally as well. Performance is fine, too.
DevarshTare OP t1_j9b4u7s wrote
That's interesting. I was considering that purchase, since it makes sense to run larger datasets or models on the RTX 3060, but the tensor core count is significantly lower. The GPU would definitely run much larger models, but at a lower speed, I assume?
How has your experience been with larger models? Especially video- or image-based models?
ggf31416 t1_j9clwen wrote
I actually have a 3060 too. In theory a 3060 Ti should be up to 30% faster, but most of the time the 3060 is fast enough, and faster than any T4.
For making a few images with Stable Diffusion the difference might be 15 vs 20 seconds; for running Whisper on several hours of audio it could be 45 minutes vs 1 hour. The difference will only matter if the model is optimized to fully use the GPU in the first place.
DevarshTare OP t1_j9ngofc wrote
I've seen the same across multiple threads now: the VRAM makes the difference between being able to run a model as-is and having to optimize it. This has been really helpful, thanks a lot guys!