Submitted by sabeansauce t3_yfjfkh in deeplearning
If a task requires a certain amount of available memory on a GPU, and there are two GPUs that cannot individually run the task, will the memory of each be combined if they are run together? Does it work like that? Or does each GPU have to be capable memory-wise on its own?
FuB4R32 t1_iu41a65 wrote
Is this for training or inference? The easiest thing to do is to split the batch across multiple GPUs (data parallelism). If you can't even fit batch=1 on a single GPU, though, then model parallelism is generally a harder problem. See the sketch below.
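To illustrate the distinction the comment draws, here is a minimal PyTorch sketch. The model, layer sizes, and batch size are placeholders chosen for illustration, not anything from the thread; data parallelism still requires the full model to fit on each GPU, while model parallelism spreads the weights themselves across devices.

    import torch
    import torch.nn as nn

    # Placeholder model, sizes are arbitrary.
    model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10))

    # Data parallelism: every GPU holds a full copy of the model and processes
    # a slice of the batch. Per-GPU activation memory shrinks, but the model
    # itself must still fit on a single GPU.
    if torch.cuda.device_count() > 1:
        dp_model = nn.DataParallel(model).cuda()
        x = torch.randn(64, 1024).cuda()   # batch of 64, split across GPUs
        out = dp_model(x)

    # Model parallelism: different layers live on different GPUs, so the
    # weights are spread across devices and activations move between GPUs
    # in forward(). Harder to do efficiently, but it lets a model that is
    # too big for one GPU use the combined memory of several.
    class TwoGPUModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.part1 = nn.Linear(1024, 4096).to("cuda:0")
            self.part2 = nn.Linear(4096, 10).to("cuda:1")

        def forward(self, x):
            x = self.part1(x.to("cuda:0"))
            return self.part2(x.to("cuda:1"))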