Submitted by ChristmasInOct t3_11ium8l in deeplearning
karyo t1_jb03jq0 wrote
The first question is kinda difficult. DeepSpeed, ZeRO, and Megatron all play into it. There's a reason somebody recently said there are only around 200 people in the world right now who can pull it off.
For the second question:
4090s just won't cut it. Nvidia fused off P2P this generation, so unless you have an embarrassingly parallel pipeline (which current LLMs aren't), they are not useful. The problem is that the Ada A6000 was also severely restricted P2P-wise.
If you're doing LLMs at billion-parameter scale, you gotta get V100s, A100s, or H100s.
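For reference, here's a quick way to see whether the driver actually exposes P2P between a pair of GPUs. This is a minimal sketch assuming PyTorch with CUDA and at least two visible devices; `report_p2p_support` is just an illustrative helper name, not a library function:

```python
# Minimal sketch: query the driver for peer-to-peer access between GPU pairs.
# Assumes PyTorch with CUDA and >= 2 visible GPUs.
import torch

def report_p2p_support() -> None:
    n = torch.cuda.device_count()
    if n < 2:
        print("Need at least two CUDA devices to check P2P.")
        return
    for src in range(n):
        for dst in range(n):
            if src != dst:
                ok = torch.cuda.can_device_access_peer(src, dst)
                print(f"GPU {src} -> GPU {dst}: P2P {'available' if ok else 'not available'}")

report_p2p_support()
```

On consumer 40-series cards you'd expect this to report P2P as not available between pairs, which is why multi-GPU pipelines that shuffle tensors between devices take the hit.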
ChristmasInOct OP t1_jb2cwwf wrote
Thanks for the response. Do you recall where you read the "only 200 people" bit? I'll take a look around for it as well; it seems like the kind of claim that would be surrounded by interesting conversation.
P2P isn't much of a limitation as long as you can fit the entire model/pipeline into a single card's VRAM, though, correct?
So, for example, if you have a 7B-parameter model at FP16 and it's around 14 GB, presumably you should be safe with 24 GB of VRAM?
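For what it's worth, here's the back-of-the-envelope math I'm using (a rough sketch that only counts weights and ignores activations, KV cache, and framework overhead; `weight_memory_gb` is just an illustrative helper):

```python
# Rough weight-memory estimate: parameter count * bytes per parameter.
# Ignores activations, gradients, optimizer states, and framework overhead.
def weight_memory_gb(n_params: float, bytes_per_param: float = 2) -> float:
    """FP16/BF16 = 2 bytes per param, FP32 = 4, int8 = 1."""
    return n_params * bytes_per_param / 1e9

print(weight_memory_gb(7e9))                     # ~14 GB at FP16
print(weight_memory_gb(7e9, bytes_per_param=4))  # ~28 GB at FP32
```

That matches the ~14 GB figure for inference. For training, a common rule of thumb is roughly 16 bytes per parameter with mixed-precision Adam (FP16 weights and gradients plus FP32 master weights and optimizer states), before activations, so full fine-tuning of a 7B model wouldn't fit on a single 24 GB card.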
Thanks again for your time.