Submitted by ChristmasInOct t3_11ium8l in deeplearning
ChristmasInOct OP t1_jb2cwwf wrote
Reply to comment by karyo in LLaMA model parallelization and server configuration by ChristmasInOct
Thanks for the response. Do you recall where you read the "only 200 people" bit? I'll take a look around for it as well; the surrounding conversation sounds like it could be interesting.
P2P is not much of a limitation as long as you can fit the entire model / pipeline into a single card's VRAM though, correct?
So for example, if you have a 7B-parameter model at FP16 and it's around 14 GB, presumably you should be safe with 24 GB of VRAM?
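A rough back-of-the-envelope sketch of that arithmetic (assuming FP16 weights only, i.e. 2 bytes per parameter, and ignoring activation, KV-cache, and CUDA context overhead; the helper name is just for illustration):

    def estimate_weight_vram_gib(n_params: float, bytes_per_param: int = 2) -> float:
        """Approximate VRAM needed just for model weights (FP16 = 2 bytes/param)."""
        return n_params * bytes_per_param / 1024**3

    # 7B parameters at FP16 -> ~13 GiB of weights, which leaves a 24 GB card
    # some headroom for activations, KV cache, and framework overhead.
    print(f"{estimate_weight_vram_gib(7e9):.1f} GiB")  # ~13.0 GiB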
Thanks again for your time.