ChristmasInOct OP t1_jb2cwwf wrote

Thanks for the response. Do you recall where you read the "only 200 people" bit? I'll look around for it as well; the discussion surrounding it sounds like it would be worth reading.

P2P isn't much of a limitation so long as you can fit the entire model / pipeline into a single card's VRAM, though, correct?

So, for example, if you have a 7B-parameter model at FP16 and it's around 14 GB, presumably you should be safe with 24 GB of VRAM?
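
Sanity-checking that arithmetic (a rough sketch; assumes 2 bytes per parameter for FP16 and counts only the weights, not activations or KV cache):

```python
# Rough VRAM needed just to hold the weights of a 7B-parameter model in FP16.
n_params = 7e9          # 7B parameters
bytes_per_param = 2     # FP16 = 2 bytes per parameter

weights_gb = n_params * bytes_per_param / 1e9
print(f"Weights: {weights_gb:.0f} GB")  # -> Weights: 14 GB
```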

Thanks again for your time.


karyo t1_jb2qo4e wrote

For inference? Yes. Take a look at EleutherAI's Transformer Math page. Others are also trying out LLaMA right now, so check out their results too.
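
For a concrete rule of thumb, the Transformer Math page puts total inference memory at roughly 1.2x the weight memory to cover overhead. A minimal sketch of that estimate (the 1.2 factor is their heuristic, not a hard guarantee):

```python
def inference_vram_gb(n_params: float, bytes_per_param: int = 2,
                      overhead: float = 1.2) -> float:
    """Estimate inference VRAM as weights * overhead (EleutherAI rule of thumb)."""
    return n_params * bytes_per_param * overhead / 1e9

# A 7B FP16 model: ~16.8 GB, comfortably under a 24 GB card.
print(f"{inference_vram_gb(7e9):.1f} GB")
```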
