Submitted by QTQRQD t3_11jjd18 in MachineLearning
I'm looking to run some of the bigger models (namely LLaMA 30B and 65B) on a cloud instance so that I can get usable performance for completions. I was thinking of an EC2 instance with a single A100 attached, but is this the best setup, or does anyone have any other suggestions?
itsnotmeyou t1_jb2z8z7 wrote
Are you using these as part of a system? For just experimenting around, EC2 is a good option, but you'd either need to install the right drivers or use the latest Deep Learning AMI. Another option is a custom Docker setup on SageMaker. I like that setup for inference since it's very easy to deploy and it separates the model from the inference code, though it's costlier and only reachable through the SageMaker runtime.
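Rough sketch of what that SageMaker flow looks like with boto3, if it helps — the image URI, S3 path, IAM role, endpoint name, and instance type below are all placeholders for whatever container/artifact you actually build and push:

```python
import boto3

sm = boto3.client("sagemaker")

# Register the model: your custom inference container (in ECR) plus the
# model weights packaged in S3. Both URIs are placeholders.
sm.create_model(
    ModelName="llama-30b",
    PrimaryContainer={
        "Image": "<account>.dkr.ecr.us-east-1.amazonaws.com/llama-inference:latest",
        "ModelDataUrl": "s3://<your-bucket>/llama-30b/model.tar.gz",
    },
    ExecutionRoleArn="arn:aws:iam::<account>:role/<sagemaker-execution-role>",
)

# Endpoint config: pick a GPU instance with enough memory for the model.
sm.create_endpoint_config(
    EndpointConfigName="llama-30b-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "llama-30b",
        "InstanceType": "ml.g5.12xlarge",  # placeholder; size this to the model
        "InitialInstanceCount": 1,
    }],
)

# Spin up the endpoint (takes several minutes to reach InService).
sm.create_endpoint(
    EndpointName="llama-30b",
    EndpointConfigName="llama-30b-config",
)
```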
The third option, which is over-engineering for most cases, would be setting up your own cluster service.
In general, if you want to deploy multiple LLMs quickly, go for SageMaker.
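Once an endpoint is InService, calling it through the SageMaker runtime is a single API call. Minimal sketch, assuming the container from the earlier snippet accepts and returns JSON (the endpoint name and payload format are whatever your own container defines):

```python
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

# Call the deployed endpoint; the request body is whatever schema your
# inference container expects.
response = runtime.invoke_endpoint(
    EndpointName="llama-30b",  # placeholder name from the sketch above
    ContentType="application/json",
    Body=json.dumps({"prompt": "Write a haiku about GPUs.", "max_new_tokens": 64}),
)

print(response["Body"].read().decode("utf-8"))
```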