Submitted by QTQRQD t3_11jjd18 in MachineLearning

I'm looking to run some of the bigger models (namely LLaMA 30B and 65B) on a cloud instance so that I can get some usable performance for completions. I was thinking of an EC2 instance with a single A100 attached, but is this the best setup, or does anyone have any other suggestions?

27

Comments


itsnotmeyou t1_jb2z8z7 wrote

Are you using these as part of a system? For just experimenting around, EC2 is a good option, but you would either need to install the right drivers or use the latest Deep Learning AMI. Another option could be a custom Docker setup on SageMaker. I like that setup for inference as it's super easy to deploy and it separates the model from the inference code, though it's costlier and would only be available through the SageMaker runtime.

The third option would be over-engineering it by setting up your own cluster service.

In general, if you want to deploy multiple LLMs quickly, go for SageMaker.
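A minimal sketch of the SageMaker route, using the prebuilt Hugging Face container rather than a fully custom image (the model ID, instance type, and container versions are assumptions, not anyone's exact setup):

```python
# Hedged sketch: deploy a Hub model to a SageMaker real-time endpoint.
# Model ID, instance type, and container versions are placeholders -- adjust for your account.
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()  # or an explicit IAM role ARN

model = HuggingFaceModel(
    env={"HF_MODEL_ID": "google/flan-t5-xl", "HF_TASK": "text2text-generation"},
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version="py39",
    role=role,
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",   # single A10G; size up for bigger models
)

print(predictor.predict({"inputs": "Explain spot instances in one sentence."}))
```

The nice part of this split is that the inference container stays fixed while you swap models via the environment/model artifact, which is what makes deploying several LLMs quick.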

0

I_will_delete_myself t1_jb33bmz wrote

Use a spot instance. If you're just testing things out, your wallet will thank you later. Look at my previous post on here about running stuff in the cloud before you do it.
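For reference, a hedged sketch of requesting a GPU spot instance with boto3 (the AMI ID, key pair, and instance type are placeholders):

```python
# Hedged sketch: launch a GPU spot instance with boto3.
# AMI ID, key pair, and instance type are placeholders -- substitute your own.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # e.g. an AWS Deep Learning AMI
    InstanceType="g5.2xlarge",         # adjust for the model size you need
    KeyName="my-key-pair",
    MinCount=1,
    MaxCount=1,
    InstanceMarketOptions={
        "MarketType": "spot",
        "SpotOptions": {
            "SpotInstanceType": "one-time",
            "InstanceInterruptionBehavior": "terminate",
        },
    },
)

print(response["Instances"][0]["InstanceId"])
```

Keep in mind spot capacity can be reclaimed with short notice, so checkpoint anything you care about to S3 or an EBS volume.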

8

iloveintuition t1_jb4bp1o wrote

Using vast.ai for running FLAN-XL works pretty well. Haven't tested at LLaMA scale.
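For what it's worth, a minimal sketch of running FLAN-T5 XL with Hugging Face transformers once you have a GPU box (the public `google/flan-t5-xl` checkpoint; `accelerate` is assumed to be installed for `device_map`):

```python
# Hedged sketch: run FLAN-T5 XL for completions on a single GPU.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "google/flan-t5-xl"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer(
    "Summarize: spot instances are cheaper but can be reclaimed at any time.",
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```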

2

ggf31416 t1_jb4j0uk wrote

Good luck getting an EC2 instance with a single A100; last time I checked, AWS only offered instances with 8 of them, at a high price.

1

l0g1cs t1_jb4tbu7 wrote

Check out Banana. They seem to do exactly that with "serverless" A100s.

1

frankod281 t1_jb5j6si wrote

Maybe check datacrunch.io; they have a good offering for cloud GPUs.

1

Mrkvitko t1_jb6gf6c wrote

I just got an 8x RTX A5000 instance for a couple of bucks per hour on https://vast.ai

I must say LLaMA 65B is a bit underwhelming...
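A hedged sketch of how one might shard a model of that size across all eight GPUs with `device_map="auto"` (the local weights path is a placeholder; this assumes the LLaMA weights have been converted to Hugging Face format and a recent transformers/accelerate install):

```python
# Hedged sketch: load a 65B-parameter model across 8 GPUs via accelerate's device_map.
# The path to converted LLaMA weights is a placeholder.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

weights_path = "/path/to/llama-65b-hf"   # weights converted to Hugging Face format
tokenizer = AutoTokenizer.from_pretrained(weights_path)
model = AutoModelForCausalLM.from_pretrained(
    weights_path,
    torch_dtype=torch.float16,
    device_map="auto",                   # spreads layers across all visible GPUs
)

inputs = tokenizer("The largest animal on Earth is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

At fp16, 65B parameters are roughly 130 GB of weights, so 8x 24 GB cards fit the model but leave limited headroom for long contexts.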

6

Quick-Hovercraft-997 t1_jbx9gcj wrote

If latency is not a critical requirement, you can try a serverless GPU cloud like banana.dev or pipeline.ai. These platforms provide easy-to-use templates for deploying LLMs.

1