Submitted by QTQRQD t3_11jjd18 in MachineLearning

I'm looking to run some of the bigger models (namely LLaMA 30B and 65B) on a cloud instance so that I can get some usable performance for completions. I was thinking of an EC2 instance with a single A100 attached, but is this the best setup, or does anyone have any other suggestions?

27

Comments


itsnotmeyou t1_jb2z8z7 wrote

Are you using these as part of a system? For just experimenting around, EC2 is a good option, but you would either need to install the right drivers or use the latest Deep Learning AMI. Another option could be a custom Docker setup on SageMaker. I like that setup for inference as it's super easy to deploy and it separates the model from the inference code, though it's costlier and would only be available through the SageMaker runtime.

The third option would be over-engineering it by setting up your own cluster service.

In general, if you want to deploy multiple LLMs quickly, go for SageMaker.
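A minimal sketch of the SageMaker route, using the prebuilt Hugging Face container rather than a fully custom image (the model ID, instance type, and container versions are assumptions, not anyone's exact setup):

```python
# Hedged sketch: deploy a Hub model to a SageMaker real-time endpoint.
# Model ID, instance type, and container versions are placeholders -- adjust for your account.
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()  # or an explicit IAM role ARN

model = HuggingFaceModel(
    env={"HF_MODEL_ID": "google/flan-t5-xl", "HF_TASK": "text2text-generation"},
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version="py39",
    role=role,
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",   # single A10G; size up for bigger models
)

print(predictor.predict({"inputs": "Explain spot instances in one sentence."}))
```

The nice part of this split is that the inference container stays fixed while you swap models via the environment/model artifact, which is what makes deploying several LLMs quick.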

0

I_will_delete_myself t1_jb33bmz wrote

Use a spot instance. If you're just testing things out, your wallet will thank you later. Look at my previous post on here about running stuff in the cloud before you do it.
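For reference, a hedged sketch of requesting a GPU spot instance with boto3 (the AMI ID, key pair, and instance type are placeholders):

```python
# Hedged sketch: launch a GPU spot instance with boto3.
# AMI ID, key pair, and instance type are placeholders -- substitute your own.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # e.g. an AWS Deep Learning AMI
    InstanceType="g5.2xlarge",         # adjust for the model size you need
    KeyName="my-key-pair",
    MinCount=1,
    MaxCount=1,
    InstanceMarketOptions={
        "MarketType": "spot",
        "SpotOptions": {
            "SpotInstanceType": "one-time",
            "InstanceInterruptionBehavior": "terminate",
        },
    },
)

print(response["Instances"][0]["InstanceId"])
```

Keep in mind spot capacity can be reclaimed with short notice, so checkpoint anything you care about to S3 or an EBS volume.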

8

iloveintuition t1_jb4bp1o wrote

Using vast.ai for running FLAN-XL works pretty well. Haven't tested at LLaMA scale.
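For what it's worth, a minimal sketch of running FLAN-T5 XL with Hugging Face transformers once you have a GPU box (the public `google/flan-t5-xl` checkpoint; `accelerate` is assumed to be installed for `device_map`):

```python
# Hedged sketch: run FLAN-T5 XL for completions on a single GPU.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "google/flan-t5-xl"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer(
    "Summarize: spot instances are cheaper but can be reclaimed at any time.",
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```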

2

ggf31416 t1_jb4j0uk wrote

Good luck getting an EC2 instance with a single A100; last time I checked, AWS only offered instances with 8 of them, at a high price.

1

l0g1cs t1_jb4tbu7 wrote

Check out Banana. They seem to do exactly that with "serverless" A100s.

1

frankod281 t1_jb5j6si wrote

Maybe check datacrunch.io; they have a good offering for cloud GPUs.

1

Mrkvitko t1_jb6gf6c wrote

I just got an 8x RTX A5000 instance for a couple of bucks per hour on https://vast.ai

I must say LLaMA 65B is a bit underwhelming...
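A hedged sketch of how one might shard a model of that size across all eight GPUs with `device_map="auto"` (the local weights path is a placeholder; this assumes the LLaMA weights have been converted to Hugging Face format and a recent transformers/accelerate install):

```python
# Hedged sketch: load a 65B-parameter model across 8 GPUs via accelerate's device_map.
# The path to converted LLaMA weights is a placeholder.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

weights_path = "/path/to/llama-65b-hf"   # weights converted to Hugging Face format
tokenizer = AutoTokenizer.from_pretrained(weights_path)
model = AutoModelForCausalLM.from_pretrained(
    weights_path,
    torch_dtype=torch.float16,
    device_map="auto",                   # spreads layers across all visible GPUs
)

inputs = tokenizer("The largest animal on Earth is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

At fp16, 65B parameters are roughly 130 GB of weights, so 8x 24 GB cards fit the model but leave limited headroom for long contexts.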

6

Quick-Hovercraft-997 t1_jbx9gcj wrote

If latency is not a critical requirement, you can try a serverless GPU cloud like banana.dev or pipeline.ai. These platforms provide easy-to-use templates for deploying LLMs.

1