
thundergolfer t1_ix9zysc wrote

> How can I deploy it so that it's scalable?

There's no such general thing as "scalability" (AKA magic scaling sauce). You'll have to be a lot more specific about how your deployment fails to handle changes in load.

If I had to guess, I'd say the likely scaling issue is going from a single VM with a single GPU to N GPUs able to run inference in parallel.

If that is your main scaling issue, modal.com can do serverless GPU training/inference against N GPUs almost trivially: twitter.com/charles_irl/status/1594732453809340416.
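The fan-out pattern described above (one request stream spread across N workers instead of queuing on a single VM) can be sketched generically. This is a minimal illustration, not Modal's actual API: `run_inference` is a hypothetical stand-in for a model forward pass, and a process pool stands in for N GPU-backed workers.

```python
from concurrent.futures import ProcessPoolExecutor

# Hypothetical stand-in for a model forward pass. On a serverless GPU
# platform, each worker would hold its own GPU and loaded model.
def run_inference(x: int) -> int:
    return x * 2  # placeholder "prediction"

def parallel_inference(inputs, n_workers=4):
    # Fan requests out across n_workers in parallel; map preserves
    # input order in the results.
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(run_inference, inputs))

if __name__ == "__main__":
    print(parallel_inference([1, 2, 3, 4]))  # [2, 4, 6, 8]
```

The point is that throughput scales with `n_workers` rather than being capped by one machine; a serverless platform just manages the worker pool (and the GPUs behind it) for you.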

(disclaimer: work for modal)


Dense_History_1786 OP t1_ixa3ha2 wrote

Sorry, I should have been more clear.
But you're right, I have a single VM and that's the problem. I'll check out Modal, thanks.


thundergolfer t1_ixalc2h wrote

If it doesn't suit, lmk what didn't work well. Otherwise, I think other serverless GPU platforms will be your best bet. I don't think GCP does serverless GPUs, and although AWS SageMaker supports it, their UX makes development a big pain.
