thundergolfer

thundergolfer t1_izgiaa4 wrote

I'm sorry to shill, but Modal.com is easily the best thing for this. Here's a demo video showing how fast you can edit code, run it in the cloud, and then edit it some more, all in a handful of seconds.

I was the ML Platform lead at Canva, and quick iteration was the #1 pain point for our data scientists and MLEs. I left Canva to join Modal because it can do heavy serverless compute and keep your inner dev loop tight.
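To make the "tight inner dev loop" point concrete, here's roughly what a minimal Modal program looks like. This is a sketch from memory rather than copied from the docs, so treat the exact decorator/method names as assumptions and check modal.com/docs for the current API:

```python
import modal

stub = modal.Stub("hello-modal")  # the app is defined in plain Python

@stub.function()
def square(x: int) -> int:
    # This body runs in a container in Modal's cloud, not on your laptop.
    return x * x

@stub.local_entrypoint()
def main():
    # `modal run hello.py` ships your local code and invokes the function
    # remotely, so the edit-run cycle is just: save the file, re-run the command.
    print(square.call(42))
```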

Again, sorry to shill, but I've been in this sub for like 8 years and think tools like Modal and Metaflow are finally getting us to a place where ML development isn't a painful mess.

1

thundergolfer t1_iykjcgc wrote

How do you deploy and scale the transcription? Is it on GPU, and which model variant?

I also built a Whisper transcription app: modal.com/docs/guide/whisper-transcriber. It can do serverless CPU transcription on-demand. You can check it out and borrow from it if it's useful. The code is open-source.
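For a feel of the pattern, here's a stripped-down sketch of serverless CPU transcription on Modal. This is not the actual app code; the image contents, model size, and exact Modal API spellings are assumptions on my part, and the real example is at the link above:

```python
import modal

stub = modal.Stub("whisper-cpu-sketch")

# Container image with ffmpeg and Whisper installed (package names assumed).
image = modal.Image.debian_slim().apt_install("ffmpeg").pip_install("openai-whisper")

@stub.function(image=image, cpu=2)
def transcribe(audio: bytes) -> str:
    import tempfile
    import whisper

    model = whisper.load_model("base.en")  # small model, workable on CPU
    with tempfile.NamedTemporaryFile(suffix=".mp3") as f:
        f.write(audio)
        f.flush()
        return model.transcribe(f.name)["text"]
```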

P.S.: Yes, this does violate rule 5 (promote on weekends). I violated the same rule when I posted my Whisper app :)

2

thundergolfer t1_iyk3df7 wrote

Try modal.com.

Modal is an ML-focused serverless cloud. It's much more general than replicate.com, which only lets you deploy ML model endpoints, but it's still extremely easy to use.

It's the platform that this openai/whisper podcast transcriber is built on: /r/MachineLearning/comments/ynz4m1/p_transcribe_any_podcast_episode_in_just_1_minute/.

Or here's an example of doing serverless batch inference: modal.com/docs/guide/ex/batch_inference_using_huggingface.

This example from Charles Frye runs Stable Diffusion Dream Studio on Modal: twitter.com/charles_irl/status/1594732453809340416

2

thundergolfer t1_ix9zysc wrote

> How can I deploy it so that its scalable ?

There's no such general thing as "scalability" (AKA magic scaling sauce). You'll have to be a lot more specific about how your deployment is not handling changes in load parameters.

If I had to guess, I'd say the likely scaling issue is going from a single VM with a single GPU to N GPUs able to run inference in parallel.

If that is your main scaling issue, modal.com can do serverless GPU training/inference across N GPUs almost trivially: twitter.com/charles_irl/status/1594732453809340416.
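Roughly, the pattern is: attach a GPU to your inference function and fan the inputs out with `.map()`, so Modal runs one container (and GPU) per chunk of work instead of queuing everything on a single VM. A hand-wavy sketch, where the toy model, image contents, and exact API spellings are assumptions rather than a real deployment:

```python
import modal

stub = modal.Stub("parallel-inference-sketch")

image = modal.Image.debian_slim().pip_install("torch", "transformers")

@stub.function(image=image, gpu="any")  # each container gets its own GPU
def predict(text: str) -> str:
    from transformers import pipeline

    # Toy model purely for illustration; swap in your own inference code.
    clf = pipeline("sentiment-analysis", device=0)
    return clf(text)[0]["label"]

@stub.local_entrypoint()
def main():
    inputs = ["great product", "terrible support", "works fine"]
    # .map() fans the calls out across N containers/GPUs in parallel.
    for label in predict.map(inputs):
        print(label)
```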

(disclaimer: I work for Modal)

2