machineko t1_ixzkdbt wrote

AWS Lambda gives you serverless compute, but you don't need serverless to make something scalable, if by scalable you mean going from a single GPU to multiple GPUs as your workload grows.

The simplest approach is to containerize your application and use GCP's autoscaling. You can also autoscale it on Kubernetes, as in the sketch below. Alternatively, you can use services like stochastic.ai, which deploy your containerized model and provide auto-scaling out of the box; you just upload your model and deploy.
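
If you go the Kubernetes route, autoscaling is mostly configuration. As a minimal sketch (assuming a Deployment named `sd-inference` already serves your model in the `default` namespace; both names are hypothetical), here's how you could attach a CPU-based Horizontal Pod Autoscaler with the official `kubernetes` Python client:

```python
from kubernetes import client, config

# Load credentials from ~/.kube/config (use load_incluster_config() inside a pod).
config.load_kube_config()

hpa = client.V1HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="sd-inference-hpa"),
    spec=client.V1HorizontalPodAutoscalerSpec(
        # Hypothetical Deployment name -- point this at your own inference service.
        scale_target_ref=client.V1CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="sd-inference"
        ),
        min_replicas=1,
        max_replicas=4,
        # Add replicas when average CPU utilization exceeds 70%.
        target_cpu_utilization_percentage=70,
    ),
)

client.AutoscalingV1Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa
)
```

For GPU-bound inference you'd normally scale on a custom metric (e.g. request queue depth or GPU utilization via the autoscaling/v2 API) rather than CPU, but the wiring is the same.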

However, I suggest you accelerate your inference first. For example, you can use open-source inference engines (see: https://github.com/stochasticai/x-stable-diffusion) to easily speed up inference 2x or more, which means you generate 2x more images per dollar on public clouds.
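
The linked repo wires up compiled engines like TensorRT and AITemplate for you; as a rough illustration of the same idea, here's a hedged sketch of the cheapest speedup, running a Hugging Face `diffusers` pipeline in fp16 (the model id is just an example):

```python
import torch
from diffusers import StableDiffusionPipeline

# fp16 weights roughly halve memory use and noticeably speed up sampling
# on most modern GPUs compared with the default fp32.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # example model id -- swap in your own
    torch_dtype=torch.float16,
).to("cuda")

# Optional: trade a little speed for a big reduction in VRAM, which can let
# you fit on a cheaper GPU instance.
pipe.enable_attention_slicing()

image = pipe("an astronaut riding a horse on mars").images[0]
image.save("out.png")
```

Measure images per dollar before and after on the same instance type; compiled engines typically go further than fp16 alone.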
