m98789 t1_iug9ma9 wrote on October 31, 2022 at 2:19 AM

I think the simplest approach is to set up GPU-enabled VMs with your cloud provider's auto-scale option (like scale sets), which can respond to HTTP traffic “triggers” by adding or removing identical VMs in a pool.
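To make the "trigger" idea concrete, here is a rough sketch of the kind of decision an autoscaler makes. This is not any provider's actual API; the function name, the per-VM capacity figure, and the min/max bounds are all made up for illustration. Real autoscalers work from rules on metrics and usually add cooldown periods on top.

```python
import math


def desired_vm_count(requests_per_sec, capacity_per_vm, min_vms=1, max_vms=10):
    """Return how many VMs the pool should have for the current traffic.

    All parameters are hypothetical: capacity_per_vm is the number of
    requests per second you have measured one GPU VM handling comfortably.
    """
    if capacity_per_vm <= 0:
        raise ValueError("capacity_per_vm must be positive")
    # Round up so the pool is never short of capacity.
    needed = math.ceil(requests_per_sec / capacity_per_vm)
    # Clamp to the pool limits so we never scale to zero or run away on cost.
    return max(min_vms, min(max_vms, needed))


# Example: 120 req/s with VMs that each handle about 50 req/s.
print(desired_vm_count(120, 50))
```

In practice you configure this as scaling rules in the provider's console or templates rather than writing it yourself; the sketch only shows the logic behind those rules.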

When a VM comes online, it has an auto-start action to pull and run your container, joining the load-balanced pool of workers.
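The auto-start action can be as small as a script that runs on boot. A minimal sketch, assuming Docker and the NVIDIA Container Toolkit are already installed on the VM image (needed for `--gpus all`); the image name and port numbers are placeholders, not anything from a real setup:

```python
import subprocess


def build_startup_commands(image, host_port=80, container_port=8000):
    """Build the docker commands a freshly booted VM would run.

    The image name and ports are hypothetical placeholders.
    """
    pull = ["docker", "pull", image]
    run = [
        "docker", "run", "-d",
        "--restart", "unless-stopped",  # restart the container if it crashes
        "--gpus", "all",                # expose the VM's GPUs to the container
        "-p", f"{host_port}:{container_port}",
        image,
    ]
    return [pull, run]


def run_startup(image, runner=subprocess.run):
    """Pull and start the container; raises if either command fails."""
    for cmd in build_startup_commands(image):
        runner(cmd, check=True)


# On the VM you would call something like:
# run_startup("registry.example.com/team/model-server:latest")
```

Once the container is listening on the host port, the load balancer's health check picks the VM up and starts sending it traffic.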

As a starting point to learn more about this approach (Azure link, but the other providers' offerings are similar):

https://azure.microsoft.com/en-us/products/virtual-machine-scale-sets/#overview

I suggest VMs rather than your cloud provider’s serverless container instance infra because those services usually lack GPU support or limit it, or their GPU support is more experimental or complex. A VM approach is about as simple as it gets.