Submitted by seattleite849 t3_10ryu6b in MachineLearning

Hi friends! I ran into this problem enough times at my last few jobs that I built a tool to solve it. I spent many hours building Docker containers for my Python functions, since many data science modules require building C libraries (which significantly speed up compute-intensive routines such as math calculations). Deploying the containers to AWS Lambda, or to Fargate when processes needed more CPU or memory or ran longer than 15 minutes, and then wiring functions together with queues, databases, and blob storage made iterating on the actual code slow, even though the code itself usually wasn't that complex.

I made cakework (https://github.com/usecakework/cakework), a platform that lets you spin up your Python functions as serverless, production-scale backends with a single command. Using the client SDK, you submit requests, check status, and get results. You can also specify the amount of CPU (up to 16 cores) and memory (up to 128 GB) for each individual request, which is helpful when data size and complexity vary across requests.
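Roughly, the client side looks like this. This is a simplified sketch: the method and parameter names are illustrative rather than the exact SDK surface, so check the docs for the real signatures.

```python
# Simplified sketch of the client SDK flow; names are illustrative.
from cakework import Client

client = Client("my-project")  # assumed constructor shape

# Submit a request, sizing compute for this request if needed.
request_id = client.run(
    "process_file",                            # the deployed task's name
    {"input_url": "s3://my-bucket/day1.csv"},  # your task's own parameters
    cpu=4,        # cores for this request (illustrative kwarg)
    memory=8192,  # MB for this request (illustrative kwarg)
)

# Poll for status, then fetch the result once the task finishes.
print(client.get_status(request_id))  # e.g. PENDING / SUCCEEDED / FAILED
print(client.get_result(request_id))
```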

A common pattern that I built cakework for is file processing for ML (sketch after the list):

- ingest data from some source daily, or in response to an external event (data written to blob storage)

- run my function (often using pandas/numpy/scipy)

- write results to storage, update database

- track failures and re-run/fix
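The task itself stays plain Python. Here's a minimal sketch of the kind of function I run; pandas is doing the real work, and the parameter shape is just illustrative:

```python
# Minimal sketch of a file-processing task; parameter names are illustrative.
import pandas as pd

def process_daily_dump(input_url: str, output_url: str) -> dict:
    # 1. Ingest: read the day's file from blob storage.
    df = pd.read_csv(input_url)

    # 2. Run the actual logic: plain pandas, nothing platform-specific.
    summary = df.groupby("user_id")["amount"].sum().reset_index()

    # 3. Write results out; the return value is what the client fetches,
    #    so it doubles as a record for tracking failures and re-runs.
    summary.to_csv(output_url, index=False)
    return {"rows": len(summary), "output": output_url}
```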

It's open source <3. Here are some fun examples to get you started: https://docs.cakework.com/examples

Would love to hear your thoughts!

59

Comments

BasilLimade t1_j6ykjaz wrote

I'm looking at making a Docker image to host on AWS ECR, to contain some Python code and dependencies (over 250 MB of dependencies, so I can't just zip up my modules as a Lambda "layer"). How does this compare to making my own Docker Lambda image?

4

seattleite849 OP t1_j6yt8w2 wrote

How are you wanting to trigger your function?

Also, here are some examples you can peek at: https://docs.cakework.com/examples

Under the hood, both Lambda and cakework deploy Docker containers as microVMs running on bare-metal instances. A few key differences:

- Lambda is a general building block, whereas cakework is a purpose-built solution for running async tasks. With Lambda, you have to wire together other cloud resources to turn your function into an application you can hit. This mix of code and infrastructure makes iterating quickly on your actual logic slow, in my experience, since you need to:

- Trigger the function, either by exposing it via API Gateway (if you'd like to invoke it with a REST call) or by hooking it up to an event (an S3 PutObject, a database update event).

- To hook up your function to other functions (for example, if you want to upload the final artifact to S3), you'll set up SQS queues. If you want to chain functions together, you'll set up Step Functions.

- To track failures, store input/output params and results, and easily view logs, you would set up a database and write some scripts to trace each request through CloudWatch logs.

- With Lambda, you manage building the container yourself, as well as updating the Lambda function code. There are tools out there, such as sst or serverless.com, that help streamline this.

- With cakework, you write your Python functions as plain code, then run a single `cakework deploy` command via the CLI, which deploys your functions and exposes a public endpoint you can hit via REST calls, the Python SDK, or the JavaScript/TypeScript SDK (see the sketch after this list). The nice thing is you can test invoking your function directly, as if it were code running on your local machine.

- No limit on the Docker image size and no limit on how long your job can run (vs. a 10 GB image limit and a 15-minute timeout for Lambda).

- You also specify CPU and memory parameters per request! So you don't spin up a bigger instance than you actually need and pay the extra cost, or provision too little CPU or memory and then 1) deal with failures and 2) re-deploy your Lambda with more compute.
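To make the "plain code plus one command" point concrete, the whole app can be a single file. The registration shape below is simplified and the names are illustrative; the examples repo has the exact API:

```python
# src/main.py -- no Dockerfile, queues, or Step Functions to wire up.
# Registration shape is simplified; see the examples repo for the real API.
from cakework import Cakework

def say_hello(name):
    # Any plain Python (pandas/numpy/scipy all work) goes here.
    return f"Hello, {name}!"

app = Cakework("hello-app")  # assumed project registration
app.add_task(say_hello)      # assumed task registration
# Running `cakework deploy` then builds, ships, and exposes the endpoint.
```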

3

Noddybear t1_j72kce3 wrote

Hey dude, this caught my eye before realising I spoke to you about this in person! I’ll have a play with it.

3

maxafrass t1_j7a6mgg wrote

Hello OP, this looks very intriguing. Would you say this is a direct replacement for Apache Airflow for simple compute jobs? I'm in the process of setting up Airflow for a fairly simple ETL job wherein I take 30 GB of XML data, chunk it into discrete parts, and farm out processing to multiple microVMs that process the XML in parallel. Is this something cakework can do with less effort, or better than Airflow?

Also, are you guys planning to do a YouTube video with a walk-through of usage? I'd love to see it in action to get an initial feel for what it does.

2

swappybizz t1_j7345xf wrote

Stable diffusion?

1

seattleite849 OP t1_j734kwb wrote

Yup, that’s one of our examples! You can use this project to run a Stable Diffusion model on a serverless GPU: https://github.com/usecakework/cakework/tree/main/examples/image_generation

2

swappybizz t1_j735029 wrote

You have a sign up!

2

seattleite849 OP t1_j736n71 wrote

We’re spinning up the serverless GPU hosting the model using banana.dev, btw (which I’ve really liked so far). cakework spins up CPU-only microVMs for now, since the Firecracker virtual machine monitor only supports CPU workloads.

1

swappybizz t1_j736x59 wrote

Wow! How do you manage to stay afloat?

1

seattleite849 OP t1_j737hp8 wrote

I got a bunch of credits from cloud hosting providers haha. Also, since this is a beta, I wanted a generous free tier. To connect with banana.dev, you’d need to sign up for your own account and pass your API key in to the Python function that’s run on cakework.
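Concretely, here's a sketch of that passthrough. The task and parameter names are placeholders, and the banana_dev call follows their Python client's documented run() usage:

```python
# Sketch: the cakework task forwards your own banana.dev credentials,
# which arrive as request parameters rather than being stored anywhere.
import banana_dev as banana  # banana.dev's Python client

def generate_image(banana_api_key, banana_model_key, prompt):
    # GPU inference runs on banana.dev; cakework runs this CPU-side glue.
    out = banana.run(banana_api_key, banana_model_key, {"prompt": prompt})
    return out
```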

2