Submitted by lifesthateasy t3_y3n7u0 in MachineLearning

[DISCLAIMER bc of the negativity]: I will NOT architect our systems, we WILL hire architects. I just want to start learning the basics and the different options, so once the architects arrive, I'll have an understanding/have a common language with them. I think that's reasonable, as I have a background in CS and ML myself.

Hi, I've been hired as the first person of a future ML team at a company, and we're trying to get a feel for what ML architecture we'd want to work with. I have no experience with architecture (and we will bring in an architect eventually), but I'd like to get a better understanding of the concrete tech stacks in use. And I really do mean tech: I've read a bunch of theoretical articles about what the tasks of such a system are, but I'm interested in the exact tech being used.

I'm aware of Azure, GCP and AWS offering their cloud-based ML platforms, but I was wondering where I could learn a bit more about the pros/cons of each (vs. maybe even a custom solution).

How would you go about architecting a modern MLOps pipeline? Does it make sense to mix and match providers (e.g. hosting Kubeflow on Azure and connecting to some AWS Lambdas; yeah, I know my example doesn't exactly make sense)?

Just to clarify, I'm not trying to put together the whole architecture myself, I'd just like to do some research and hear your opinions maybe on some of the providers.

81

Comments

waronxmas t1_is9gxzz wrote

I will consult your company on ML Ops architecture at the rate of $1000/hr.

5

waronxmas t1_is9kwuk wrote

I’m being pithy, but the real answer is that it’s still a very frothy space, and which specific tools you should choose is an extremely nuanced decision based on the specifics of your business problem, data characteristics, organizational processes, etc. So if you’re at the point where ML really matters (it’s a committed investment for some production-critical use case), you should be looking to hire someone with hard-won experience.

If you’re just getting started and looking for proofs of concept, you’re probably overthinking it. Choose what is easiest to get up and running, which means please do not adopt two cloud providers. Then, if it goes well, don’t over-extend yourself on the prototype infrastructure; take a pause to evaluate the specific needs for productionization. That will be a good jumping-off point either to dive deep on a few specific MLOps solutions, out of the literal billions of garbage products out there that aren’t worth learning about, or to hire someone to point you at the right things.

13

chief167 t1_is9lh2u wrote

I will give you free advice, for once: don't trust any of the simple online architecture articles, and don't take free advice. It will cost you much more in the end.

Designing an architecture for your company is a multi-week project, with lots of nuances and decisions. There is no best option, and without your business context it's literally impossible to recommend something good. From your question, it's clear your architecture expertise is very low.

Do yourself a favour and get an architect, preferably from a dedicated local data specific consulting company and not a big box all-rounder like Accenture or TCS. Expect day rates of 1200-1500 if it's a short term project.

What were you expecting, someone to say 'just use Google cloud', and just go with it?

Don't fool yourself: if you are not ready to build an architecture, don't. You'll have to start over next year. You were hired for the wrong job then.

−7

chief167 t1_is9lmou wrote

Would you ask a doctor for a few online resources on your medical treatment? You are completely disrespectful towards the architecture profession if you think a few online resources are all you need to get started, and that you'll just fix it in the future by hiring an architect to clean up your mess.

−7

amigo213a t1_is9mxyh wrote

I do MLOps on a daily basis and have set something up from scratch at my company. The only thing I have to say is that the most popular tools out there are not going to help. Take Kubeflow, for example: you need hands-on experience with Kubernetes to set up good workflows/pipelines, but most of the people using it will be your data scientists/machine learning experts, who won't be Kubernetes experts. They also rarely build solutions that scale. So it comes down to the MLOps platform being able to meet their weird requirements.

Choosing Kubernetes is a good starting point: it lets you scale out, run workloads in isolation, and do many other great things. You could either set up your own infra in the company or choose one of the managed clusters from AWS/GCP/Azure, depending on their pricing. The only good thing about cloud providers is that you don't need to take care of the infra on your side. For example, if you want to spin up your own text-to-image service, you could containerize the solution and push it onto region-based Kubernetes clusters on AWS or another cloud. You can also easily get a CDN on AWS to scale serving by region.
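To make the "containerize it and push it to a cluster" step a bit more concrete, here is a minimal sketch (not a production manifest) that builds a Kubernetes Deployment description in Python and prints it as JSON, which kubectl accepts as well as YAML. The service name, image URL, port, and replica count are made-up placeholders, and `build_deployment` is a hypothetical helper, not part of any library.

```python
import json


def build_deployment(name: str, image: str, replicas: int = 2, port: int = 8080) -> dict:
    """Return a Kubernetes apps/v1 Deployment manifest as a plain dict."""
    # The same labels tie the Deployment's selector to the pods it creates.
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    # Placeholder image reference; substitute whatever registry you actually use.
    manifest = build_deployment(
        "text-to-image", "registry.example.com/text-to-image:0.1.0"
    )
    print(json.dumps(manifest, indent=2))
```

Saving the output to a file and running `kubectl apply -f` on it against a managed cluster would be the usual next step; a real setup would also need resource limits, a Service, and probably GPU scheduling, which this sketch leaves out.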

12

lifesthateasy OP t1_is9nsdz wrote

As I said, I'm not going to be the one building the whole system. We'll hire someone for that. I'd just like to get basic knowledge to even be able to tell what skills to hire for/how to test candidates and what architectures are available.

0

lifesthateasy OP t1_is9nxra wrote

I specifically said in the post we'll hire an architect. I just want to build my knowledge to not go into it fully blindly.

I was expecting people to point me towards specific learning resources, as some people did.

My job is not to build the architecture.

4

lifesthateasy OP t1_is9o4os wrote

Yes I am leaning towards a kubernetes/kubeflow setup, and I'll definitely be working with architects to get it set up properly. I just want to make sure I have some knowledge about how to approach such a thing, to be able to decide if what we're doing is very wrong or just a little :)

4

lifesthateasy OP t1_is9oal8 wrote

And you are completely disrespectful towards me when you don't read the full post - where I specifically mention we'll be hiring architects - before commenting. Also, architects had to learn their respective professions, and I can learn it, too, I was just asking for resources where to learn it. Again, just so I know what it's about, so once the architects arrive, I can speak a common language with them.

9

vicks9880 t1_is9zqxu wrote

MLOps is no different from DevOps, except that there are more things on top of DevOps that you need to consider. There is no one tutorial anyone can point you to; as everyone mentioned, it totally depends on the use case and the infrastructure you want to build. You will get a better idea if you find online resources to learn the individual pieces, like Kubernetes, Jenkins, Kubeflow, Airflow, MLflow, Metaflow, Kedro, and basic AWS-related stuff. Once you have a clear idea of how each tool works and what it offers, you can piece together your entire MLOps infrastructure based on your needs.
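As a rough illustration of what the orchestration tools in that list have in common, the sketch below models a pipeline as a dependency graph and runs the stages in a valid order using only Python's standard library. It is not how Airflow or Kubeflow is implemented; the stage names and the `run_pipeline` helper are assumptions made up for the example.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

# Hypothetical stages: each key maps to the set of stages it depends on.
PIPELINE: Dict[str, Set[str]] = {
    "ingest": set(),
    "validate": {"ingest"},
    "featurize": {"validate"},
    "train": {"featurize"},
    "evaluate": {"train"},
    "deploy": {"evaluate"},
}


def execution_order(dag: Dict[str, Set[str]]) -> List[str]:
    """Return stage names ordered so every stage comes after its dependencies."""
    # static_order() raises graphlib.CycleError if the graph contains a cycle.
    return list(TopologicalSorter(dag).static_order())


def run_pipeline(dag: Dict[str, Set[str]], steps: Dict[str, Callable[[], None]]) -> List[str]:
    """Call each stage's function in dependency order; return the stages that ran."""
    executed: List[str] = []
    for stage in execution_order(dag):
        steps[stage]()
        executed.append(stage)
    return executed


if __name__ == "__main__":
    # Stand-in step functions that only print; real steps would do the work.
    steps = {name: (lambda n=name: print(f"running {n}")) for name in PIPELINE}
    run_pipeline(PIPELINE, steps)
```

Real orchestrators add scheduling, retries, caching, and distributed execution on top of this idea, which is where most of the learning effort goes.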

3

emotional_nerd_ t1_isa692t wrote

The author of the previous post is right. I get what you mean: you are making an effort to gather adequate knowledge to lead a great team. Here's the trouble: adequate knowledge of the tools will likely not help your business when making decisions.

2

lifesthateasy OP t1_isab09z wrote

That's fine and all. But I'll be miles ahead if they say "hey today I'm gonna configure the istio gateways so that you can connect remotely to the dex auth service" and I'll be able to understand it.

6

_thawnos t1_isamt8u wrote

Very timely, I am in the exact same position right now :)

2

jcoffi t1_isb4tja wrote

If you're hiring someone because you don't have the nuanced experience, it doesn't make sense to lean towards anything. Let the expert decide. But be well informed.

2

phb07jm t1_isbst7m wrote

I'm going through a similar process, but with an established team. I'm also working at a large company with a big technology arm, so I have the support of a decent data architecture team, and this is still a tricky question to navigate. Here's where my thinking is currently at; I would love to hear alternative views.

  1. Don't reinvent the wheel. There are many ways to be right here, but building your own in-house MLOps platform is nuts at this stage (unless perhaps you plan to sell access to it, i.e. you're an ML consultancy).

  2. Use industry standard tools. I'd need to hear a pretty good argument to adopt a niche platform. Standard tools make it easier for new recruits to hit the ground running, and helps with retention.

  3. The big players are all viable options, but may be stronger/weaker candidates for you, depending on what matters, e.g. model cataloguing and governance, AutoML, data wrangling, monitoring of deployed solutions, experiment tracking and model lineage...

  4. It matters what kind of ML you want to do. E.g. will you be doing scalable, low-latency live inference, or are you mostly going to be doing lots of batch processes and descriptive modelling? Are you going to be building mostly bespoke/novel algorithms, or do you want access to a lot of pre-trained models and plug-and-play algorithms?

  5. Based on the above, what skills do you need in the team. Is no code/low code relevant for you? Do you need data migration tools built in...

  6. There are good open-source solutions for many parts of the MLOps cycle (feature store, labeling, experiment tracking, etc.).

TLDR: start by thinking about what products you'll build, then think about the skills you'll need in the team, then review a bunch of options with someone who knows data architecture, and then pick the one(s) that make most sense for you.

7

hillsboro97124 t1_isbx65r wrote

Wow, so much useful information in this thread! Really impressed by everyone being professional and helpful in such a small sub!

4

JimmyTheCrossEyedDog t1_iscsgo6 wrote

> I can provide consulting

Your wording makes it sound like you are advertising your services for pay. It sounds like that was not your intent, but that's how people interpreted it and downvoted you for what seemed like self-promotion.

2

thebeastlymess t1_isd1wgv wrote

Definitely don't try to build this yourself. There are some platforms that work well. Things like cnvrg.io and dominodatalab already have a lot of MLOps functionality built in. Otherwise, you are going to go down the path of stringing together many open-source resources.

2

Inscribed t1_isddfa6 wrote

My team has switched elements of our tech stack several times. Do not get too tied to any specific technology, and focus on getting a solution to production. The MLOps space is changing too fast to keep up. Ensure your solution can scale, but also ensure it can adapt to a dynamic environment. Oh, and data: 80% of your time will be spent gathering, version controlling, exploring, and serving data. A good model is useless unless it is in production, and production models need good data.
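One way to read "don't get too tied to any specific technology" in code terms is to keep pipeline logic behind a small interface, so the backend can be swapped later. The sketch below is only an illustration of that idea: `ArtifactStore`, `InMemoryStore`, and `publish_model` are hypothetical names, and the in-memory backend stands in for whatever storage (object store, model registry) you would actually use. Keys include a content hash so that each version of an artifact gets its own entry.

```python
import hashlib
from typing import Dict, Protocol


class ArtifactStore(Protocol):
    """Minimal storage interface the rest of the pipeline depends on."""

    def save(self, key: str, data: bytes) -> None: ...

    def load(self, key: str) -> bytes: ...


class InMemoryStore:
    """Toy backend for local experiments; swap for a real one later."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def save(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def load(self, key: str) -> bytes:
        return self._blobs[key]


def publish_model(store: ArtifactStore, name: str, weights: bytes) -> str:
    """Save weights under a content-addressed key and return that key."""
    # The first 12 hex characters of the SHA-256 digest act as a version tag.
    digest = hashlib.sha256(weights).hexdigest()[:12]
    key = f"{name}/{digest}"
    store.save(key, weights)
    return key


if __name__ == "__main__":
    store = InMemoryStore()
    key = publish_model(store, "churn-model", b"fake-weights")
    print(key, len(store.load(key)))
```

Because `publish_model` only relies on `save` and `load`, moving to a different backend means writing one new class rather than touching every caller.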

3

seiqooq t1_isdozrp wrote

Was going to make basically this exact post today. Thanks for taking the hit :^) lots of good stuff here.

2

machineko t1_isdv3te wrote

I agree with this comment. Back when the tools were crappy, it might've been better to build from scratch but with many good tools available now (often giving you better performance than building them on your own and also cheaper), you should at least try them. Especially if you are interested in running deep learning.

There is MLOps software for:
- low-latency inference
- training large language models
- explainable ML

and more.

2

Rarc1111 t1_isdw8qg wrote

"They hardly build solution that scales as well. So it comes down to the MLOps platform to be able to meet with their weird requirements."

This.

MLOps is not the bottleneck, go with something as simple as possible, as you will be spending most of your time pretending you are not rewriting their entire code.

4