As per the title, I wrote a book called "Managing Machine Learning Projects". It's available as an e-book (https://www.manning.com/books/managing-machine-learning-projects). Here's a blog post about the book: https://medium.com/@sgt101/does-the-world-need-yet-another-book-on-machine-learning-ml-ff22f8954d33

I'd be happy to discuss if anyone has any questions or thoughts about it.

The process documented in Managing Machine Learning Projects

Comments

Peantoo t1_iwd3y3l wrote on November 14, 2022 at 7:25 PM

Nice, I was just researching this topic. Does this touch on CI/CD and other late stage deployment and testing issues?

sgt102 OP t1_iwdedyf wrote on November 14, 2022 at 8:33 PM

Great question; very thought provoking!

I don't go through in-life CI/CD scenarios in the book, but I do look at running MABs (multi-armed bandits) and A/B tests to understand the relative performance of models in live, and I also write about the need for model monitoring and governance to support the prod deployment.

Basically the book mostly ends with getting it into prod - but with the emphasis on getting it into prod with the right framework around it, so that it can be kept alive in prod.
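To make the multi-armed bandit (MAB) idea concrete, here is a minimal sketch of comparing two live models with Thompson sampling. It is not taken from the book: the function name `thompson_compare`, the model names and the success rates are all made up for illustration, and a real deployment would use observed outcomes rather than simulated ones.

```python
import random


def thompson_compare(true_rates, rounds=5000, seed=0):
    """Simulate routing traffic between models with Thompson sampling.

    true_rates maps a (hypothetical) model name to its simulated
    probability of a successful outcome. Returns how many requests
    each model ended up serving.
    """
    rng = random.Random(seed)
    # Beta(1, 1) prior for every model: one pseudo-success, one pseudo-failure.
    wins = {name: 1 for name in true_rates}
    losses = {name: 1 for name in true_rates}
    pulls = {name: 0 for name in true_rates}

    for _ in range(rounds):
        # Draw a plausible success rate for each model from its posterior
        # and route this request to the model with the highest draw.
        chosen = max(
            true_rates,
            key=lambda name: rng.betavariate(wins[name], losses[name]),
        )
        pulls[chosen] += 1
        # Simulated outcome; in live this would be the observed result.
        if rng.random() < true_rates[chosen]:
            wins[chosen] += 1
        else:
            losses[chosen] += 1
    return pulls


if __name__ == "__main__":
    print(thompson_compare({"model_a": 0.05, "model_b": 0.15}))
```

Unlike a fixed 50/50 A/B split, the bandit shifts traffic towards whichever model looks better as evidence accumulates, which is why it suits comparing models in live.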

CaptMartelo t1_iwdkbco wrote on November 14, 2022 at 9:11 PM

How the hell is it thought provoking? It's a yes or no question.

sgt102 OP t1_iwdwc7x wrote on November 14, 2022 at 10:33 PM

Because it made me think about whether I should have extended the scope into the operational phases of a machine learning system.

So I found it thought provoking...

globalminima t1_iwe6aeh wrote on November 14, 2022 at 11:45 PM

The problem with most guides and even ML frameworks (e.g. MLflow) is that they do everything pretty well up to deployment, and then offer only very basic options that are not really fit-for-purpose for intermediate or advanced systems. It's definitely the biggest differentiator between the best resources and everything else.

RegularUser003 t1_iwdzu1d wrote on November 14, 2022 at 10:58 PM

It feels a little light compared to Google's ML production readiness guide:

https://research.google/pubs/pub46555/

RaggedBulleit t1_iwec8gp wrote on November 15, 2022 at 12:30 AM

And Andrew Ng's MLOps course on Coursera.

champagnebaths t1_iwq6hxp wrote on November 17, 2022 at 3:06 PM

Did you go through that one? If so, how long did it take?

maybe_yeah t1_iwdu5wi wrote on November 14, 2022 at 10:18 PM

> The book is laid out as a series of fictionalized sprints that take you from pre-project requirements and proposal development all the way to deployment. You’ll discover battle-tested techniques for ensuring you have the appropriate data infrastructure, coordinating ML experiments, and measuring model performance. With this book as your guide, you’ll know how to bring a project to a successful conclusion, and how to use your lessons learned for future projects.

1 INTRODUCTION: DELIVERING MACHINE LEARNING PROJECTS IS HARD, LET’S DO IT BETTER

2 PRE-PROJECT: FROM OPPORTUNITY TO REQUIREMENTS

3 PRE-PROJECT: FROM REQUIREMENTS TO A PROPOSAL

4 SPRINT ZERO: GETTING STARTED

5 SPRINT 1: DIVING INTO THE PROBLEM

6 SPRINT 1: EDA, ETHICS, BASELINE EVALUATION

7 SPRINT 2: MAKING USEFUL MODELS WITH ML

8 SPRINT 2: TESTING AND SELECTION

9 SPRINT 3: SYSTEM BUILDING AND PRODUCTION

10 POST PROJECT (SPRINT Ω)

Who is the target audience for this book? The description doesn't mention patterns, and the online chapter view doesn't seem to have code samples.

sgt102 OP t1_iwdwyr5 wrote on November 14, 2022 at 10:38 PM

The target audience is people who are being asked to lead an ML project for the first time - or who aspire to do so. The book doesn't try to teach the implementation details of modelling - mostly because there are many texts that do that very well already, far better than I could. So there are no code examples.

globalminima t1_iwe6j08 wrote on November 14, 2022 at 11:47 PM

There is no mention of monitoring, maintenance or retraining - does chapter 9 go into this? This is a big blind spot if it's not there (and it's where most of the problems happen for inexperienced ML engineers).

sgt102 OP t1_iwhfc9e wrote on November 15, 2022 at 5:43 PM

Chapter 9 addresses (to some extent) logging and monitoring, and governance - which has a lot to do with how the model should be managed in life...

I've worked on projects where the model was ungoverned and went wrong and no one noticed for a long time... and that caused damage. I also got called in to sort out a project where the team retrained the model every week... and every week they overfitted it on new data. I think knowing what the models should do, being able to say that they are doing that, and then having a clear way of deciding what to do if they aren't (i.e. someone in charge) is the base of maintaining them... what's your pov though?
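As a rough sketch of that idea (again not from the book), the three ingredients can be written down explicitly: what the model should do, a check that it is doing it, and a named person who decides what happens when it isn't. The names `ModelExpectation` and `check_model`, the metric and the threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ModelExpectation:
    """What the model is supposed to achieve, and who owns the decision."""
    metric: str      # e.g. "precision" (illustrative)
    minimum: float   # agreed floor for the live metric
    owner: str       # the person in charge if the floor is breached


def check_model(expectation, observed):
    """Compare a live metric with the agreed floor and say what to do."""
    if observed >= expectation.minimum:
        return "ok"
    return (
        f"escalate to {expectation.owner}: "
        f"{expectation.metric}={observed:.3f} below {expectation.minimum:.3f}"
    )


if __name__ == "__main__":
    expectation = ModelExpectation("precision", 0.80, "model-owner")
    print(check_model(expectation, 0.85))
    print(check_model(expectation, 0.70))
```

Note that the failing branch escalates to a person rather than retraining automatically, which is the point about governance: someone decides, instead of the pipeline overfitting on new data every week.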

VinnyVeritas t1_iwelmj4 wrote on November 15, 2022 at 1:41 AM

Looks like a lot of sprinting...

BATTLECATHOTS t1_iwf8ics wrote on November 15, 2022 at 4:38 AM

CRISP-DM

sgt102 OP t1_iwhfqco wrote on November 15, 2022 at 5:46 PM

Party like 1989... I think things have changed - my teams need data pipelines and reproducible test results; and they're doing things like evaluating performance using MABs... CRISP doesn't help so much with that... Also actually building a system, not only extracting a model from a table...

Do you see CRISP as sufficient now?

BATTLECATHOTS t1_iwhpt2a wrote on November 15, 2022 at 6:50 PM

On the business side of things, yes. Maybe not so much in overall MLOps.

SignificantHall4684 t1_iwgp0ah wrote on November 15, 2022 at 2:47 PM

I am just in the middle of an interview process that would hopefully lead me from a PM role in automotive development to a new one as a PM in ML projects. I therefore seem to be your target audience. As I currently know just a little and still have a few days before the next interview round, I am going to check out the book. Let me know if you want some feedback afterwards.

sgt102 OP t1_iwhdclu wrote on November 15, 2022 at 5:31 PM

Yes please!

91o291o t1_iwfzg03 wrote on November 15, 2022 at 10:27 AM

I need a book titled "Deploy PyTorch in the wild and in the rain. NVIDIA Jetson with webcam edition (or equivalent)." Any suggestions?