Submitted by DisWastingMyTime t3_zj6tkm in MachineLearning

Agile/scrum/waterfall, etc.: was there something you tried that didn't work? What adjustments, beyond just time extensions, have you made to these known methodologies?

I'm just wondering what other teams do that works, since my team is still trying things out, with a lot of convincing needed for managers/PMs who are more purely software oriented.

I've found a few references online on how algorithm/ML/data science development doesn't fit nicely into agile cycles, but I ended up with more questions.

11

Comments


PredictorX1 t1_iztv3pj wrote

I've never been at a workplace which used any of the structures you mention. Honestly, model development is fairly straightforward from the project management and software development perspectives. The clever bit is the statistics/machine learning, and the parts requiring the most care are data acquisition (problem definition, statistical sampling, ...), model validation (error resampling, testing for important sub-populations, ...) and deployment (verifying the deployed model, ...). Most serious analysts I know use something that resembles CRISP.

8

acardosoj t1_j080weh wrote

CRISP is not a project management methodology, but more like a process. You would still need a project management methodology to manage resources.

We usually apply CRISP-DM (ML) within an agile framework.

0

PredictorX1 t1_j082k9y wrote

>CRISP is not a project management methodology...

That was my point: Data science work needs a technical procedure, not project management.

1

acardosoj t1_j08ap5c wrote

If you are working on a data science project, you would inevitably have project management activities in place. You need to report progress, need to manage costs, resources, schedule. You can do those in an ad hoc way without structure. But I guess that would lead to problems.

Imagine being asked for costs and progress estimates by a C-level. You would only be able to answer her if you keep track of these things. That's project management!

1

PredictorX1 t1_j08cat9 wrote

In my experience, data science features costs which are relatively stable, and whose payment is committed to on an ongoing basis as a necessary part of the business by management. The only time costs would come into question is when more people were to be hired, on a permanent basis. Tracking the activity itself is handled by a manager of a small team, who periodically presents results to upper management. The only real "project management" I see is done in small teams when management assigns tasks and deploys or reports results to external entities. Tracking of progress is, again, in my experience, a light activity. I just don't perceive the need for excessive formality in the management of data science.

1

Hyper1on t1_j0a80ym wrote

Usually you just make some estimates of projected costs, resource use, and timelines at the start of the project (aiming to be an overestimate), and if you are up to date with the progress made it's trivial to just correct these estimates if someone asks you for them.

1

RuairiSpain t1_izunqwy wrote

Data science is different from development; the agile methodologies don't apply cleanly because the work splits into distinct stages:

  1. A discovery stage, where you hone in on the question you want to ask and get sample data that simplifies the actual data you'll work on
  2. Deciding which ML algorithm or strategy is closest to answering your question with the sample data you have
  3. Set up training, validation and test samples, then validate that they are representative of your real data
  4. Run the model and iterate over the process to improve results. Maybe use hyperparameter optimisation to come up with the best results for your loss function (a rough sketch of steps 3 and 4 follows this list)
  5. Present your results for peer review
  6. Refactor your model for performance and deployment
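To make steps 3 and 4 concrete, here's a minimal sketch of what that split-then-tune loop might look like. It assumes a tabular classification problem, scikit-learn, a placeholder `sample_data.csv` with a `target` column, and log loss as the loss function; none of these specifics come from the comment above.

```python
# Hypothetical sketch of steps 3-4: split the data, sanity-check the splits,
# then run a small hyperparameter search against the chosen loss.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

df = pd.read_csv("sample_data.csv")                 # placeholder path
X, y = df.drop(columns=["target"]), df["target"]    # assumed label column

# Step 3: hold out a test set; cross-validation below acts as the validation split
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Quick representativeness check: class balance should match across splits
print(y_trainval.value_counts(normalize=True))
print(y_test.value_counts(normalize=True))

# Step 4: iterate over hyperparameters, scored by (negative) log loss
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 10]},
    scoring="neg_log_loss",
    cv=5,
)
search.fit(X_trainval, y_trainval)
print(search.best_params_, search.best_score_)
```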

There is a lot of data science preamble before you get to a peer review, so the quick feedback loops are different compared to software development. The discovery phase is more about understanding the data and extracting the appropriate features that should be tested. It's mostly about applying stats to your data, which then gives you hints about which ML modeling approach to choose. See this article on stats: https://towardsdatascience.com/10-machine-learning-methods-that-every-data-scientist-should-know-3cc96e0eeee9

The developer stage is more at the tail end, where you look at refactoring the algorithm to make it as fast and explainable as possible. Maybe also add a feedback loop in production to check for model drift; that's where your agile frameworks would potentially be used.
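As a rough illustration of that production feedback loop, a periodic drift check could compare the score distribution the model currently produces against the distribution it produced at validation time. The two-sample KS test and the 0.05 threshold below are assumptions made for the sketch, not anything the commenter specified.

```python
# Hypothetical drift check: flag when live prediction scores no longer look
# like the scores the model produced on the validation set.
import numpy as np
from scipy.stats import ks_2samp


def check_drift(reference_scores, live_scores, alpha=0.05):
    """Two-sample KS test between reference and live score distributions."""
    result = ks_2samp(reference_scores, live_scores)
    return {
        "ks_stat": result.statistic,
        "p_value": result.pvalue,
        "drifted": result.pvalue < alpha,
    }


# Synthetic stand-ins for real scores, just to show the call
rng = np.random.default_rng(0)
reference = rng.beta(2, 5, size=5_000)    # validation-time scores
live = rng.beta(2.5, 5, size=5_000)       # recent production scores
print(check_drift(reference, live))
```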

2

AmalgamDragon t1_izyv7wv wrote

Kanban. Trying to do scrum for ML ends up pretty goofy as most backlog items will be spikes.

1

hadrielle t1_j020q1o wrote

I'm a TL of a fairly small ML team. There's no perfect solution but I can try to explain a lil bit what we do.

We have two separate processes: what's been described as product discovery (the R in R&D) and productionizing (the D in R&D).

For the research part, in scrum these would be called spikes. Honestly, we use kanban (so we don't commit to sprints), although we have periodic checkpoints on how the research is going.

Honestly, before starting the research, sitting down with the stakeholders and understanding what we are trying to solve is key. I would not suggest starting a project with a very vague description of what you might want to solve; this typically leads to a horror show of "he said, she said, we understood". But let's say you have a "well defined" idea of what value you want to create. Then you move into kanban (create the research ticket, bla bla bla). In my company we document everything: the ideas we had, what worked and what didn't, metrics, error analysis, anything you might need in the end to make a decision on which specific solution to go with. This part is what people who have never done ML in a company think ML is about. Basically you might have a notebook, a shitty poorly optimized script, or, if you have some standards, a dockerized demo. In any case, as far as methodology goes, kanban is flexible enough to fit anything you want in research.

Now we move into "let's write real code that scales". This is software development. You've got a model, but you have to serve it (might be real-time, batched, or a combination), and you probably have a DB that stores results, or an API, whatever. That's pretty standard, and depending on the company it might fit into kanban or sprints. What we do is fit everything into a separate kanban.
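For the real-time serving case mentioned above, a minimal sketch might look like the following. FastAPI, the pickled `model.pkl` artifact, and the two feature names are all assumptions made for illustration; a batch setup or a different stack would obviously look different.

```python
# Hypothetical minimal real-time scoring endpoint (run with: uvicorn app:app,
# assuming this file is named app.py)
import pickle

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

with open("model.pkl", "rb") as f:    # placeholder artifact path
    model = pickle.load(f)


class Features(BaseModel):
    # Assumed feature schema, purely for illustration
    age: float
    income: float


@app.post("/predict")
def predict(features: Features):
    x = [[features.age, features.income]]
    score = float(model.predict_proba(x)[0, 1])
    # In practice you'd also log the request and score here, feeding the
    # drift/feedback loop mentioned earlier in the thread.
    return {"score": score}
```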

At the end of the day, it's a matter of checking what works for you and your company. Even if you were just writing pure software, not a single company does the same scrum; everyone adapts it to their needs and people. In my experience, having short research cycles, perhaps a week or two, then evaluating progress and deciding what to keep doing, fits pretty much into the idea of a sprint. But typically in a sprint you want to deliver some value, which will not be the case for most research cycles, hence we transitioned to kanban.


On how to convince PMs and managers, well, I've got no clue. IMO you can't be a PM for an ML team without actual ML knowledge. There's no way you can come up with solutions or ideas if you don't know the potential that ML has.


Happy to discuss tho :D

1

bankimu t1_izuj8my wrote

What are those things?

Scrum sounds like scum and scram.

−2

benopal64 t1_j030b06 wrote

Lol. If you're interested in basically any tech industry, I would highly recommend doing a bit of research into Scrum and Agile. They are basically frameworks, or rather rough guidelines, for team development strategies.

1