How Does ‘Agile’ Work for Data Science Teams?

Kimberly Berls
Published in Product Popcorn
Mar 10, 2020

‘Process’ is always a hot topic in product management… and … can I tell you a secret? I’m struggling with how traditional SCRUM fits for my data science team.

Our team is comprised of data engineers and data scientists. Like most companies that have made it to the 21st century, our organization as a whole is Agile. (We generally use traditional SCRUM.)

However, I’m finding the mix of roles on our team makes it difficult to apply a ‘one size fits all’ process in which all team members are able to produce their best work while remaining unencumbered by process, and meetings around that process.

Do ML/DS teams require a different process?

While a traditional software team produces functional code and releases it to the wide world at (2–3 week) sprint intervals, data scientists do machine learning, and their models typically produce insights as output.

Another problem? (Okay, two more problems.)

  • Many ML models take longer than a typical sprint to build — at least the types of models the team is building. Models can be difficult to break down into end-user ‘chunks’ that can be delivered in a 2-week sprint.
  • When building a new ML model, there’s often R&D involved, which means it’s hard to predict exactly what steps will be needed and how long each will take. I find there’s typically more uncertainty than with other types of software teams.

How do we reconcile being Agile while also making sure data scientists can do their job without being forced into a structure that doesn’t work for them?

Remember why Agile is important — a quick review

Everyone loves to talk about how important Agile is, why it’s so great, and to brag about how they are Agile-ing better than you.

I SCRUM SO HARD, PEOPLE! I SAFe THE BEST! I KANBAN SOOOO GOOD!

Let’s all take a step back and remember why Agile happened in the first place.

When software teams started making the change from waterfall methodologies to Agile in the late 90s, the goal was to release functional code at more regular intervals, so humans could test sooner and catch unintended consequences before products were released to the world. The result was the ability to iterate faster, which led to a better user experience in the end product. THAT’S IT.

  • Agile is important because it allows teams to iterate faster, which helps ensure you’re building the right product.
  • Agile is also important because it allows teams to ship functional code faster.
  • Agile is important because it brings structure to teams, without that structure being heavy and cumbersome.

If whatever Agile methodology your team is using is not helping you ship product faster, you need to reevaluate what you’re doing.

Also, if your data scientists are pissed off and hate whatever process you’ve put in place, you should probably do something to fix that.

Common problems with SCRUM applied to data science teams

My team owns a product that is very machine learning centric, and is comprised of two main work streams:

  • Data engineering: This stream releases code like a typical software team and can easily deliver work within 2- or 3-week sprints without having to force it.
  • Data science / ML engineering: Building new models and doing R&D both take much longer to yield functional output.

Data scientists and/or ML engineers typically need a longer lead time to produce functional output (especially for demos), while data engineers fit (more) easily into a typical Agile framework.

How to fix it. (Maybe.)

I don’t claim to have a perfect answer for how to fix this problem, but my team is trying a new methodology that’s working well so far.

Here’s the methodology I’m testing to reconcile a single product team that includes both data science/ML and data engineering functions occurring in tandem.

1 — Two-week sprints for data engineers, and four-week sprints for data scientists. This helps data scientists deliver demo-able functionality in a four-week timeframe, which makes more sense for the types of models we’re building. Data engineering can still deliver complete outputs in two-week time periods.

2 — Stand-up board! Data engineers and data scientists stay in sync without having to combine scrum ceremonies. Make sure your team gets in the habit of updating this as often as necessary.

3 — Communicate — consistently and often — that data science deliverables are insights and models. Depending on the types of models your team is building, ML products can take longer to deliver than an app, an API, and other more traditional software stuff. Socialize the fact that data science teams deliver insights, while traditional software teams deliver code.

4 — Don’t force SCRUM things that don’t work for data scientists. Keep user stories for data scientists more general (if that’s helpful for your particular team), and don’t force fully-fleshed-out acceptance criteria, or other standard scrum stuff that isn’t actually helping them build models.

5 — Timebox R&D, and run ‘tests’ that have a long lead time in parallel. There will be sunk costs, especially with experimental ML work, so prep your team to emotionally deal, and keep your eyes on the goals of the end-user.
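
To picture how the mixed cadence in point 1 lines up, here is a minimal Python sketch. It assumes both groups start on the same day (the start date and the function name `sprint_boundaries` are made up for illustration, not taken from my team's tooling) and simply finds the dates where a two-week sprint and a four-week sprint end together, which are natural points for a joint demo.

```python
from datetime import date, timedelta


def sprint_boundaries(start: date, sprint_weeks: int, count: int) -> list[date]:
    """Return the end date of each of `count` back-to-back sprints."""
    return [start + timedelta(weeks=sprint_weeks * (i + 1)) for i in range(count)]


# Hypothetical shared start date (an assumption for this example).
start = date(2020, 3, 9)

# Eight weeks of work: four 2-week sprints vs. two 4-week sprints.
engineering_ends = sprint_boundaries(start, sprint_weeks=2, count=4)
science_ends = sprint_boundaries(start, sprint_weeks=4, count=2)

# Dates where both cadences close a sprint on the same day.
shared_demo_dates = sorted(set(engineering_ends) & set(science_ends))

for d in shared_demo_dates:
    print(d.isoformat())
```

With these assumptions, every second data engineering sprint ends on the same day as a data science sprint, so the stand-up board carries the sync in between and the shared dates can host a combined demo.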

This was originally published on ProductPopcorn.com, where I post all of my rants and raves about Product Management.
