Agile and Data Science — A Match Made in Heaven?

Apostolos Tzouvaras
Just Eat Takeaway-tech
7 min read · Dec 12, 2023

In the last 15 years, the discipline of data science has grown significantly with data science teams becoming an integral part of every tech organisation. However, despite the value that data science teams bring to businesses, there is no universally agreed approach for managing their work.

Some data science teams use Scrum but soon realise that it doesn’t fit the nature of their work. Others use Kanban, which doesn’t work well either: it lacks structure, has no timebox, and can invite scope creep. Still others mix and match Waterfall with Agile, which only creates more confusion.

In addition, practices that work well for software teams, which typically release code many times per day, may not work as well for data science teams, which may be running a single experiment for a month.

Overall, there is no one universal approach to managing data science projects. I believe the focus should instead be on the mindset, principles, and values that best fit the nature of data science work.

Agile Mindset and Data Science

Many of the challenges that data science teams face are complex adaptive problems and hence fall into the complex domain. At the outset the end solution is unknown, and through experimentation we may discover that our initial hypothesis is wrong and needs to change. This may lead to a new hypothesis and more experimentation. In the complex domain there are no best practices to follow, only emergent and adaptive solutions.

Agile promotes empiricism to help solve complex adaptive problems like the ones data science teams face. As such, Agile is a far better fit for managing data science projects than traditional project management approaches, which are best suited to problems with a known scope and solution.

Similarly, the Lean Startup approach, popularised by Eric Ries, starts with a hypothesis, builds a Minimum Viable Product (MVP), or in the case of data science a Minimum Viable Prototype, and then runs experiments to validate the hypothesis through a Build — Measure — Learn loop. If the hypothesis is not validated, the team still learns something useful from the data (validated learning) and pivots by forming a new hypothesis based on what they have learned.

Build-Measure-Learn Loop (Adaptation from the Lean Startup)
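To make the loop concrete, here is a minimal Python sketch of Build — Measure — Learn. It is purely illustrative: the hypothesis names, the simulated experiment, the 10% baseline conversion rate, and the 2-point lift threshold are all made up for the example, not taken from the team's actual work.

```python
import random

def run_experiment(true_effect: float, n: int = 1000) -> float:
    """Simulate measuring the lift an experiment produces over a baseline
    conversion rate (a hypothetical stand-in for a real experiment)."""
    baseline = 0.10
    observed = sum(random.random() < baseline + true_effect for _ in range(n)) / n
    return observed - baseline

def build_measure_learn(hypotheses, threshold: float = 0.02):
    """Run the Build -> Measure -> Learn loop: test each hypothesis in turn
    and pivot to the next one when the measured lift is below threshold."""
    for name, true_effect in hypotheses:
        lift = run_experiment(true_effect)   # Build the prototype and Measure
        if lift >= threshold:                # Learn: hypothesis validated
            return name, lift
        # Learn: not validated -> validated learning, pivot to the next hypothesis
    return None, 0.0
```

The point of the sketch is the control flow, not the statistics: each pass through the loop either validates the current hypothesis or turns its failure into the input for the next one.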

What we learned from our Agile journey

When I first joined the logistics data science team as a senior agile delivery manager, the team was using a strange mix of Waterfall, Scrum and Kanban practices. From the beginning it was obvious that these practices were not working well together; the team was also busy with too many projects and had no time to reflect and improve. Although the team was doing ‘Agile’, they were not really being agile.

This was significantly impacting their performance, effectiveness and happiness. They were overstretched, projects were taking too long, and stakeholders were unhappy. The team knew the situation was not sustainable.

With the above in mind, the team decided to follow a purely agile approach and remove all the legacy waterfall project management practices, which were not suited to the complex adaptive problems they were facing.

1. Start with Scrum

Initially we decided to implement Scrum without mixing it with any other practices, so that we could get the most out of it. Scrum soon made it painfully obvious what was not working. The team was overcommitting to too many goals, and many of those goals were individual rather than shared across the team. They were operating more like a group of individuals than a team.

Scrum helped create a more structured team approach with a focus on shared goals. It also introduced feedback loops (reviews and retros) that enabled the team to reflect regularly, interact more with each other and continuously improve. They became more focused and more collaborative, and shared knowledge with each other more frequently.

At the same time, the team moved away from big waterfall plans and started planning in iterations which allowed them to find a delivery rhythm. A few months later, the results were speaking for themselves.

The team’s velocity more than doubled, their cycle time halved, and the team felt happier, with more interaction and more knowledge sharing.

But Scrum was only the beginning.

2. Map your E2E workflow and identify your bottlenecks

The next step was to define and visualise our end-to-end workflow. Up to then everyone had a different perspective on what our workflow was.

The team’s workflow was loosely based on the CRISP-DM model (CRoss Industry Standard Process for Data Mining), according to which the typical lifecycle of a data science project consists of six iterative steps, where any step can lead back to a previous one.

  1. Business understanding
  2. Data understanding
  3. Data preparation
  4. Modeling
  5. Evaluation
  6. Deployment

Although the team used CRISP-DM, they also added some steps that reflected their own context.

As you can see from the diagram below, our end-to-end workflow consists of a number of interconnected loops.

Data Science Lifecycle (Adapted from CRISP-DM)

Exploration loop: In the first few iterations the hypothesis is explored through business and data understanding.

Build MVP loop: Once there is proof of concept the team builds the Minimum Viable Prototype.

Evaluation of MVP loop: The team examines the accuracy of the MVP model through a series of training and testing iterations.

Integration Testing loop: When the MVP is validated then the team moves to integration testing.

Deployment of MVP loop: Following successful rounds of integration testing, the team deploys in shadow mode and starts A/A and A/B testing to check that everything behaves as expected.

Finally the team deploys and there is customer acceptance. Of course, even after deployment there might be a need for adaptation and evolution based on real-world data.
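One common way to evaluate the A/A and A/B tests in this deployment loop is a pooled two-proportion z-test on conversion rates. The sketch below is an illustration of that standard technique, not the team's actual evaluation code, and the sample counts in the usage note are invented.

```python
from math import sqrt

def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Pooled two-proportion z-statistic for the difference in conversion
    rates between variant A (control) and variant B (candidate model)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se
```

In an A/A test, where both arms serve the same model, the statistic should hover near zero — a useful sanity check on the instrumentation. In the A/B test, |z| > 1.96 roughly corresponds to a significant difference at the 5% level; for example, 100 conversions out of 1,000 versus 150 out of 1,000 gives z ≈ 3.4.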

Mapping our end-to-end workflow created a shared understanding amongst the team members, and helped us identify bottlenecks that we needed to improve. It also helped us reduce the time we were spending on each phase and prevented us from spending too much time over-optimizing solutions.

3. Adapt your Agile Practices and Keep Evolving

As the months went by we realised that although Scrum was helping the team, at the same time it was creating an artificial constraint (2-week timebox) that wasn’t really aligned with the duration of our experiments and business cadence. Once again, it was time to change.

At this point, we agreed to experiment with a new framework called Structured Kanban Iterations (SKI). The SKI framework is based on Kanban principles but has more defined roles and structure. Instead of trying to commit to work that fits a timebox, the team plans in iterations driven, for instance, by how long an experiment needs to run (capability-based iterations). However, we didn’t adopt the whole framework but rather decided to borrow some good ideas from it.


We replaced Sprint Planning sessions with Backlog Item Selection meetings, which let us plan many weeks ahead (not just two) based on the work and business deadlines. This gave the team a better way to manage their work dynamically without relying on artificial timeboxes.

At the same time, we kept both the Iteration Review and the Team Retrospective on a 2-week cadence to act as progress checkpoints. The reason was practical: we had multiple projects running at different cadences.

In addition, we decided to stop using story points, as they didn’t add any value to what we were doing, and we replaced velocity and burndown charts with flow metrics, such as lead time, cycle time, and throughput, to measure our delivery performance and discover areas for improvement.
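These flow metrics are straightforward to compute from ticket timestamps. The sketch below shows one common set of definitions — lead time from creation to completion, cycle time from work start to completion, throughput as items completed per week — using invented example dates; team tools may define the boundaries slightly differently.

```python
from datetime import date

def flow_metrics(tickets):
    """Given (created, work_started, completed) dates per ticket, return
    average lead time (days), average cycle time (days), and throughput
    (items completed per week over the observed window)."""
    lead = [(done - created).days for created, _, done in tickets]
    cycle = [(done - started).days for _, started, done in tickets]
    window_days = (max(t[2] for t in tickets) - min(t[0] for t in tickets)).days
    throughput = len(tickets) / (window_days / 7)
    return sum(lead) / len(lead), sum(cycle) / len(cycle), throughput

# Hypothetical tickets: (created, work started, completed)
tickets = [
    (date(2023, 1, 2), date(2023, 1, 5), date(2023, 1, 10)),
    (date(2023, 1, 3), date(2023, 1, 4), date(2023, 1, 12)),
    (date(2023, 1, 8), date(2023, 1, 9), date(2023, 1, 15)),
]
```

Unlike story points, all three numbers come straight from timestamps the team's board already records, which is what makes them hard to game and easy to trend over time.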

And the journey continues

As the data science team grew, we decided to split it into four smaller teams aligned more closely to specific business value streams. This has brought new challenges, but one thing has remained the same: our drive to experiment with new and better ways of working.

A team is like a living organism: it never stops evolving. It continuously progresses from one state to another as the business environment changes. What matters is not finding some “ideal framework” that will magically make a data science team effective, but the team’s ability to transcend frameworks and practices and discover what works best for them.

Just Eat Takeaway.com is hiring! Want to come work with us? Apply today.



Senior Agile Delivery Manager and Team Coach @Just Eat Takeaway - I enable global tech teams to be effective in their ways of working