Agile Machine Learning Product Development

A guide to faster ML product development with Agile

George Popescu C
Slalom Build
10 min read · Dec 8, 2022


This article focuses on real-world applications of Machine Learning in the context of Agile product development. It is meant as a guide to implementing Agile for ML products and iterating faster towards exceptional customer outcomes. Some of the examples come from my experience in the telecom industry, where I worked on growing the data science capability at one of the largest telecom companies in the world. Since all companies are different and their customers and use cases are unique, this is meant to serve as a blueprint for Agile ML development rather than an authoritative view.

Photo by İrfan Simsar on Unsplash

Before we begin, a few pointers on how certain terms will be used throughout:

Data Science team: An inter-disciplinary team with the aim of delivering a product.

Machine Learning: The process of creating production-level models, with automated tests and error handling, optimised and documented. More info on what a production environment is can be found in this article from PagerDuty.

MLOps: The methodology covering the supporting infrastructure, pipelines, and other tools and platforms used to deliver Machine Learning outcomes.

Scrum: A methodology that exists to deal with complexity and a lack of visibility into all the possible future outcomes. It provides a way of dealing with the kind of complex problems that are also present in Machine Learning development: for example, creating a solution to prevent churn, or a recommender system that predicts which devices to recommend to customers.

Sprint: A short, time-boxed period during which the work happens, usually two weeks.

The product

There are different interpretations of what the goal of a data science team is. However, the product should always be the customer outcome. You may define it as the implementation of a Machine Learning model, but never as the model itself. The customer need drives what the product will be. The product is not defined solely by the data scientist or the product owner; rather, it is an idea that materializes through multiple iterations. In software development, the product is not the software, or the feature added to the software. In data science teams, the product is a customer experience supported by the application of data through data science methods. The outcome in product development is reached according to the definition of done that the team agrees on.

Balance matters

One of the common impediments to scaling up efficiently is not having the right balance of skills within the team. For example, if you have too few data engineers, you can't create an actionable data asset for data scientists to work with. A short-term solution could be cross-training data scientists to self-serve with Data Engineering tools; however, a mature data science product team will have all the needed skills represented.

Machine Learning doesn't exist in a vacuum; it's usually part of a complex delivery process supported by many different technologies and people. Product teams usually involve an executive sponsor (e.g., a Head of Marketing), data scientists, ML engineers, data engineers, a product owner, DevOps or platform engineers, software engineers, and others, depending on the use case.

If a product team becomes too large, it should be broken down further under one product owner, with each team focusing on features. For example, ML product teams at the telecom giant rarely exceed 7 people and usually contain just one data scientist and one data engineer. A feature could be anything from the introduction of a new, better model version to a new way of displaying recommendations on a website. The product owner should focus on the bigger picture, not the features; meanwhile, the team should focus on sprint goals related to features, refining and reiterating as they go. In practice, this could mean creating a baseline churn model with 70% accuracy, implementing it to reduce churn in the short term, and then improving the model's accuracy with every iteration, for example by adding new data points or experimenting with the model architecture.

All models are wrong, but some are useful

Remember that Agile may not always be the best way to do things. For example, a self-driving car experiment may not be the best place to release features that lead to risky customer outcomes, like a new vision system version without extensive testing. If speed of delivery needs to be weighed against the risks of delivering the model prematurely, you can use a more traditional cost/benefit analysis to inform your approach. Regardless, a Data Science team, like a traditional software delivery team, should focus on outcomes, whether that is creating a data pipeline, adding a feature to an existing ML model, or creating a new model.

In practice, a model may not be deployed to production even if it's very accurate. At the telecom giant, I've seen models go unimplemented for a variety of reasons: lack of buy-in from the business, lack of understanding of what the model does or aims to predict, the cost or complexity of implementation, or timing. Don't worry if this happens; keep iterating. You can always revisit the model later, when the implementation makes more sense.

Human after all

Since we know that ML doesn't exist in a vacuum, there will always be dependencies on other people, internal or external, that can hinder your progress in achieving sprint goals. Companies like Amazon have tried to reduce the friction and reliance on human conversations by using APIs to get answers across the business. But this is not always possible, at least at first, for a maturing company or for new implementations. Data science teams will still have to rely on good old-fashioned human conversations.

An important skill in Agile teams is communication, especially with people from different backgrounds. For instance, a technical wiki needs to be used and understood by both technical and non-technical stakeholders. A data science team needs to align everyone's objectives towards achieving the sprint goals, and thus towards improving the customer experience.

Unlike in academia, the goal of ML in product development is to create the product, not the model. There is no value in a model that never reaches production, nor in a model that doesn't improve the customer experience. If you are unsure how feasible something is (for example, creating a new model architecture), create a time-limited spike, where the new feature is abandoned or delayed if it proves infeasible within the time limit.

One liberty we were initially tempted to take within the data science team was extra time on model development. A lot of data scientists, including myself, want to deliver the best model to the world, optimising a metric like accuracy, and then observe our model deliver the expected outcomes in the wild. However, this is not how product development works. Cost/benefit analysis dictates that it's better to deliver a baseline model than no model at all: it's generally better to ship a model that predicts churn accurately 70% of the time while you continue fine-tuning iteratively to increase its accuracy.
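To make this concrete, here is a minimal sketch of what such a baseline might look like: a simple logistic regression for churn, shipped first and improved sprint by sprint. The file and column names are hypothetical.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical data asset produced by the data engineers
df = pd.read_csv("customers.csv")
features = ["tenure_months", "monthly_spend", "support_calls"]

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["churned"], test_size=0.2, random_state=42
)

# Ship the simplest thing that clears the agreed bar (say ~70% accuracy),
# then iterate: add data points or try richer architectures in later sprints.
baseline = LogisticRegression(max_iter=1000)
baseline.fit(X_train, y_train)
print(f"Baseline accuracy: {accuracy_score(y_test, baseline.predict(X_test)):.2f}")
```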

However, there are cases where a baseline model is not desirable. For example, a 2019 study* found that facial recognition models are far less accurate at identifying faces from different ethnic groups. This can be problematic in applications deployed by public institutions, such as the police. Consequently, some cities have banned the use of such technology. I believe the answer lies somewhere in the middle: these systems need much closer supervision, in the form of human validation, until they mature. The risks of deploying a technology need to be balanced against the benefits, and considered on a case-by-case basis.

Show me the money

That's the theory. Let's talk about practical applications of the above, using a real-world example from my time in telecom. It had become clear that the recommendations we were serving on our website were not appropriate and did not create a satisfactory customer experience. We had a WHY, but before moving on to the HOW, buy-in was needed in order to progress. Our main sponsors came from the upgrade and cross-sell teams, whose interest was in improving sales. Our intuition said that a recommender system would produce better recommendations than the existing rules-based ones on the website, and that we could test this hypothesis by allocating a small percentage of the traffic to the new system. We sold this as a proof of concept to the sponsors, which meant convening a team to deliver the product.
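Our exact mechanism isn't important here, but one common way to carve out a slice of traffic for such a test is deterministic hashing on the customer ID, so each customer consistently sees either the rules-based or the ML recommendations. A hypothetical sketch:

```python
import hashlib

def in_experiment(customer_id: str, traffic_pct: int = 5) -> bool:
    """Deterministically assign a customer to the experiment bucket.

    Hashing the ID keeps the assignment stable across visits, so a
    customer never flips between rules-based and ML recommendations.
    """
    digest = hashlib.sha256(customer_id.encode()).hexdigest()
    return int(digest, 16) % 100 < traffic_pct  # map the hash to 0-99

# Route ~5% of customers to the new recommender system
if in_experiment("customer-12345"):
    ...  # serve ML-generated recommendations
else:
    ...  # serve the existing rules-based recommendations
```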

Photo by Hal Gatewood on Unsplash

Remember, the product was the customer experience of receiving better recommendations, not the recommender system itself, so we needed a multi-disciplinary team composed of data scientists, data engineers, front-end developers, and AWS SMEs. Our first dependency also became clear: we needed to get streaming data from on-prem Kafka clusters into AWS Kinesis in order to feed the recommender system. With all this knowledge in place, we could define stories and start planning sprints.

Reference architecture for the recommender model implementation

Through our first sprint we aimed to unblock the data dependency, so our data engineers worked alongside the on-prem tech teams to enable the data from the Kafka clusters to feed through to AWS via a connector. Once the data reached AWS, Glue pipelines needed to be put in place to enable near-real-time Extract, Transform, Load (ETL) from the Kinesis streams. Once again, data engineers were instrumental in enabling the pipeline functionality and the basic transformations needed.

In the meantime, the data scientists on the team were exploring the docs for AWS Personalize, an out-of-the-box solution that would enable speedy delivery of a prototype with minimal configuration. The website developers were busy understanding which API responses were needed to integrate with the website, for example the customer ID and the product IDs of the recommended products to be displayed, alongside marrying those product IDs with images of the specific products from another database. An example of how our board most likely looked at one point in time can be found below.

Example of a board with some stories we could have been working on at the time
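For illustration, the integration point looks roughly like this: fetching recommendations from an Amazon Personalize campaign with boto3. The campaign ARN, region, and function name are placeholders, not our production values.

```python
import boto3

personalize_runtime = boto3.client("personalize-runtime", region_name="eu-west-1")

def get_recommendations(customer_id: str, num_results: int = 10) -> list[str]:
    """Return recommended product IDs for a customer from a Personalize campaign."""
    response = personalize_runtime.get_recommendations(
        campaignArn="arn:aws:personalize:eu-west-1:123456789012:campaign/example",
        userId=customer_id,
        numResults=num_results,
    )
    # The website then joins these product IDs to images and copy
    # from the product database before rendering the page.
    return [item["itemId"] for item in response["itemList"]]
```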

Some of the initial challenges with data availability meant that the data scientists did not have any data to train the model on. While the data and platform engineers worked towards bringing the data into AWS, the data scientists prototyped a baseline model with dummy data that they had created themselves to replicate the format of the data that would be available to them in the next sprint. Seeking to remove blockers as fast as possible and enabling others is very important in Agile. Another challenge was having too many choices for how to display the recommendations on the website. In Agile, decision paralysis can be removed by establishing what good looks like at the time and revisiting at a later date.
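Generating dummy data that mirrors an expected schema can be as simple as the sketch below. The column names and event types are illustrative, not our actual schema.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=7)
n = 10_000

# Dummy user-item interactions replicating the format we expected
# from the Kinesis/Glue pipeline in the next sprint
interactions = pd.DataFrame({
    "customer_id": rng.integers(1, 2_000, size=n).astype(str),
    "product_id": rng.integers(1, 500, size=n).astype(str),
    "event_type": rng.choice(
        ["view", "add_to_cart", "purchase"], size=n, p=[0.80, 0.15, 0.05]
    ),
    # Unix timestamps spread over roughly 90 days
    "timestamp": 1_640_995_200 + rng.integers(0, 90 * 24 * 3600, size=n),
})
interactions.to_csv("dummy_interactions.csv", index=False)
```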

After many iterations, and once the challenges were overcome, the POC transformed into an MVP which made it to production, with a subset of customers receiving the new personalized recommendations. Customer interactions were summarized into a report tracking performance indicators like number of clicks and conversion rate, benchmarked against customer interactions on the page without the recommender engine. Overall, the experiment proved to be a success, and the project went on to be implemented at a larger scale, with a feedback loop that automatically retrains the model and chooses the best one according to specific metrics.
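The benchmark behind such a report boils down to comparing proportions between the two groups. A sketch with made-up numbers, using a two-proportion z-test:

```python
from statsmodels.stats.proportion import proportions_ztest

# Made-up counts: conversions and visitors for the recommender
# (treatment) group versus the rules-based (control) group
conversions = [312, 264]
visitors = [10_000, 10_000]

stat, p_value = proportions_ztest(count=conversions, nobs=visitors)
print(f"Treatment conversion rate: {conversions[0] / visitors[0]:.2%}")
print(f"Control conversion rate:   {conversions[1] / visitors[1]:.2%}")
print(f"p-value: {p_value:.4f}")  # a small p-value suggests a real lift
```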

Building Futures. Daily.

I hope that this blueprint for delivering exceptional customer outcomes can help you. Each business is unique, so what worked for us at the time may not work for you. At Slalom Build we have worked with clients in diverse industries on use cases spanning MLOps, Platform Engineering, Data Engineering, Software Engineering, cloud adoption, and more, so drop us a note if you'd like to know more about working together.

https://www.slalombuild.com/en-gb

Here are some useful links for digging deeper into MLOps and other topics:

MLOps: CI/CD pipelines, source control, well-defined PR review processes, and containerised environments that enable data scientist experiments.

https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning

Platform: Re-usable infrastructure & architecture to support different use cases. Compute, data storage and APIs are the key building blocks for the majority of ML models.

https://medium.com/slalom-build/why-a-platform-engineer-should-lead-your-devops-transformation-4737c41ec10b

Data Engineering: re-usable ETL pipeline blueprints, data marts, and sample workflows.

Machine Learning: a diverse model portfolio and knowledge of different types of models for different use cases (e.g., when to use XGBoost versus a neural net), plus automated hyperparameter tuning scripts and model creation blueprints. Many model architectures can be re-used across use cases; for example, a churn classification blueprint can be re-used for classifying upgrade customers, with only the data and target changing (see the sketch below).
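A minimal sketch of what such a re-usable blueprint could look like, with illustrative names, where only the DataFrame and target column change between use cases:

```python
import pandas as pd
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

def train_classifier(df: pd.DataFrame, feature_cols: list[str], target_col: str):
    """Re-usable blueprint: the same pipeline, different data and target."""
    model = XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
    scores = cross_val_score(model, df[feature_cols], df[target_col], cv=5)
    model.fit(df[feature_cols], df[target_col])
    return model, scores.mean()

# Same blueprint, two use cases: only the data and target change, e.g.
# churn_model, churn_acc = train_classifier(churn_df, features, "churned")
# upgrade_model, upgrade_acc = train_classifier(upgrade_df, features, "upgraded")
```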

A great cheat sheet that does justice to the broad spectrum of ML:

*2019 study covered by the BBC — https://www.bbc.com/news/technology-50865437

Thanks to Yasneen Ashroff, Michael Pilosov and Chuck Snavely for their invaluable help with editing this article.

