MLOps Part 1: Assessing Machine Learning Maturity

Jack Sandom
Published in Slalom Data & AI · May 11, 2020

There are several articles and surveys which look at the success of data science and machine learning (ML) projects. A 2019 story by VentureBeat claims that only 13% of data science projects make it into production. Research by the International Data Corporation (IDC) states that a quarter of organisations report up to a 50% failure rate for AI projects. A quick Google search will find many more statistics like this (and the exact numbers will vary somewhat) but the key takeaway is that while the AI/ML market is showing robust growth, many projects fail and very few models become operational.

In this blog post, we will examine the challenges involved in operationalising machine learning projects and focus on the underlying technical obstacles that need to be addressed (based on Slalom’s experience delivering hundreds of ML-related solutions into production and seeing the results). We’ll then look at how MLOps addresses these obstacles and what a mature MLOps capability looks like. In part 2, we’ll demonstrate how infrastructure-as-code can be leveraged to build a machine learning automation pipeline for a real-world use-case.

The Challenge

The reasons that machine learning projects fail are varied, and they have as much to do with culture, people, and organisational factors as with technology. For example, is there a healthy data culture within the organisation? Is there a strong data science operating model? We won’t cover those topics in this blog post, but they are just as important for ensuring success with machine learning. Here we’ll focus on the complexities of establishing a real-world ML system, how those complexities impact collaboration and the time to demonstrate value, and finally how technology and MLOps can help to address this.

There are many elements needed to build integrated machine learning systems and continuously operate them in production. Google does a great job of articulating this in their solution paper MLOps: Continuous delivery and automation pipelines in machine learning. The scale and complexity of the challenge can be seen in the diagram below.

Source: MLOps: Continuous delivery and automation pipelines in machine learning (Google)

This diagram highlights three obstacles to operationalising ML and their technical root causes:

  • Environment complexity: we can see that many elements exist outside of the model code itself. ML projects often fail as a result of not taking the time upfront to understand the environment in which a model will need to run
  • Time to develop: although it can be quick to create a good initial algorithm (with the right data), continuously iterating on that model without some of the elements above (automation, serving infrastructure, process management) can leave projects slow to get off the ground or stuck at proof of concept
  • Collaboration: no one person can tackle all of the elements above, and the skills required to build ML models, feature engineering pipelines, and automation are quite different, so multiple individuals must collaborate to operationalise a model. Because of the different skill sets within these disciplines (and often the organisational structure), data scientists, data engineers, and DevOps engineers frequently operate in silos and there is too much manual handoff between teams

We will return to the Google paper later in this post, but first let’s define MLOps and how it addresses the problems described above.

Defining MLOps

Meeting the challenge set out above requires applying automation and DevOps principles to ML, which is what MLOps is all about. This means combining best practices from DevOps, ML, and data engineering with the goal of de-risking, accelerating, and emboldening ML projects.

This concept isn’t new, but the domain has received much more attention recently as more and more organisations face challenges in getting their early attempts at ML off the ground and embedded into production systems.

This blog won’t go into detail about the wider MLOps landscape but Rob Sibo (Director, Slalom Sydney) wrote at length in a recent post about defining MLOps, tooling, and general best practices.

Measuring MLOps Maturity

Revisiting the Google paper, MLOps maturity can be measured across three phases, each of which builds towards a fully automated MLOps capability.

Phase 1: Manual Model Process

The first phase is manually building ML models which are fed into production through some type of prediction service. Although this gets our models beyond a pure research project, it doesn’t address the main obstacles we discussed earlier. This is a clear example of the challenge we outlined around collaboration, where Data Science and DevOps are operating in silos and there is manual handoff between training and iterating on models (usually in Jupyter Notebooks) and actually allowing predictions to be served from those models.
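To make the manual handoff concrete, here is a minimal sketch (in Python, using synthetic data and illustrative names, not taken from the Google paper) of what this phase often looks like: a model trained interactively, pickled to disk, and then loaded by a separately maintained prediction service.

```python
# Minimal sketch of the manual phase: a model is trained interactively,
# pickled to disk, and later loaded by a separately maintained prediction
# service. The synthetic data and names are illustrative assumptions.
import pickle

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# --- Data scientist's side (typically a notebook) ---
X, y = make_classification(n_samples=1_000, n_features=10, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)

with open("model.pkl", "wb") as f:  # artefact handed over to DevOps by hand
    pickle.dump(model, f)

# --- Prediction service's side (a separate codebase in practice) ---
with open("model.pkl", "rb") as f:
    served_model = pickle.load(f)

def predict(features):
    """Score a single record with the manually deployed model."""
    return int(served_model.predict([features])[0])

print(predict(X[0]))
```

Every step in this flow (refreshing the training data, re-running the notebook, copying the new artefact to the serving environment) is a manual handoff, which is exactly what the later phases automate.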

Phase 2: ML Pipeline Automation

Phase two involves creating an automated pipeline for (re)training, tuning, evaluating, and deploying models as well as data extraction/transformation, performance monitoring, and metadata management. This pipeline requires a means to trigger the training and deployment process continuously (or as needed), which can be any of the following (a simple sketch of this trigger logic follows below):

  1. On-demand
  2. Through a scheduler
  3. Based on new training data becoming available
  4. As a result of model decay

This level significantly helps to reduce the time to develop by making the process of iterating on new model versions much easier. As a result, what would originally have taken days or weeks (with manual handoff) now happens in hours or minutes.
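As a simple illustration of the triggers listed above, the hypothetical sketch below combines a schedule with a model-decay check: retraining is kicked off if live accuracy drops below a threshold or if the model has not been retrained within a set interval. The function names, metric, and thresholds are illustrative assumptions rather than part of any specific tool.

```python
# Hypothetical trigger logic for phase two. The pipeline itself (extract,
# train, evaluate, deploy) is represented by a single placeholder callable;
# the metric and thresholds are assumptions for illustration only.
from datetime import datetime, timedelta

ACCURACY_FLOOR = 0.85                 # retrain if live accuracy decays below this
RETRAIN_INTERVAL = timedelta(days=7)  # ...or at least once a week

def should_retrain(live_accuracy: float, last_trained: datetime) -> bool:
    """Decide whether to trigger the automated training pipeline."""
    decayed = live_accuracy < ACCURACY_FLOOR
    stale = datetime.utcnow() - last_trained > RETRAIN_INTERVAL
    return decayed or stale

def run_training_pipeline():
    """Placeholder for the automated extract/train/evaluate/deploy steps."""
    print("Triggering training pipeline...")

if __name__ == "__main__":
    # In practice these values would come from a monitoring/metadata store.
    if should_retrain(live_accuracy=0.82, last_trained=datetime(2020, 4, 1)):
        run_training_pipeline()
```

In a real system this check would typically be wired to a scheduler or an event from the monitoring layer rather than run by hand.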

Phase 3: CI/CD Pipeline Automation

The final phase incorporates CI/CD and automates the building, testing, and deployment of new pipeline components to a target environment. This provides the complete end-to-end pipeline with automated decision-making in QA and production. This level builds on the last one but further addresses the issue of collaboration: data scientists can focus on experimentation and see the results in their development and test environments before promoting to production, while data engineers and DevOps engineers are freed up to further develop the ML system and deal with the environment complexity.
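To give a flavour of what automated testing of pipeline components can look like at this phase, here is a hedged example of a unit test a CI stage might run before promoting a feature-engineering step; the component, column names, and expected values are assumptions made purely for illustration.

```python
# Illustrative CI-stage unit test for a hypothetical feature-engineering
# component. The function under test and its expected schema are assumptions
# made for this sketch, not part of any specific framework.
import pandas as pd

def add_ratio_feature(df: pd.DataFrame) -> pd.DataFrame:
    """Example pipeline component: derive a ratio feature from two columns."""
    out = df.copy()
    out["debt_to_income"] = out["debt"] / out["income"]
    return out

def test_add_ratio_feature_schema_and_values():
    raw = pd.DataFrame({"debt": [50.0, 20.0], "income": [100.0, 80.0]})
    result = add_ratio_feature(raw)

    # CI gate: the component must add the expected column...
    assert "debt_to_income" in result.columns
    # ...compute it correctly...
    assert result["debt_to_income"].iloc[0] == 0.5
    # ...and leave the input untouched.
    assert "debt_to_income" not in raw.columns

if __name__ == "__main__":
    test_add_ratio_feature_schema_and_values()
    print("Pipeline component tests passed.")
```

Tests like this, run automatically on every change, are what allow new pipeline components to be promoted to QA and production without manual handoff.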

Climbing the Maturity Scale

Slalom has developed an approach for assessing maturity against this scale, which includes a standardised rubric for judging the quality of our clients’ ML infrastructure. We have also developed AWS sample architecture and infrastructure-as-code focused on getting clients to phase two of this maturity scale. Having a fully automated ML pipeline is a critical step towards getting value from machine learning. We see many clients who are stuck at the manual first phase, and this is often where projects fail in the long term: it is not conducive to collaboration, it lengthens development time, and it makes manually iterating on production models difficult and time-consuming as performance starts to deteriorate.

Up Next

In part 2 of this blog series, we’ll take a deeper dive into some of the technical aspects around MLOps and look at how AWS and infrastructure-as-code can be leveraged to build an automated machine learning pipeline. We’ll present a standard architecture leveraging different AWS services and show how it can be applied to a real-life use-case.

Jack Sandom is a Data Scientist out of Slalom’s London office. He specialises in machine learning and advanced analytics and is a certified AWS machine learning specialist. Speak with Jack and other Data & Analytics practitioners at Slalom by reaching out directly or learn more at slalom.com.
