Don’t Forget MLOps When Deploying AI/ML Models

When performance degradation in your AI model isn’t an option, turn to MLOps.

Published in RS21 Blog · 9 min read · May 12, 2021


By Michelle Archuleta, PhD, RS21 Director of Data Science

Have you ever ridden a bike without brakes, or tried to ride with a flat? The best-case scenario is that you don’t get very far. Worse yet, you can get hurt badly. I use this analogy to describe the consequences of deploying artificial intelligence (AI) or machine learning (ML) models without MLOps: when models don’t perform optimally, it creates big problems.

MLOps is the standardization and streamlining of machine learning life cycle management, as Mark Treveil defines it in the book Introducing MLOps. It allows businesses to deliver long-term value from AI models deployed in the wild while also reducing risk. In practice, it requires committed cross-disciplinary teams and executive management buy-in. And it encompasses responsible AI, because it requires you to know the limitations of any given model and to put systems in place that address those limitations through a transparent process.

So what is MLOps not? It is not AIOps, which refers to leveraging AI to solve DevOps problems. It shouldn’t be confused with ModelOps either, which covers all kinds of models rather than focusing on AI/ML challenges. Nor is MLOps specific to individual contributors or a single division; it is a team sport that relies on many contributors and areas of expertise.

There’s a real need for MLOps to increase the reliability of AI/ML models. Currently, 90% of machine learning models are never deployed, despite over $70B in global AI investment in 2019 (Raj E., Engineering MLOps). And even though relatively few models make it to production, demand for AI/ML is not slowing down.

  • It’s projected that in 2025, there will be four times the amount of data that we have today.
  • Post-2012 compute has been doubling every 3.4 months.
  • There has been a 300% increase in the volume of peer-reviewed AI papers.
  • And finally, 58% of large companies have reported adopting AI in at least one function or business unit.

Based on these statistics, it appears most of our models are not deployed to production where they can deliver true customer value. With MLOps, however, we can greatly increase our model success and reduce performance degradation.

Why MLOps?

The problem MLOps solves stems from the intrinsic differences between machine learning and traditional programming. In traditional programming, you have an input, and you write a software program that performs some computation to yield final results. This is a very repeatable, robust process. Machine learning, however, works differently.

In machine learning, you begin with an input and a desired result (e.g., your training labels), and you optimize and train the model to produce that desired result. The output of this process is a program: your machine learning model.
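To make the contrast concrete, here is a minimal sketch in Python. The tiny spam example and its data are invented for illustration: the traditional program encodes its rules by hand, while the machine learning version learns its rules from inputs paired with desired results.

```python
# Traditional programming: a human writes the rules by hand
def is_spam_by_rules(email: str) -> bool:
    return "win a prize" in email.lower()

# Machine learning: the "program" is learned from (input, desired result) pairs
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = ["Win a prize now!!!", "Meeting moved to 3pm",
          "Claim your free prize today", "Lunch tomorrow?"]
labels = [1, 0, 1, 0]  # 1 = spam: the desired results we optimize toward

model = make_pipeline(CountVectorizer(), MultinomialNB()).fit(emails, labels)
print(model.predict(["free prize inside"]))  # the learned model at work
```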

Now the issue is that the machine learning model depends on its input data set. If the data the model encounters in the wild is very different from the data set it was trained on, we see model performance degradation. This is known as data drift: a change in the incoming data stream.
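As a rough illustration of how you might watch for this, a simple data drift check can compare the distribution of an incoming feature against its training distribution with a two-sample Kolmogorov-Smirnov test. This is a minimal sketch; the significance threshold and the synthetic data are assumptions, and a production system would check many features over rolling windows.

```python
import numpy as np
from scipy.stats import ks_2samp

def data_drift_detected(train_feature, live_feature, alpha=0.05):
    """Flag drift when the live feature distribution differs significantly
    from the training distribution (two-sample Kolmogorov-Smirnov test)."""
    statistic, p_value = ks_2samp(train_feature, live_feature)
    return p_value < alpha, statistic

# Hypothetical example: live data has shifted away from the training data
rng = np.random.default_rng(42)
train = rng.normal(loc=0.0, scale=1.0, size=5_000)
live = rng.normal(loc=0.8, scale=1.0, size=1_000)  # mean has drifted
drifted, stat = data_drift_detected(train, live)
print(f"drift detected: {drifted} (KS statistic = {stat:.3f})")
```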

A significant example of data drift is COVID-19. Consider all the data sets and models that suffered because they could not anticipate the pandemic. This impacted practically every industry: any model trained before COVID-19 could not respond to the shift in the underlying data that occurred during the pandemic. We can also think of the COVID-19 variants as a source of data drift, with medical devices for diagnosing COVID-19 experiencing model degradation due to unfamiliar data.

Similarly, concept drift affects model performance. Concept drift means the statistical properties of the target variable have changed over time, and most likely in an unforeseeable way. Think of a spam detector and how the definition of spam email has changed over time.
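Concept drift often shows up first as a quiet decline in accuracy, so one common and simple way to surface it is to track model performance on labeled feedback over a sliding window and alert when it falls below the training-time baseline. The window size and tolerance below are illustrative assumptions.

```python
from collections import deque

class PerformanceMonitor:
    """Track rolling accuracy on labeled feedback; a sustained drop below
    the training-time baseline is a common symptom of concept drift."""

    def __init__(self, baseline_accuracy, window=500, tolerance=0.05):
        self.baseline = baseline_accuracy
        self.tolerance = tolerance
        self.outcomes = deque(maxlen=window)  # rolling window of hits/misses

    def record(self, prediction, actual):
        self.outcomes.append(prediction == actual)

    def rolling_accuracy(self):
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else None

    def drift_suspected(self):
        accuracy = self.rolling_accuracy()
        return accuracy is not None and accuracy < self.baseline - self.tolerance

# Example: a spam detector that was 95% accurate at training time
monitor = PerformanceMonitor(baseline_accuracy=0.95)
for prediction, actual in [(1, 1), (0, 1), (0, 1), (1, 0)]:  # feedback stream
    monitor.record(prediction, actual)
print(monitor.drift_suspected())  # True: rolling accuracy is 0.25
```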

When data and concept drift occur, models are more likely to make false classifications. In some instances the consequences are insignificant. If an ad or movie recommendation is off due to model degradation, the impact is minimal.

When we start looking at models in the financial, medical, security, or robotics sector, however, consequences can be severe. We can’t afford for the models used in autonomous vehicles, for example, to degrade over time from data drift and concept drift. Instead, we need to roll out an MLOps solution that will retrain models, address data and concept drift, and maintain performance and reliability.

As we can see, MLOps is critical to AI/ML model performance and the business or customer impact. We already see MLOps being used across industries and embraced by companies like Blue Cross Blue Shield, Deloitte, Humana, Kroger, and Panasonic.

In addition to increased adoption, regulations are also driving MLOps. In health care, the Food and Drug Administration (FDA) has established Good Clinical Practice regulations and guidelines around traceability (the ability to recreate the development history of a drug or medical device), accountability, and data integrity. In January 2021, the FDA also put forth a new Action Plan that describes a multi-pronged approach to the agency’s oversight of AI/ML-based medical software.

In the financial sector, the UK Prudential Regulation Authority has defined Model Risk Management (MRM) principles that require firms to define and record their models, establish a governance framework with policies, procedures, and controls, and undertake appropriate model validation and independent review.

Maturity Levels of MLOps

Getting a full MLOps system in place is by no means trivial. It takes a lot of work and collaboration across divisions within an organization, and building expertise and rolling out a full-fledged MLOps system is a gradual effort. Processes might start out manual, gradually move to some automated deployment, and finally reach a full MLOps framework. Here, I’ve defined maturity levels to help describe what we’d expect to see at each stage of growth.

Level 4 indicates a full MLOps solution: processes are automated, a robust CI/CD pipeline is in place, retraining is triggered automatically, data drift and concept drift algorithms are deployed, and performance metrics are monitored and collected. At this stage, we finally reach lower risk for high-impact projects.
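To make Level 4 concrete, here is a toy sketch of an automated retraining trigger. The thresholds and the `launch_retraining_job` stub are my own illustrative assumptions, standing in for a call into a real CI/CD pipeline.

```python
def should_retrain(data_drift_detected: bool,
                   rolling_accuracy: float,
                   baseline_accuracy: float,
                   tolerance: float = 0.05) -> bool:
    """Gate for automated retraining: fire on data drift, or on a
    performance drop consistent with concept drift."""
    performance_degraded = rolling_accuracy < baseline_accuracy - tolerance
    return data_drift_detected or performance_degraded

def launch_retraining_job():
    """Stub: in a real Level 4 system this would trigger the CI/CD
    pipeline to retrain, validate, and stage a challenger model."""
    print("Retraining job submitted.")

# Example: accuracy has slipped from a 0.92 baseline to 0.84
if should_retrain(data_drift_detected=False,
                  rolling_accuracy=0.84,
                  baseline_accuracy=0.92):
    launch_retraining_job()
```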

So what type of high-impact project requires a full MLOps solution?

Let’s talk space.

MLOps Use Case in Space

Our data science team recently took part in the Hyperspace Challenge, a business accelerator fueled by the U.S. Space Force and Air Force Research Lab to support innovative solutions for the challenges missions face in space. We won the Hyperspace Challenge with a novel approach that leverages neural networks for fault prediction.

This is a definite use case for MLOps, because satellite performance is critical for our customers, who provide warning, navigation, research and development, national security services, and weather forecasts to the highest ranks of the national government.

Our neural network fault prediction solution focused on identifying the subtle signals that can serve as early warning signs, alerting operators to potential issues so they can intervene and mitigate problems before failures occur. This is especially important because, unlike a computer that you can easily restart after it freezes, restarting an expensive satellite in orbit, exposed to harsh environments, disrupts missions and is extremely costly.

The solution was derived from David Dooling’s work in health care. David is a Senior Data Scientist at RS21, and he looked into applying survival analysis and oncology models, which are used to predict a patient’s outcome or prognosis, to this satellite problem. It was a very appealing approach because, unlike health care, where you collect relatively few data points per patient, with satellite data you could be receiving continuous streams from 150 million different sensors.
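Our actual models are more involved, but the flavor of the approach can be sketched with the lifelines survival analysis library. Everything in this snippet, from the column names to the data, is a hypothetical stand-in for real telemetry.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

# Hypothetical telemetry-derived features for a set of satellite components:
# time observed (hours until fault or censoring), whether a fault occurred,
# and two invented covariates.
rng = np.random.default_rng(7)
n = 500
df = pd.DataFrame({
    "hours_observed": rng.exponential(scale=1_000, size=n),
    "fault_occurred": rng.integers(0, 2, size=n),
    "temp_variance": rng.normal(1.0, 0.2, size=n),
    "voltage_dips": rng.poisson(3, size=n),
})

# Cox proportional hazards: the workhorse of clinical survival analysis,
# repurposed to rank components by relative fault risk
cph = CoxPHFitter()
cph.fit(df, duration_col="hours_observed", event_col="fault_occurred")

# Higher partial hazard = earlier expected fault; flag for operator review
df["risk_score"] = cph.predict_partial_hazard(df)
watchlist = df.sort_values("risk_score", ascending=False).head(10)
print(watchlist[["risk_score", "temp_variance", "voltage_dips"]])
```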

If we think about the heterogeneity of these satellites, we’re not talking about a single machine learning model for the entire fleet, but several models that must be customized and trained for each individual satellite. Now factor in MLOps and how to feed all that data through an MLOps system:

  • We have our model deployment with our robust CI/CD pipelines and development environment
  • We’re deploying algorithms to detect data drift and concept drift in models in the wild
  • We’re monitoring system performance through feedback loops in terms of our results
  • And then when a trigger indicates that a model needs to be retrained, we can deploy our challenger model against the model in production

This is all automated in a full MLOps framework.
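As a hedged illustration of that last step, a challenger model can be evaluated against the production champion on recent labeled data and promoted only if it wins. The models, metric, and promotion rule below are assumptions for the sketch, not our production setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for recent, labeled production data
X, y = make_classification(n_samples=2_000, n_features=20, random_state=0)
X_train, X_eval, y_train, y_eval = train_test_split(
    X, y, test_size=0.3, random_state=0)

champion = LogisticRegression(max_iter=1_000).fit(X_train, y_train)  # in production
challenger = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

champ_f1 = f1_score(y_eval, champion.predict(X_eval))
chall_f1 = f1_score(y_eval, challenger.predict(X_eval))

# Promote the challenger only if it beats the champion on recent data
production_model = challenger if chall_f1 > champ_f1 else champion
print(f"champion F1 = {champ_f1:.3f}, challenger F1 = {chall_f1:.3f}")
```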

MLOps in Practice

MLOps is a team effort. It requires buy-in from executive leadership, which may be used to software deployments but less familiar with the nuances and risks of AI/ML projects. And it takes governance. Your MLOps team needs to be diverse in skill set and expertise so you can consider a holistic picture of the framework.

Your MLOps team should establish intentions and goals from the beginning, identify responsibilities, define a RACI chart, structure the process, define the appropriate metrics to monitor for change, and build multiple layers of checks into the MLOps pipeline. Finally, it will require ongoing education for the organization and its customers to help decision makers and builders understand how to mitigate risk.

MLOps aligns with our values of using data and AI for good, as it is a key pillar of Responsible AI. MLOps not only ensures reliable model performance for business value; it also allows us to reduce or eliminate bias, provide transparency by taking models out of the black box, and establish accountability.

For additional information on MLOps, I highly recommend Engineering MLOps by Emmanuel Raj.

I’d also invite you to view a recording of my presentation from Data Summit Connect that provides a more in-depth review of the space and satellite case study and maturity levels of MLOps.


I would love to hear from you. How have you deployed MLOps, and where are you in the process? Drop a comment, or connect with me on LinkedIn.

Michelle Archuleta is the Director of Data Science at RS21. A visionary leader in artificial intelligence, Michelle has extensive experience developing foundational AI technology into products, especially for the healthcare industry. She has six pending patents and has published in top peer-reviewed scientific journals, with a focus on systems biology, computational biology, and applied mathematics and machine learning. Michelle specializes in deep learning, reinforcement learning, claims data analysis, and healthcare analytics. Read an interview with Michelle.

ABOUT RS21

RS21 is a rapidly growing, global data science company that uses artificial intelligence, design, and modern software development methods to empower organizations to make data-driven decisions that positively impact the world. Our innovative solutions are insightful, intuitive, inspiring, and intellectually honest.
