What are the six success factors for applying MLOps in your projects? Part 1/2.

Niels van den Berg
Apr 4, 2023 · 7 min read

This is the first blog in a two-part series on six success factors for applying MLOps in your project. In this first post, I define MLOps and MLOps platforms and introduce the first two success factors. The second post covers the remaining four.

Introduction

Artificial Intelligence (AI) currently enjoys widespread popularity, with organizations exploring its applications across various domains, including chatbots and image recognition. AI is leveraged to enhance operational efficiency, reduce costs, and create new business opportunities. However, despite the enthusiasm surrounding AI, it can be difficult to bring it into production, where it can solve real-world business challenges. For example, according to Gartner, only 54% of AI projects make it from pilot to production among organizations that are using or planning to use AI. This is where Machine Learning Operations (MLOps) plays a crucial role.

Think of MLOps as DevOps for AI — a set of practices that aims to develop, deploy, and maintain machine learning models in production, reliably and efficiently. MLOps helps organizations deliver and apply high-quality models faster and with fewer errors.

However, MLOps is more than just a practice; it requires infrastructure to build AI applications upon: an MLOps platform. MLOps platforms store data and artifacts and provide services and tools to train, run, and monitor models.

MLOps is getting a lot of attention, much like AI. New MLOps tools are released almost daily, and many blogs are written about them. People have high expectations that these tools are magic bullets for running models in production. However, MLOps is not just about the tools; it also relies on the right technology and skilled people to handle the tasks properly. Simply adopting MLOps tools is therefore no guarantee that you can productionize AI. Building a strong and effective MLOps platform is challenging, and a careless attempt may not yield the desired results.

Only adding a flavor of MLOps to your project is no guarantee for success.

In this blog post, we’ll discuss six success factors for a successful MLOps platform implementation. Whether you’re new to MLOps or looking to improve an existing platform, this post will provide the insights and guidance you need to make your MLOps platform a success. Before we dive into the details, let’s ensure we are all on the same page.

What is MLOps?

As already stated, Machine Learning Operations (MLOps) is a set of practices and, in essence, an engineering culture that combines the development and operations aspects of machine learning (ML). MLOps can be summarized in seven core principles:

Seven core principles of MLOps, image by author.

You don’t need to fulfill all principles from the start, and there are several services and concepts that facilitate these principles, including:

  • An ML experimentation environment with access to raw and curated data
  • A feature store to accommodate a generic set of features, including lineage
  • Experiment tracking to store results of hyper-parameter optimization
  • Model registry to store model artifacts and model lineage
  • ML pipelines to operationalize all elements of model training and inference
  • Monitoring tools to monitor data and model quality
  • CI/CD pipelines for code, data, and models to deploy to several environments
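To make two of these concepts tangible, here is a deliberately minimal, in-memory sketch of what experiment tracking and a model registry provide. Real platforms use dedicated tools such as MLflow for this; all class and field names below are illustrative, not any tool’s actual API.

```python
import time
import uuid

class ExperimentTracker:
    """Toy experiment tracker: stores parameters and metrics per training run."""
    def __init__(self):
        self.runs = {}

    def start_run(self, params):
        run_id = uuid.uuid4().hex
        self.runs[run_id] = {"params": params, "metrics": {}, "started": time.time()}
        return run_id

    def log_metric(self, run_id, name, value):
        self.runs[run_id]["metrics"][name] = value

    def best_run(self, metric):
        # The run with the highest value for the given metric.
        return max(self.runs,
                   key=lambda r: self.runs[r]["metrics"].get(metric, float("-inf")))

class ModelRegistry:
    """Toy model registry: versioned artifacts with lineage back to the run
    that produced them."""
    def __init__(self):
        self.versions = []

    def register(self, artifact, run_id):
        self.versions.append({"version": len(self.versions) + 1,
                              "artifact": artifact,
                              "run_id": run_id})
        return self.versions[-1]["version"]

# Track a hyper-parameter optimization run, then register the resulting model.
tracker = ExperimentTracker()
run = tracker.start_run({"max_depth": 5})
tracker.log_metric(run, "auc", 0.91)

registry = ModelRegistry()
version = registry.register("model.pkl", run_id=run)
```

The point of the sketch is the lineage: every registered model version points back to the run that produced it, including its parameters and metrics, which is exactly what the model-lineage principle asks for.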

Back to Basics: Shaping Your MLOps Platform

An MLOps platform is the foundation of your AI application, providing the infrastructure necessary for data processing and AI, such as data storage and compute. All services and concepts listed above then run on the platform. Sometimes, an application that consumes the results also runs on the platform, such as a web app. In addition to providing the tooling, the platform facilitates scaling and security for your AI application.

Nowadays, platforms are often built in the cloud. Infrastructure definitions and configurations are stored as code (infrastructure as code, IAC) and developed and deployed using CI/CD principles.
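The core idea behind infrastructure as code can be sketched in a few lines: infrastructure is declared as data, and deployment reconciles the actual state against that declaration, so applying the same definition twice is a no-op. The toy illustration below uses made-up resource names; real tools such as Terraform or Bicep implement this reconciliation for you.

```python
# Desired infrastructure, declared as data (the "code" in infrastructure as code).
desired = {
    "storage_account": {"tier": "standard"},
    "ml_workspace": {"compute": "cpu-cluster"},
}

def apply(desired, actual):
    """Return the actions a deployment pipeline would perform to make the
    actual state match the desired state."""
    actions = []
    for name, config in desired.items():
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != config:
            actions.append(("update", name))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name))
    return actions

# A fresh environment: everything gets created.
plan = apply(desired, actual={})
```

Running `apply` again once the actual state matches the declaration returns an empty plan; this idempotency is what lets a CI/CD pipeline deploy the same infrastructure definition to development, test, and production environments.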

Starting point

Before we start a project to develop an MLOps platform, let’s begin by defining the following:

  1. The business problem
  2. The analytical problem

The business problem originates within the organization and serves as the driving force behind the project. An example can be: “It’s hard to find the correct information in our document management system”.

The analytical problem translates this into something we can solve with an AI product. For instance: “Can we provide the correct information from our document management system through smart interaction with the user and advanced document search using NLP?”

The analytical problem is the starting point for your project. The final AI solution should be able to address the business problem.

The six success factors

Now that we’re all on the same page, let’s discuss the six success factors:

  1. Begin by gathering requirements,
  2. Verify the need for an MLOps platform and manage expectations,
  3. Create a proof of concept,
  4. Create a design,
  5. Split the work into iterations,
  6. Keep the project team motivated.

While some of these success factors may seem obvious, I know from experience that skipping steps can lead to undesirable outcomes.

Success Factor 1: Requirements at the start

Begin by collecting requirements for the AI solution that you intend to build. These requirements are shaped by the business and analytical problems, as well as by the organization, data availability, and existing infrastructure.

To define the requirements, you can ask questions about the solution. Examples include:

  • Current situation: what can you reuse from previous projects, and what infrastructure is already available?
  • End-users and Future Developers: who will maintain and use the solution in the final state, and what kind of frameworks are they familiar with?
  • Data: What is the volume of the incoming data? Is it structured or unstructured? How frequently does new data come in? Can developers and data scientists use production data or is a synthetic data set required?
  • Location: What can be hosted in the cloud, what can run locally, or do you need to use edge computing?
  • Pre-processing: Do you need to pre-process the data in batch mode or streaming? How do you clean the data? Are there requirements for data lineage? Do you need a feature store? Hint: most likely not.
  • AI: What models do you need (e.g. rule-based, ML, deep learning, supervised or unsupervised)? How many models do you need to train and use in parallel? Can training and inference be directly connected, or will trained models be applied multiple times to new data? Are there any model lineage requirements?
  • Post-processing: Do you need a feedback loop to update data based on model results and user feedback? Do you need to monitor model performance and is there any action required based on the monitor results?
  • Consumption: How do you consume the data (e.g. API, web app)? How many use cases do you build on the platform, and do they end up in the same consumption layer?
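One way to make these questions actionable is to capture the answers in a lightweight, structured checklist that travels with the design. The sketch below is purely illustrative; the field names are hypothetical and should be adapted to your own project.

```python
from dataclasses import dataclass, field

@dataclass
class SolutionRequirements:
    """Hypothetical checklist mirroring the questions above."""
    reusable_infrastructure: list = field(default_factory=list)
    maintainer_frameworks: list = field(default_factory=list)
    data_volume_gb_per_day: float = 0.0
    data_is_structured: bool = True
    needs_streaming: bool = False
    needs_feature_store: bool = False   # hint above: most likely not
    model_types: list = field(default_factory=list)
    needs_model_monitoring: bool = False
    consumption_layer: str = "api"      # e.g. "api" or "web app"

# Example: a modest batch use case with one supervised model.
reqs = SolutionRequirements(
    data_volume_gb_per_day=2.5,
    model_types=["supervised ML"],
    needs_model_monitoring=True,
)
```

Writing the answers down in one place, whatever the format, makes gaps visible early and gives the proof of concept in success factor 3 a concrete scope to test against.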

One source that might be helpful is GitLab’s Jobs To Be Done, which lists the objectives to be achieved with MLOps. In addition to these objectives, critical success factors (CSFs) like delivering on time, staying within budget, and meeting quality standards should be considered. It’s also important to identify the technical risks associated with your project, which puts focus on certain aspects of your project and design. Technical risks can be addressed in a proof of concept (POC), as outlined in success factor 3. Note that other types of risks, like expected changes in requirements, should be addressed with other risk management techniques.

At this point, you can create an initial design. Later in the project, the design can be refined — more on that in success factor 4. A design illustrates how data and models flow through the applications, and how MLOps enables this process with its services and tools. The design helps in understanding the problem and the solution, gathering requirements, having conversations with stakeholders, and defining the scope of a proof of concept.

Success Factor 2: Verify the need for an MLOps platform and manage expectations

Not all use cases require an MLOps platform. As running and maintaining a platform is expensive and resources are scarce, it’s advisable to double-check its necessity. For example, in some projects, a single trained model integrated into a dashboard can effectively address the business problem, making MLOps redundant. The same analysis applies to the platform itself: the model and dashboard from the example above can be packaged together and shared with end users, running on local machines without requiring a platform.

However, our focus is on using AI in production. In my opinion, MLOps platforms are essential when AI is used in production apps. The following list, while not exhaustive, outlines characteristics of such apps that call for MLOps:

  • AI is used in a production environment, separate from the development environment,
  • Development and training are automated and executed in continuous iterations,
  • Data lineage and model lineage are necessary,
  • A clear view of data quality and model performance is required,
  • Certain security requirements must be met.

Once the decision to develop and implement an MLOps platform is made, it’s important to make stakeholders, like product owners and the executive team, aware of the components involved. Typically, stakeholders are primarily interested in the ML aspect, as it directly solves the business issue. The MLOps platform components, however, are often perceived as non-functional aspects. It’s essential to clarify the investments in infrastructure and development required to build an MLOps platform, so that stakeholders understand it’s not a simple task. This way, expectations are managed from the start, avoiding surprises down the line.

When you run models in production, it is essential to continuously monitor, update, improve, and deploy them, making the MLOps platform essential — even though it can be costly. Manage stakeholders’ expectations of these investments from the start.

This is the end of the first part of this two-part series. In the second part, we dive into the last four success factors:

3. Create a proof of concept,
4. Create a design,
5. Split the work into iterations,
6. Keep the project team motivated.

Stay tuned!


Niels van den Berg

MLOps specialist working for Deloitte. I assist clients with deploying ML applications to Production. Home automation enthusiast in my spare time.