MLOps — Productionising ML Models at Google Scale

Jasbirs
Google Cloud - Community
8 min read · Dec 21, 2022

We’ve been using DevOps in conventional software development for a while now, but we can also use it for machine learning and artificial intelligence.

Why do we need DevOps? Why layer Machine Learning and Artificial Intelligence practices on top of DevOps? And what's the difference between MLOps and AIOps? Continue reading this blog to find out.

What is MLOps?

Many industries integrate machine learning systems into their existing products and services because ML can be good for the bottom line, and it can sharpen your competitive edge.

MLOps is a set of practices for collaboration and communication between data scientists and operations professionals. Applying these practices increases the quality, simplifies the management process, and automates the deployment of Machine Learning and Deep Learning models in large-scale production environments. It’s easier to align models with business needs, as well as regulatory requirements.

The key phases of MLOps are:

  • Data gathering
  • Data analysis
  • Data transformation/preparation
  • Model training & development
  • Model validation
  • Model serving
  • Model monitoring
  • Model re-training

DevOps vs MLOps

DevOps and MLOps have fundamental similarities because MLOps was derived from DevOps principles. But the two are quite different in execution:

  1. Unlike DevOps, MLOps is much more experimental in nature. Data Scientists and ML/DL engineers have to tweak various features — hyperparameters, parameters, and models — while also keeping track of and managing the data and the code base for reproducible results.
  2. Hybrid team composition: the team needed to build and deploy models in production won’t be composed of software engineers only. In an ML project, the team usually includes data scientists or ML researchers, who focus on exploratory data analysis, model development, and experimentation. They might not be experienced software engineers who can build production-class services.
  3. Testing: testing an ML system involves model validation, model training, and so on — in addition to the conventional code tests, such as unit testing and integration testing.
  4. Automated Deployment: you can’t just deploy an offline-trained ML model as a prediction service. You’ll need a multi-step pipeline to automatically retrain and deploy a model. This pipeline adds complexity because you need to automate the steps that data scientists do manually before deployment to train and validate new models.
  5. Production performance degradation of the system due to evolving data profiles or simply Training-Serving Skew: ML models in production can have reduced performance not only due to suboptimal coding but also due to constantly evolving data profiles. Models can decay in more ways than conventional software systems, and you need to plan for it.
  6. Monitoring: models in production need to be monitored. Similarly, the summary statistics of the data that built the model need to be monitored so that you can refresh the model when needed. These statistics can and will change over time, so you need notifications or a roll-back process when values deviate from your expectations.
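The data-drift concern in points 5 and 6 can be sketched as a summary-statistics check: compare live feature statistics against those captured at training time and flag the model for retraining when they deviate. The tolerance value and function names are assumptions for this sketch:

```python
# Illustrative drift check against training-time summary statistics.
import statistics

def summarize(values):
    return {"mean": statistics.mean(values), "stdev": statistics.pstdev(values)}

def needs_retraining(train_stats, live_stats, tolerance=0.25):
    # Relative deviation of live statistics from the training baseline.
    mean_shift = abs(live_stats["mean"] - train_stats["mean"]) / (abs(train_stats["mean"]) or 1.0)
    stdev_shift = abs(live_stats["stdev"] - train_stats["stdev"]) / (train_stats["stdev"] or 1.0)
    return mean_shift > tolerance or stdev_shift > tolerance

train_stats = summarize([10, 12, 11, 9, 13])  # captured at training time
live_stats = summarize([18, 20, 19, 21, 17])  # evolving production profile
```

With the live mean shifted well beyond the tolerance, `needs_retraining(train_stats, live_stats)` returns `True`, which is exactly the notification/roll-back signal described above.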

MLOps and DevOps are similar when it comes to continuous integration of source control, unit testing, integration testing, and continuous delivery of the software module or the package.

However, in ML there are a few notable differences:

  • Continuous Integration (CI) is no longer only about testing and validating code and components, but also testing and validating data, data schemas, and models.
  • Continuous Deployment (CD) is no longer about a single software package or service, but a system (an ML training pipeline) that should automatically deploy another service (model prediction service) or roll back changes from a model.
  • Continuous Testing (CT) is a new property, unique to ML systems, that’s concerned with automatically retraining and serving the models.
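As a sketch of how CI extends to data, here is a unit-test-style schema check that could run before any training job. The schema and field names are illustrative assumptions, not a real API:

```python
# Illustrative CI check: validate incoming records against an expected
# schema before they reach training. Schema and fields are hypothetical.

EXPECTED_SCHEMA = {"user_id": int, "age": int, "country": str}

def validate_record(record, schema=EXPECTED_SCHEMA):
    errors = []
    for field, expected_type in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"wrong type for {field}: {type(record[field]).__name__}")
    return errors

good = {"user_id": 1, "age": 34, "country": "DE"}
bad = {"user_id": "1", "country": "DE"}  # wrong type, missing field
```

A CI job would fail the build when `validate_record` returns a non-empty error list, the same way a failing unit test fails a conventional build.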

How to implement MLOps

There are three levels at which you can implement MLOps. This section describes each level, starting from the most common one, which involves no automation, up to automating both the ML and CI/CD pipelines.

MLOps level 0: Manual process

At level 0, the process for building and deploying ML models is entirely manual. This is typical for companies that are just starting out with ML.

Characteristics

  • Manual, script-driven, and interactive process: every step is manual, including data analysis, data preparation, model training, and validation. It requires manual execution of each step and manual transition from one step to another.
  • Disconnect between ML and operations: the process separates data scientists who create the model, and engineers who serve the model as a prediction service. The data scientists hand over a trained model as an artifact for the engineering team to deploy on their API infrastructure.
  • Infrequent release iterations: the assumption is that your data science team manages a few models that don’t change frequently — either changing model implementation or retraining the model with new data. A new model version is deployed only a couple of times per year.
  • No Continuous Integration (CI): because few implementation changes are assumed, you ignore CI. Usually, testing the code is part of the notebooks or script execution.
  • No Continuous Deployment (CD): because there aren’t frequent model version deployments, CD isn’t considered.
  • Deployment as a prediction service: deployment refers only to the prediction service (i.e. a microservice with a REST API), not to the whole ML system.
  • Lack of active performance monitoring: the process doesn’t track or log model predictions and actions.

The engineering team might have their own complex setup for API configuration, testing, and deployment, including security, regression, and load + canary testing.
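The level 0 hand-off can be sketched in a few lines: the data scientist trains offline and serializes the model as an artifact; the engineering team later loads that artifact into their serving stack. The file layout and toy model are assumptions for illustration:

```python
# Sketch of the manual level 0 hand-off between data science and engineering.
import os
import pickle
import tempfile

def train_offline():
    # Stand-in for notebook-driven training: a toy linear model y = 2x + 1.
    return {"slope": 2.0, "intercept": 1.0}

def hand_over(model, path):
    # The "artifact" given to engineering: just a serialized file.
    with open(path, "wb") as f:
        pickle.dump(model, f)

def serve(path, x):
    # Engineering side: load the artifact and answer a prediction request.
    with open(path, "rb") as f:
        model = pickle.load(f)
    return model["slope"] * x + model["intercept"]

artifact = os.path.join(tempfile.mkdtemp(), "model.pkl")
hand_over(train_offline(), artifact)
prediction = serve(artifact, 3.0)
```

Nothing here tracks how the model was trained or how it performs in production, which is precisely the gap the next levels close.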

Challenges

In practice, models often break when they’re deployed in the real world. Models fail to adapt to changes in the dynamics of the environment or changes in the data that describes the environment.

To address the challenges of this manual process, it's good to use MLOps practices for CI/CD and CT. By deploying an ML training pipeline, you can enable CT, and you can set up a CI/CD system to rapidly test, build, and deploy new implementations of the ML pipeline.

MLOps level 1: ML pipeline automation

The goal of MLOps level 1 is to perform continuous training (CT) of the model by automating the ML pipeline. To automate the process of using new data to retrain models in production, you need to introduce automated data and model validation steps to the pipeline, as well as pipeline triggers and metadata management.

Characteristics

  • Rapid experiment: ML experiment steps are orchestrated and done automatically.
  • CT of the model in production: the model is automatically trained in production, using fresh data based on live pipeline triggers.
  • Experimental-operational symmetry: the pipeline implementation used in the development or experiment environment is the same one used in the preproduction and production environments, which is a key aspect of the MLOps practice of unifying development and operations.
  • Modularized code for components and pipelines: to construct ML pipelines, components need to be reusable, composable, and potentially shareable across ML pipelines (i.e. using containers).
  • Continuous delivery of models: the model deployment step, which serves the trained and validated model as a prediction service for online predictions, is automated.
  • Pipeline deployment: in level 0, you deploy a trained model as a prediction service to production. For level 1, you deploy a whole training pipeline, which automatically and recurrently runs to serve the trained model as the prediction service.
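The characteristics above can be sketched as a level 1 pipeline: modular steps wired together, with an automated model-validation gate that blocks deployment when the new model underperforms the one currently serving. The metric and threshold are assumptions for the sketch:

```python
# Sketch of an automated level 1 training pipeline with a validation gate.
# The toy model, metric, and gate logic are illustrative assumptions.

def train_model(data):
    # Toy model: predict the mean of the training data.
    return {"mean": sum(data) / len(data)}

def evaluate(model, holdout):
    # Mean absolute error of the constant predictor on held-out data.
    return sum(abs(model["mean"] - y) for y in holdout) / len(holdout)

def pipeline(data, holdout, current_error):
    model = train_model(data)
    error = evaluate(model, holdout)
    deployed = error <= current_error  # validation gate before serving
    return model, error, deployed

# A trigger (schedule, new data, drift alert) would invoke this run:
model, error, deployed = pipeline(
    data=[4, 6, 5, 5], holdout=[5, 5, 6, 4], current_error=1.0
)
```

Because each step is a separate function, the same pipeline code can run in the experiment environment and in production, which is the experimental-operational symmetry described above.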

Challenges

Assuming that new implementations of the pipeline aren’t frequently deployed and you are managing only a few pipelines, you usually manually test the pipeline and its components. In addition, you manually deploy new pipeline implementations. You also submit the tested source code for the pipeline to the IT team to deploy to the target environment. This setup is suitable when you deploy new models based on new data, rather than based on new ML ideas.

However, you need to try new ML ideas and rapidly deploy new implementations of the ML components. If you manage many ML pipelines in production, you need a CI/CD setup to automate the build, test, and deployment of ML pipelines.

MLOps level 2: CI/CD pipeline automation

For a rapid and reliable update of the pipelines in production, you need a robust automated CI/CD system. This automated CI/CD system lets your data scientists rapidly explore new ideas around feature engineering, model architecture, and hyperparameters. They can implement these ideas and automatically build, test, and deploy the new pipeline components to the target environment.

This MLOps setup includes the following components:

  • Source control
  • Test and build services
  • Deployment services
  • Model registry
  • Feature store
  • ML metadata store
  • ML pipeline orchestrator
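Two of these components, the model registry and the ML metadata store, can be sketched together as a small in-memory class. The structure is purely illustrative, not the API of any real product:

```python
# Illustrative in-memory model registry with attached run metadata.

class ModelRegistry:
    def __init__(self):
        self._models = {}
        self._metadata = {}

    def register(self, name, model, metadata):
        # Assign the next version number for this model name.
        version = len([k for k in self._models if k[0] == name]) + 1
        self._models[(name, version)] = model
        self._metadata[(name, version)] = metadata
        return version

    def latest(self, name):
        versions = [v for (n, v) in self._models if n == name]
        return self._models[(name, max(versions))]

registry = ModelRegistry()
v1 = registry.register("churn", {"threshold": 0.4},
                       {"pipeline_run": "run-001", "accuracy": 0.91})
v2 = registry.register("churn", {"threshold": 0.35},
                       {"pipeline_run": "run-002", "accuracy": 0.93})
```

The metadata record ties each model version back to the pipeline run that produced it, which is what makes automated roll-backs and audits possible.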

Characteristics

The ML CI/CD automation pipeline consists of the following stages:

  • Development and experimentation: you iteratively try out new ML algorithms and new modeling techniques, where the experiment steps are orchestrated. The output of this stage is the source code of the ML pipeline steps, which is then pushed to a source repository.
  • Pipeline continuous integration: you build source code and run various tests. The outputs of this stage are pipeline components (packages, executables, and artifacts) to be deployed in a later stage.
  • Pipeline continuous delivery: you deploy the artifacts produced by the CI stage to the target environment. The output of this stage is a deployed pipeline with the new implementation of the model.
  • Automated triggering: the pipeline is automatically executed in production based on a schedule or in response to a trigger. The output of this stage is a newly trained model that is pushed to the model registry.
  • Model continuous delivery: you serve the trained model as a prediction service for the predictions. The output of this stage is a deployed model prediction service.
  • Monitoring: you collect statistics on model performance based on live data. The output of this stage is a trigger to execute the pipeline or to execute a new experiment cycle.
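The monitoring stage that closes the loop can be sketched as a check of live accuracy against a floor, emitting the trigger that re-runs the pipeline. The floor value and trigger shape are assumptions for this sketch:

```python
# Illustrative monitoring stage: accuracy check that emits a retrain trigger.

def monitor(predictions, actuals, accuracy_floor=0.8):
    accuracy = sum(p == a for p, a in zip(predictions, actuals)) / len(actuals)
    # The trigger feeds back into automated pipeline execution.
    return {"retrain": accuracy < accuracy_floor, "live_accuracy": accuracy}

# Live traffic where the model has started to decay:
trigger = monitor(predictions=[1, 0, 1, 1, 0], actuals=[1, 1, 0, 1, 1])
```

When `trigger["retrain"]` is true, the automated triggering stage picks it up and executes the training pipeline again, completing the CI/CD/CT cycle.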

Conclusion

To summarize, implementing ML in a production environment doesn’t only mean deploying your model as an API for prediction. Rather, it means deploying an ML pipeline that can automate the retraining and deployment of new models. Setting up a CI/CD system enables you to automatically test and deploy new pipeline implementations. This system lets you cope with rapid changes in your data and business environment.


Strategic cloud Engineer, Infrastructure, Application Development, Machine Learning@Google Cloud