How to do MLOps right?

Marouen Hizaoui
Machine Learning Reply DACH
6 min read · May 18, 2022

MLOps is a set of concepts and best practices that aims to build reliable and efficient machine learning systems. Our previous article demonstrated what MLOps is and how crucial it is for any organization that wants to leverage ML (Machine Learning) products.

In this second article in our series, we will go through the different MLOps principles and how they are applied efficiently. As we detailed in our previous article, an ML product is the combination of three pillars: Data, Model, and Code. And although the 6 principles of MLOps are not at all new, they apply differently to each pillar. Hence, in this article we are going to dig into each principle and how it is applied to data, model, and code.

Figure 1: The 6 principles of MLOps

Versioning

Versioning is a paramount practice for building reliable ML products. It aims at having each component of the ML product versioned, with the possibility to move up and down between versions. This not only enables you to keep track of changes but also ensures availability: you can roll back to a working version in the inevitable case of showstoppers.

For code, version control is the way to go, and since code versioning is mature at this point, there are plenty of technologies dedicated to that purpose, the most famous of which is Git. Code versioning in fact applies to all the mechanisms that we will cover in the next sections, such as CI/CD pipelines and application builds. Model versioning, on the other hand, is best handled using a model registry, which is a repository of trained models. A model registry can also store metadata, the training process, and the parameters used to create the model. Data versioning can be done using feature stores, data processing pipelines, and data version control tools. The end goal is to be able to clearly define data lineage and data provenance, with the possibility to switch back and forth between versions when the need arises.
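
To make this concrete, here is a minimal sketch of model versioning using MLflow's model registry, one of several possible tools. The model name "churn-classifier" and the version number are hypothetical:

```python
# A minimal sketch of model versioning with an MLflow model registry.
# Assumes an MLflow tracking server is configured; names are hypothetical.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200).fit(X, y)

with mlflow.start_run():
    # Log the model and register it under a named registry entry;
    # each push creates a new, immutable version.
    mlflow.sklearn.log_model(
        model, "model", registered_model_name="churn-classifier"
    )

# Roll back (or forward) by loading an explicit version from the registry.
older_model = mlflow.pyfunc.load_model("models:/churn-classifier/1")
```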

Testing

The high complexity of ML products makes testing a practice to live by. Unit tests, integration tests, and end-to-end tests are the standard for code testing. For model testing, in addition to the standard model accuracy testing and validation usually done during training, functional input-output testing should also be established. By doing so, you ensure that your model behaves as you expect and spare yourself costly showstoppers. As for data, it is important to test that the data structure and field types are as expected, with best practices like placing tests at the beginning and end of each process in the product pipeline. Such tests make sure that the entire product does not go down because of a faulty data record. In addition to that, feature creation unit tests are also a good practice.
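
As a simple illustration, a data-structure test could look like the hedged pytest sketch below; the column names and types are hypothetical:

```python
# A minimal sketch of a data-structure test with pytest and pandas.
# The schema (column names and dtypes) is hypothetical.
import pandas as pd

EXPECTED_SCHEMA = {
    "user_id": "int64",
    "signup_date": "datetime64[ns]",
    "monthly_spend": "float64",
}

def validate_schema(df: pd.DataFrame) -> None:
    # Fail fast if a column is missing or has an unexpected type.
    for column, dtype in EXPECTED_SCHEMA.items():
        assert column in df.columns, f"missing column: {column}"
        assert str(df[column].dtype) == dtype, (
            f"{column}: expected {dtype}, got {df[column].dtype}"
        )

def test_input_schema():
    df = pd.DataFrame({
        "user_id": pd.Series([1, 2], dtype="int64"),
        "signup_date": pd.to_datetime(["2022-01-01", "2022-02-01"]),
        "monthly_spend": pd.Series([10.5, 20.0], dtype="float64"),
    })
    validate_schema(df)
```

Placing a call like `validate_schema` at the start and end of each pipeline step keeps a single faulty record from silently propagating downstream.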

Automation

Over the last decades, automation has become one of the most important principles to follow. An ML product should always remain reliable and sustainable, yet updates and changes are frequent: new functionalities in the product, a better model, or the use of new data. Automation saves a lot of time that can, in turn, be spent on more productive tasks, and it reduces the possibility of human error. Pipelines are the workhorse here: CI/CD pipelines, model training and validation pipelines, and data processing pipelines are effective ways to automate code, model, and data respectively. This is a huge topic on its own and may well be another entry in a future article series.
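
As a hedged sketch, a training pipeline can be expressed as plain Python stages that an orchestrator (a CI/CD system or scheduler) runs automatically on every change; the stages and the validation threshold below are hypothetical:

```python
# A minimal, hypothetical sketch of an automated training pipeline.
# Each stage is a plain function; an orchestrator would run them in
# order on every code, model, or data change.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def ingest():
    X, y = load_iris(return_X_y=True)
    return train_test_split(X, y, test_size=0.2, random_state=42)

def train(X_train, y_train):
    return LogisticRegression(max_iter=200).fit(X_train, y_train)

def validate(model, X_test, y_test, threshold=0.9):
    score = accuracy_score(y_test, model.predict(X_test))
    # Gate the pipeline: fail the run instead of shipping a bad model.
    assert score >= threshold, f"accuracy {score:.3f} below {threshold}"
    return score

if __name__ == "__main__":
    X_train, X_test, y_train, y_test = ingest()
    model = train(X_train, y_train)
    print(f"validation accuracy: {validate(model, X_test, y_test):.3f}")
```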

Reproducibility

Since ML is, by definition, an experimental discipline, reproducibility is key. Virtual environments, container images, infrastructure as code, and application builds are some of the interconnected measures that help ensure code reproducibility. As for model reproducibility, experiment tracking is the way to go. In addition, a model registry that serves the same model to all the different environments ensures complete model reproducibility. Data reproducibility is strongly coupled to versioning; hence feature stores and data versioning are the best practices to follow. As more experiments are done, keeping track of what data was used where can quickly become an overwhelming task, and these practices ensure that the data used for each experiment is easily reproducible.
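
A hedged sketch of experiment tracking, here with MLflow, might look as follows; the parameter and metric values are hypothetical placeholders:

```python
# A minimal sketch of experiment tracking for reproducibility with MLflow.
# Parameter and metric values are hypothetical placeholders.
import random

import numpy as np
import mlflow

SEED = 42  # fix the seeds so the run can be replayed exactly
random.seed(SEED)
np.random.seed(SEED)

with mlflow.start_run():
    mlflow.log_param("seed", SEED)
    mlflow.log_param("learning_rate", 0.01)
    # ... train the model here ...
    mlflow.log_metric("val_accuracy", 0.93)  # placeholder result
```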

Deployment

As mentioned before, the evolving nature of ML products makes changes necessary and quite common. In software terms, releasing such changes is called a deployment. For code, deployments through CI/CD pipelines are the star of the show; it is important to set up deployments as part of those pipelines. As for model deployment, an article published in 2020 by Algorithmia unveiled unreasonably long model deployment timelines: only 14% of the surveyed companies could deploy a new model to production in under 7 days, while 35% needed at least 3 months, and some companies still require years. Adopting model deployment principles early on can eliminate this showstopper entirely.

Figure 2: The model deployment timeline

Model registries are very handy for model deployment. Simply put, a model registry is a repository for models. However, it can also store metadata about those models, such as hyperparameters and information on how the model was created, and most registries also support model versioning. Using a model registry reduces the hassle of model deployment to a simple push of the model to the registry.
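
As a hedged example using MLflow's registry client, promoting a registered model so that serving infrastructure picks it up can be a single call; the model name and version number are hypothetical:

```python
# A minimal sketch of deploying a registered model by promoting it.
# Assumes "churn-classifier" was pushed to an MLflow model registry;
# the name and version number are hypothetical.
from mlflow.tracking import MlflowClient

client = MlflowClient()
client.transition_model_version_stage(
    name="churn-classifier",
    version=2,
    stage="Production",  # serving infrastructure then picks up this stage
)
```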

Data deployment, on the other hand, can be efficiently managed using feature stores across environments. As further feature engineering takes place, keeping track of where each feature came from and why certain parameters were set is paramount. A feature store, which is essentially a database for the different features used, tracks their lineage and provenance; some feature stores even provide the possibility to generate features. Hence, a feature store gives an unmatched advantage when the need for deploying new features arises.
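
As a hedged sketch, retrieving deployed features at inference time could look like the following, here using the open-source Feast library; the feature and entity names are made up and assume a Feast repository is already configured:

```python
# A minimal, hypothetical sketch of serving features from a feature store,
# using the open-source Feast library. Feature and entity names are made
# up and assume a configured Feast repository in the working directory.
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Fetch the latest feature values for one entity at inference time;
# the store resolves lineage and serves a consistent feature version.
features = store.get_online_features(
    features=[
        "user_stats:monthly_spend",
        "user_stats:days_since_signup",
    ],
    entity_rows=[{"user_id": 1001}],
).to_dict()

print(features)
```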

Monitoring

ML products’ complexity makes monitoring a must: not only performance monitoring but monitoring of all aspects of the product end to end. We cannot optimize what we cannot control, and we cannot control what we cannot measure, so generating metrics is crucial. Monitoring of an ML product can be viewed in the 3 areas listed below.

  • Data health monitoring: metrics about the health of the live data being fed to the product, for example metrics about data distribution and data structure. These help us detect showstoppers such as corrupted data and data drift (a minimal drift-check sketch follows after this list).
  • Model health: metrics about model accuracy on the live data, globally and across different classes, would be informative; this is use-case dependent. Measuring model accuracy requires ground truth data, which might not always be available. When ground truth cannot be obtained, estimation approaches can be used instead, for example proxy metrics: metrics that correlate with the output we want to measure and thus indicate how accurate the predictions are. Model monitoring is crucial in detecting showstoppers like concept drift, model imbalance, unfairness, and many more.
  • System health: an ML product consists of different resources (compute, storage, streaming, etc.). Monitoring system health means monitoring the performance of each of those resources in terms of runtime, CPU consumption, memory consumption, number of invocations, requests, etc. This helps avoid resources being unavailable when needed, as well as silent extra costs caused by faulty excess resource usage. Additionally, it is important to monitor the interactions between the resources to identify potential bottlenecks in the product’s architecture.
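
As referenced above, here is a hedged sketch of a simple data drift check: it compares the live distribution of one feature against its training-time reference with a two-sample Kolmogorov-Smirnov test. The data, feature, and alert threshold are hypothetical:

```python
# A minimal, hypothetical sketch of a data drift check: compare the live
# distribution of one feature against its training-time reference using a
# two-sample Kolmogorov-Smirnov test. The alert threshold is arbitrary.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
reference = rng.normal(loc=0.0, scale=1.0, size=5_000)  # training data
live = rng.normal(loc=0.3, scale=1.0, size=1_000)       # shifted live data

statistic, p_value = ks_2samp(reference, live)
if p_value < 0.01:
    # In a real product this would emit a metric or trigger an alert.
    print(f"data drift suspected (KS={statistic:.3f}, p={p_value:.4f})")
```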

That is a breakdown of each MLOps principle, principles that ML Reply successfully lives by. Their scope reflects the complexity of building successful ML products. It is vital to keep in mind that the key is to cover each principle and to start simple.

We at Machine Learning Reply are actively advising and guiding our clients and partners in establishing the different MLOps principles in the most efficient ways. Please do not hesitate to contact us for anything regarding MLOps.

In the next article in our series, we will go through our implementation approach so that you, too, can do MLOps efficiently and lead your organization’s next-generation ML products.
