MLOps: Technical Overview and Components

Overview

Sammer Puran
5 min read · Dec 21, 2023

Main Components of MLOps

In the last post, we discussed why model lifecycles need to be managed independently of software lifecycles and what the main differences are between machine learning and traditional software engineering. In this post we will see which components are necessary for MLOps and why we need them.

The main components of MLOps can be seen here:

MLOps stack: source https://ml-ops.org/content/mlops-stack-canvas

Assuming we have a specific use case and some data defined, let's walk through the components.

Data Ingestion, Transformation and Versioning

If you have developed machine learning models at university or locally, you have probably taken a CSV with all the features, trained a model on some training data and tested it on some held-out data. Unfortunately, it is not that easy in real life. You need to get the data from somewhere, ingest it into your machine learning or analytics platform, and then compute features on top of it, which can then be used for training models.

Normally this data comes from different sources and needs to be extracted, transformed and loaded (ETL), as simplified in this graphic:

ETL process and feature engineering
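To make the ETL step concrete, here is a minimal sketch with pandas; the file names and the "amount" column are hypothetical placeholders for whatever your sources actually contain:

```python
import pandas as pd

# Extract: pull raw data from a source system (here simply a CSV export).
raw = pd.read_csv("raw_orders.csv")

# Transform: clean and reshape the data into an analysis-friendly form.
clean = (
    raw.dropna(subset=["amount"])
       .assign(amount=lambda df: df["amount"].astype(float))
       .query("amount > 0")
)

# Load: write the result to the analytics / ML platform (here a parquet file).
clean.to_parquet("orders_clean.parquet", index=False)
```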

To be able to train a machine learning model, we need to define features that the model can use, which happens in the feature engineering step. Many people do that in the same code in which they train their model. However, there is a problem with that: training and feature engineering are not decoupled, so if you have to change something in either of them, it affects the whole code. Reusability is also not ideal: if you have multiple models that depend on the same feature engineering, you would need to replicate this step across models. For these reasons, it is better to decouple the feature engineering step from the machine learning model.
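A minimal sketch of what such decoupling can look like in Python; the module name features.py, the column names and the feature logic are purely illustrative:

```python
# features.py - a hypothetical shared feature-engineering module.
import numpy as np
import pandas as pd

def build_features(df: pd.DataFrame) -> pd.DataFrame:
    """Compute model-agnostic features once, reusable by every model."""
    out = df.copy()
    out["order_month"] = pd.to_datetime(out["order_date"]).dt.month
    out["log_amount"] = np.log(out["amount"].clip(lower=1))
    return out

# Any training script can now simply do:
#   from features import build_features
#   X = build_features(raw_df)
# so changing a feature does not mean touching every model's training code.
```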

Okay, let's say you are developing some models and testing and evaluating them. If you were using a CSV file as your data, you could make sure that the comparison is valid because you are testing on the same dataset. But is that also the case in real life? In real life, data can change a lot (sometimes hourly) because the data-generating process is constantly producing new data, column names get renamed, and so on. If you are not tracking your data and models, it can create a mess of models and the data they were trained on. Comparing models also becomes tricky, since you cannot be sure they were trained on the same data and you have no way to capture the data at the same point in time. Data versioning aims to solve this issue by tracking and versioning changes in the data, similar to how you track changes in software with a version control tool. This also lets you check whether a data change caused your model to perform worse, simply by comparing the model's performance on data from different points in time.
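Dedicated tools such as DVC or lakeFS handle data versioning for you; as a minimal illustration of the underlying idea, here is a sketch that fingerprints the exact dataset a model was trained on, so two training runs can later be told apart:

```python
import hashlib
import pandas as pd

def dataset_version(df: pd.DataFrame) -> str:
    """Return a short, deterministic fingerprint of the dataframe contents."""
    payload = pd.util.hash_pandas_object(df, index=True).values.tobytes()
    return hashlib.sha256(payload).hexdigest()[:12]

df = pd.read_parquet("orders_clean.parquet")  # output of the ETL step above
print("training data version:", dataset_version(df))
```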

Model development: Experiment Management and Tracking

So with the data ingested and the features created, it's time to do what data scientists are normally paid for: creating models. You have some data, you try different models with different hyperparameters, and for one model you might use one library and for another model a different library. Because you are alone, you know which model and which hyperparameters gave you what performance.

But let's say you are a team of data scientists working on the same problem: some people are experimenting with deep learning, some with more classical methods. As a team, it becomes much harder to track which experiment, which model and which library led to what performance.

With experiment tracking you have a centralised place to see all your experiments and their configurations, compare them and choose the best version. The best version can then be used to run inference on incoming data. The screenshot below shows what this can look like; I will cover this topic in more detail in a later post.

Experiment Tracking on MLFlow
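As a first impression of what logging to such a tool looks like, here is a minimal sketch with MLflow; the experiment name, the toy dataset and the hyperparameters are placeholder choices:

```python
import mlflow
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1_000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

mlflow.set_experiment("churn-prediction")  # hypothetical experiment name
with mlflow.start_run():
    params = {"n_estimators": 200, "max_depth": 5}
    model = RandomForestClassifier(**params).fit(X_train, y_train)

    mlflow.log_params(params)  # the configuration of this run
    mlflow.log_metric("accuracy", accuracy_score(y_test, model.predict(X_test)))
    mlflow.sklearn.log_model(model, "model")  # the trained artifact itself
```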

Model deployment: Model registry and model versioning

If you are a single data scientist, you work on a model on some specific dataset and produce various ML models (artifacts), which you track with specific naming conventions. You deploy the model yourself or hand it over together with information about it, such as the data it was trained on, its runtime dependencies and so on.

This can work if you are one of only a few data scientists at work and don't have many projects. But if you are a team of data scientists shipping out more and more ML-based solutions, it gets tricky. You need to know where the best model version is, what its training details are, and how to review the model before putting it into production. This is where a model registry helps: it stores all the necessary information in a central repository, so you can also roll back to an older model version should problems arise.
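A minimal sketch of how this can look with the MLflow model registry; the run id and the model name "churn-model" are hypothetical, and newer MLflow versions favour version aliases over stages:

```python
import mlflow
from mlflow import MlflowClient

run_id = "abc123"  # hypothetical id of the best run from the tracking server
version = mlflow.register_model(f"runs:/{run_id}/model", "churn-model")

# Attach review information and promote the version once it has been checked.
client = MlflowClient()
client.update_model_version(
    name="churn-model",
    version=version.version,
    description="Trained on the cleaned orders table; reviewed by the team.",
)
client.transition_model_version_stage(
    name="churn-model", version=version.version, stage="Production"
)
```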

Monitoring

So you evaluated your model on some test data during development. Now you put the model into production, where it makes live predictions on incoming data. It is working fine and the customers are happy.

However, after some time the customer sends you queries complaining that the model is no longer producing high-quality predictions. The reasons can be manifold: the data changed, the label distribution changed, there are new features. Without any monitoring, you cannot catch when the model deteriorated or what caused it. You can retrain the model periodically to circumvent this issue, but the problem remains: you are not seeing problems the moment they arise, but later, and this can cause substantial losses.

To really solve this issue, you need some kind of monitoring. You can monitor whether there was a substantial shift in the data (called drift), whether performance deteriorated, and other things.
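A minimal sketch of input-drift monitoring, comparing the feature distribution seen in production against the training (reference) distribution with a Kolmogorov-Smirnov test; the synthetic data and the threshold are illustrative choices:

```python
import numpy as np
from scipy.stats import ks_2samp

reference = np.random.normal(0, 1, size=5_000)     # feature values at training time
production = np.random.normal(0.4, 1, size=5_000)  # feature values seen live

stat, p_value = ks_2samp(reference, production)
if p_value < 0.01:  # illustrative threshold, not a fixed rule
    print(f"Possible drift detected (KS statistic={stat:.3f}); investigate or retrain")
```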

We touched on many topics in this overview, and we will go over each of them in a detailed series of articles. Since the first step towards an ML model is having data, we will start there:

Data, Features and Data Versioning: https://medium.com/@samipuran/mlops-data-features-and-versioning-90cd925b678d



Sammer Puran

I am an MLOps specialist/data scientist working for the Swiss national television, with 5 years of experience in data science.