Mastering Drift: Strategies for Monitoring ML Models in Production

Iago Modesto Brandão
6 min read · Jun 3, 2023


In this article, we will discuss why monitoring machine learning models in production matters, describe the main types of drift, and show how to identify and handle each one.

Introduction

The use of machine learning models in production has become increasingly common in companies looking to make more accurate decisions and increase business efficiency. However, monitoring of these models is often neglected or not considered a priority.

As the number of productive models grows, it becomes more complex to track what is happening internally with the decisions of each of these models. Without proper monitoring, models can start to produce inaccurate or inconsistent results, which can lead to financial losses or even damage to your business’s reputation.

Machine learning models are not set in stone: they learn from data, and as reality changes over time, so does the data.

But after all, what is Drift?

Drift is a phenomenon that occurs when the distribution of the data changes over time. These changes can be caused by many factors, such as changes in the production environment, changes in user behavior, changes in data sources, or simply the aging of the data.

Source: evidentlyai [6]

Drift can unfold in several ways, as the figure below shows: suddenly, incrementally, gradually, or even recurrently/seasonally.

Adapted from: Gama et al. (2013)

Therefore, we need to know which types of drift we may encounter and what to do about each one.

Drift Types and Identification Techniques

Let’s look at the types of drift that exist and how to treat each one.

Feature Drift

Feature drift is a type of drift that occurs when the distribution of input data changes over time, affecting the performance of the machine learning model in production. This can happen for a variety of reasons, such as changes in user behavior, changes in collected data, or seasonal variations.

For example, raising the minimum wage can lead to greater purchasing power, which can generate an increase in retail sales volume. This would certainly represent a variation in features of models in this market.

Source: Adapted from Ashok, S., Ezhumalai, S., & Patwa, T. (2023) [1]

To identify Feature Drift, you can use statistical tests such as the Kolmogorov-Smirnov (KS) test and the Population Stability Index (PSI) to compare the input data distributions across different time periods or data segments [1][2].
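As a minimal sketch of what such a check could look like (the PSI implementation and the 0.05 / 0.2 cut-offs below are common rules of thumb, not a standard prescribed by the references, and the data is synthetic), one could compare a feature's training-time and production-time samples like this:

```python
import numpy as np
from scipy import stats

def psi(reference, current, bins=10):
    """Population Stability Index between two samples of one feature."""
    # Bin edges taken from quantiles of the reference (training) distribution
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    # Widen the outer edges so production values outside the training range still land in a bin
    edges[0] = min(edges[0], current.min()) - 1e-9
    edges[-1] = max(edges[-1], current.max()) + 1e-9
    ref_pct = np.histogram(reference, edges)[0] / len(reference)
    cur_pct = np.histogram(current, edges)[0] / len(current)
    # Small floor avoids log(0) / division by zero in empty bins
    ref_pct = np.clip(ref_pct, 1e-6, None)
    cur_pct = np.clip(cur_pct, 1e-6, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))

rng = np.random.default_rng(42)
train_feature = rng.normal(loc=0.0, scale=1.0, size=5000)  # training window
prod_feature = rng.normal(loc=1.0, scale=1.0, size=5000)   # production window, mean has shifted

ks_stat, p_value = stats.ks_2samp(train_feature, prod_feature)
psi_value = psi(train_feature, prod_feature)

# Common rules of thumb: KS p-value < 0.05, or PSI > 0.2, suggests drift
print(f"KS p-value: {p_value:.4f}, PSI: {psi_value:.3f}")
```

Here the production feature's mean has shifted by one standard deviation, so both tests flag drift; on stable data the KS p-value stays high and the PSI stays near zero.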

When Feature Drift is confirmed, two possible actions are to investigate the feature generation process and to retrain the model on the new data, especially if the drifted features are also causing another kind of drift: prediction drift.

Prediction Drift

Prediction drift is a type of drift that occurs when a machine learning model starts producing inaccurate or inconsistent predictions in a production environment.

For example, a model that used to point to a common behavior and now points to a completely different one may be exhibiting prediction drift.

To monitor prediction drift, we can also use statistical techniques such as the Jensen-Shannon Divergence (JS), Kolmogorov-Smirnov (KS) test, and the Population Stability Index (PSI) to compare the distributions of model prediction scores across different time periods or data segments [1][2][4][5].
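A minimal illustration of the JS approach on score distributions (the bin count and the synthetic scores are assumptions for the example; note that SciPy's `jensenshannon` returns the JS *distance*, the square root of the divergence):

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def score_js_distance(ref_scores, cur_scores, bins=20):
    """Jensen-Shannon distance between two model-score distributions."""
    edges = np.linspace(0.0, 1.0, bins + 1)  # scores assumed to lie in [0, 1]
    ref_hist = np.histogram(ref_scores, edges)[0]
    cur_hist = np.histogram(cur_scores, edges)[0]
    # scipy normalizes the histograms to probability vectors internally
    return float(jensenshannon(ref_hist, cur_hist))

rng = np.random.default_rng(0)
# Reference scores centred near 0.3; production scores have shifted towards 0.6
ref_scores = np.clip(rng.normal(0.3, 0.1, 10_000), 0, 1)
cur_scores = np.clip(rng.normal(0.6, 0.1, 10_000), 0, 1)

distance = score_js_distance(ref_scores, cur_scores)
print(f"JS distance: {distance:.3f}")  # near 0 = stable, larger = stronger drift
```

The same function applied to two samples of the reference scores returns a value near zero, which is what makes it usable as a periodic production check.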

An even more reliable way to validate Prediction Drift is to analyze, over time, the history of what was predicted against what actually happened, using a model performance metric such as Normalized Mean Absolute Error or Recall.
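A rough sketch of this predicted-versus-actual tracking, using a windowed normalized MAE on synthetic data (the window size and the degradation pattern are invented for illustration):

```python
import numpy as np

def windowed_nmae(y_true, y_pred, window=100):
    """Normalized mean absolute error per consecutive window of predictions."""
    scores = []
    for start in range(0, len(y_true) - window + 1, window):
        t = y_true[start:start + window]
        p = y_pred[start:start + window]
        scores.append(np.mean(np.abs(t - p)) / np.mean(np.abs(t)))
    return np.array(scores)

rng = np.random.default_rng(1)
y_true = rng.uniform(50, 150, 600)          # realized values as they arrive
# Predictions are accurate early on, then degrade in the last two windows
noise = np.concatenate([rng.normal(0, 2, 400), rng.normal(0, 30, 200)])
y_pred = y_true + noise

nmae_per_window = windowed_nmae(y_true, y_pred, window=100)
print(np.round(nmae_per_window, 3))  # the last windows stand out from the earlier ones
```

In production, the windows would typically be time-based (daily or weekly batches) rather than fixed-count slices, with an alert when the metric crosses a threshold set from the model's historical performance.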

Once Prediction Drift is identified, evaluate the model’s prediction process to look for inconsistencies; one possible cause is an underlying Feature Drift.

In addition, you should also evaluate the result that the model is generating in the business by monitoring the business metrics associated with the model, and you can also retrain the model.

Concept Drift

This last type of drift is a little more complex, but still intuitive. Concept Drift occurs when the relationship between the model’s input and output variables changes over time. This can happen due to changes in the input data, changes in the environment in which the model is used, or changes in user preferences.

For example, if we asked for the most reliable way to get information, a person born in 1920 might point to a battery-operated radio, while a person born in 2015 might point to Twitter. We still get our news; the way we get it has changed. That is Concept Drift.

A common technique for detecting Concept Drift is to monitor model performance metrics over time: if these metrics deteriorate, it may indicate the presence of concept drift. Furthermore, a technique recommended by Gama et al. (2013) is to compare the model’s performance with that of a reference model trained on the same input data but in a previous period. This can help identify changes in the data distribution or in the relationship between variables over time [3][4].
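The performance-comparison idea can be sketched on a toy task (a simplified illustration, not the exact protocol from Gama et al.): a model trained on an older period is compared against one retrained on recent data, on a synthetic problem where the feature-label relationship has inverted between periods.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(7)

def make_period(n, flip):
    """Synthetic binary task; `flip` inverts the feature/label relationship."""
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] > 0).astype(int)
    return X, (1 - y if flip else y)

X_old, y_old = make_period(2000, flip=False)  # training period
X_new, y_new = make_period(2000, flip=True)   # production period: concept changed

old_model = LogisticRegression().fit(X_old, y_old)
ref_model = LogisticRegression().fit(X_new[:1000], y_new[:1000])  # retrained on recent data

holdout_X, holdout_y = X_new[1000:], y_new[1000:]
old_acc = accuracy_score(holdout_y, old_model.predict(holdout_X))
ref_acc = accuracy_score(holdout_y, ref_model.predict(holdout_X))
print(f"old model: {old_acc:.2f}, retrained reference: {ref_acc:.2f}")
```

The large gap between the two accuracies on the same recent holdout is the signal: the data itself may look unchanged, but the relationship the old model learned no longer holds.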

Once Concept Drift is confirmed, the first recommended actions are to revisit how the model is built, as well as the variables involved in training, which can be added, modified, or even removed. After all, we need to capture this shift in reality, don’t we?

Python package suggestions for Drift Monitoring

A tip to avoid developing all of this monitoring by hand: consider an open-source tool such as Evidently [6], which implements many of these drift tests out of the box.

Final Thoughts

Monitoring machine learning models in production is essential to ensure the models are producing accurate and consistent results over time.

Drift monitoring is a valuable tool in this process: by watching models in production and flagging when they need to be adjusted or updated, it helps keep them accurate and effective over time.

It is important to note that drift monitoring is not a one-size-fits-all solution. It’s just one part of the broader process of managing machine learning models. To ensure models remain accurate and effective, they need to be regularly evaluated and updated, and drift monitoring is one of the tools that can be used to help with this process.

With proper monitoring, companies can maximize the value of their machine learning models and avoid financial or reputational losses.

After knowing all this, the answer to the question “When do we need to monitor models in production?” is: whenever possible.

Let’s connect

Did you like the content? Let’s have a coffee, add me on LinkedIn to exchange ideas and share knowledge!

https://www.linkedin.com/in/iagobrandao

References

[1] Ashok, S., Ezhumalai, S., & Patwa, T. (2023). Remediating data drifts and re-establishing ML models. Procedia Computer Science, 218, 799–809.

[2] dos Reis, D. M., Flach, P., Matwin, S., & Batista, G. (2016, August). Fast unsupervised online drift detection using incremental kolmogorov-smirnov test. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 1545–1554).

[3] Gama, J., et al. (2013). A survey on concept drift adaptation. Journal of Systems and Software, 86(8), 2263–2288.

[4] Lakshmanan, V., Robinson, S., & Munn, M. (2020). Machine learning design patterns. O’Reilly Media.

[5] Shanbhag, A., Ghosh, A., & Rubin, J. (2021). Unified shapley framework to explain prediction drift. arXiv preprint arXiv:2102.07862.

[6] https://www.evidentlyai.com/blog/machine-learning-monitoring-data-and-concept-drift
