Designing a Model Monitoring and Observability System

Paul Deepakraj Retinraj
Published in AI Science · 16 min read · Dec 30, 2022


Model Monitoring/Observability Architecture, techniques, and solutions

Table of Contents:

· Why Model Monitoring is important:
Why Observability is much more important:
Do we need observability to be real-time?
· Architecture:
· Detailed Design:
Detection:
Diagnosis:
Model Retraining:
· Challenges in Realtime Model Observability:
· Conclusion
· References

Why Model Monitoring is important:

Machine learning plays a vital role in modern technology and has a huge impact on everyone’s life. Unlike traditional software, ML models are not static; they continuously learn from data to produce outcomes. When the data changes over a period of time, the predictive power of the machine learning model degrades, which in turn impacts the business KPIs. Monitoring the machine learning system to detect drift in the data is therefore crucial for keeping the model performing well and producing meaningful outcomes.

Why Observability is much more important:

Monitoring the model for drift in the data alone is not enough to understand the root cause of a problem. Beyond monitoring, we need the ability to troubleshoot and provide insights that explain the root cause of model issues (drift, quality, bias, etc.) by analyzing trends and patterns, so we can take proactive measures to optimize and tune the model. Model observability is the ability to monitor and understand the behavior and performance of a machine learning model over time.

Monitoring => Something is wrong!

Observability => Something is wrong and here is why/how it happened!!

Observability == Monitoring + Explainability (insights into root causes, trends, patterns, etc.)

Do we need observability to be real-time?

Model observability can be offline or online/real-time. Offline model observability typically includes components for collecting, processing, and analyzing data, as well as mechanisms for alerting and adapting the model as needed. Real-time model observability involves continuously monitoring the model’s performance and behavior as the model is running. Real-time model observability is particularly important in certain scenarios, such as:

  1. Online learning systems: In online learning systems, the model needs to adapt to changes in the data distribution and application environment in real-time. Real-time model observability is critical for ensuring that the model is able to adapt and continue performing well.
  2. Time-sensitive applications: In applications where time is a critical factor, such as fraud detection or financial trading, real-time model observability is crucial for making timely and accurate decisions.
  3. Dynamic environments: In dynamic environments, such as those with frequent changes in the data distribution or application environment, real-time model observability is necessary to ensure that the model is able to adapt and continue performing well.

Though real-time model observability is challenging to set up, particularly with proper automation around retraining the model, it is important in scenarios where the model needs to adapt to changing conditions or make timely and accurate decisions. It helps ensure that the model is able to perform well and continue providing value over time.

Architecture:

Monitoring and Observability in the entire ML Model Lifecycle

Model Monitoring and Observability Architecture:

Model Monitoring and Observability System Architecture

The above diagram depicts various components in the model observability system architecture.

Detailed Design:

Detection:

The detection phase involves drift detection, data quality/outlier detection, and capturing model performance and bias metrics.

Data Drifts:

There are three kinds of data drift that we can detect: concept drift (the relationship between the input features and the target changes), label drift (the distribution of the target/label changes), and feature drift (the distribution of the input features changes).

Drifts and methods to detect them
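As a concrete illustration, here is a minimal sketch (in Python) of detecting feature drift on a single numerical feature by comparing a reference sample from training against a window of recent serving data, using a two-sample Kolmogorov–Smirnov test from SciPy. The arrays, the simulated shift, and the alerting threshold are illustrative assumptions, not part of any specific system.

```python
# Minimal sketch of feature-drift detection on one numerical feature,
# assuming `reference` holds training-time values and `current` holds
# recent production values (both are illustrative synthetic samples).
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(seed=0)
reference = rng.normal(loc=0.0, scale=1.0, size=5_000)  # training distribution
current = rng.normal(loc=0.4, scale=1.2, size=5_000)    # shifted serving distribution

statistic, p_value = ks_2samp(reference, current)
if p_value < 0.01:  # illustrative threshold; tune per feature and traffic volume
    print(f"Feature drift detected (KS statistic={statistic:.3f}, p={p_value:.4f})")
else:
    print("No significant drift detected")
```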

Data Quality/Integrity:

Data quality and integrity can have a significant impact on model observability in a number of ways, such as performance degradation, biased results, and difficulty in interpretation. Data quality issues can arise from unannounced upstream schema changes or from missing or mismatched values. There are three types of violations that can occur at model inference with respect to feature data: missing feature values, type mismatches (e.g., sending a float input for a categorical feature type), and range mismatches (e.g., sending an unknown US state for a State categorical feature).
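As an illustration of catching these violations at inference time, below is a minimal sketch of schema-based feature validation; the feature schema, field names, and payload are hypothetical.

```python
# Minimal sketch of inference-time feature validation for the three violation
# types described above; the schema and incoming payload are hypothetical.
EXPECTED_SCHEMA = {
    "age": {"type": float, "min": 0.0, "max": 120.0},
    "state": {"type": str, "allowed": {"CA", "NY", "TX", "WA"}},
}

def validate_payload(payload: dict) -> list:
    violations = []
    for feature, spec in EXPECTED_SCHEMA.items():
        if feature not in payload or payload[feature] is None:
            violations.append(f"missing value: {feature}")
            continue
        value = payload[feature]
        if not isinstance(value, spec["type"]):
            violations.append(f"type mismatch: {feature}={value!r}")
            continue
        if "allowed" in spec and value not in spec["allowed"]:
            violations.append(f"range mismatch: unknown category {feature}={value!r}")
        if "min" in spec and not (spec["min"] <= value <= spec["max"]):
            violations.append(f"range mismatch: {feature}={value}")
    return violations

# Both a type mismatch (age sent as a string) and a range mismatch (unknown state)
print(validate_payload({"age": "35", "state": "ZZ"}))
```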

Data Outliers

Data outliers, or extreme values in the data, can impact model observability in a number of ways. Here are some potential impacts of data outliers on model observability:

  1. Performance degradation: Data outliers can cause a machine learning model to perform poorly, as they may be significantly different from the majority of the data and may not be well represented by the model. This can lead to poor performance on the overall dataset and make it difficult to understand the model’s behavior.
  2. Biased results: Data outliers may be disproportionately influential on the model’s predictions, leading to biased or skewed results. This can make it difficult to understand the factors driving the model’s predictions and identify any potential issues.
  3. Difficulty in interpretability: Data outliers may make it difficult to interpret the model’s behavior and understand the factors driving the model’s predictions. This can make it challenging to identify and address any issues that may arise.

To address the impact of data outliers on model observability, it may be necessary to identify and remove or mitigate the impact of the outliers on the model’s performance. This can be done through techniques such as data preprocessing, outlier detection, or outlier removal. Overall, it is important to carefully consider the impact of data outliers on the model’s performance and take appropriate action to ensure the model’s performance and effectiveness over time.
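For example, a simple way to flag outliers on a numerical feature is the interquartile-range (IQR) rule; the sketch below uses illustrative data and the conventional 1.5x multiplier as an assumption.

```python
# Minimal sketch of IQR-based outlier flagging for one numerical feature;
# the values and the 1.5x multiplier are illustrative defaults.
import numpy as np

values = np.array([12.0, 14.1, 13.5, 12.8, 13.9, 55.0, 14.2, 13.1])

q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outlier_mask = (values < lower) | (values > upper)
print("Outliers:", values[outlier_mask])  # flags the extreme value 55.0
```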

Training-Serving Skew:

Training-serving skew, also known as training-inference skew or model drift, can be a significant issue in model observability. Training-serving skew refers to a situation where the performance of a machine learning model changes significantly between training and serving, or production, environments. This can occur for a variety of reasons, such as differences in the data distribution, hardware, or software between the training and serving environments.

To address training-serving skew in model observability, it is important to continuously monitor the model’s performance and behavior in the serving environment and identify any issues or trends that may indicate training-serving skew. This can be done using techniques such as tracking performance metrics, visualizing model outputs, or using explainability techniques.

If the training-serving skew is identified, there are several approaches that can be taken to address it, including:

  1. Ensuring that the training and serving environments are as similar as possible: This can help minimize the impact of any differences between the environments on the model’s performance.
  2. Regularly retraining and deploying the model: By regularly retraining and deploying the model, you can ensure that the model is adapting to changes in the data distribution or application environment.
  3. Using techniques such as domain adaptation or transfer learning: These techniques can help the model adapt to changes in the data distribution or application environment and mitigate the impact of training-serving skew.
  4. Stratified sampling: How data is sampled plays a vital role in model performance in both training and production. Proper stratified sampling techniques can reduce the deviation between the training and production data.

Overall, training-serving skew can be a significant challenge in model observability, but it can be addressed through careful monitoring and the use of appropriate techniques and tools.
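One lightweight way to surface potential training-serving skew is to compare per-feature summary statistics between the training set and a window of serving traffic, as in the minimal sketch below; the dataframes, feature name, and alerting threshold are hypothetical.

```python
# Minimal sketch comparing a per-feature summary statistic between the training
# set and a window of serving traffic; data and threshold are illustrative.
import pandas as pd

train_df = pd.DataFrame({"amount": [10.0, 12.5, 9.8, 11.2, 10.7]})
serving_df = pd.DataFrame({"amount": [21.0, 19.5, 22.3, 20.8, 18.9]})

def relative_mean_shift(train: pd.Series, serving: pd.Series) -> float:
    # Relative change in the feature mean between training and serving.
    return abs(serving.mean() - train.mean()) / (abs(train.mean()) + 1e-9)

shift = relative_mean_shift(train_df["amount"], serving_df["amount"])
if shift > 0.25:  # illustrative alerting threshold
    print(f"Possible training-serving skew on 'amount' (relative shift={shift:.2f})")
```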

Model Performance:

Model performance is an important aspect of model observability, as it reflects the model’s ability to accurately and reliably make predictions or perform tasks. To monitor model performance in model observability, it is important to continuously track the model’s performance on relevant metrics (accuracy, F1 score, AUC, recall, etc.) and identify any issues or trends that may indicate a need for model improvement.
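As a minimal sketch, once ground-truth labels become available for served predictions, these metrics can be computed with scikit-learn and written to the metrics store; the labels and scores below are illustrative.

```python
# Minimal sketch of computing the performance metrics named above once ground
# truth becomes available for served predictions; labels/scores are illustrative.
from sklearn.metrics import accuracy_score, f1_score, recall_score, roc_auc_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
y_score = [0.9, 0.2, 0.8, 0.45, 0.3, 0.6, 0.7, 0.1]  # predicted probabilities

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "auc": roc_auc_score(y_true, y_score),
}
print(metrics)  # these values would then be written to the metrics store
```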

In the context of model performance, business KPIs can be used to measure the impact of the model on the business and track progress toward business objectives.

Some examples of business KPIs that may be relevant to model performance include:

  1. Conversion rate: The conversion rate is the proportion of visitors to a website or app who take the desired action, such as making a purchase or signing up for a service. A model that predicts which visitors are more likely to convert can help improve the conversion rate.
  2. Customer lifetime value: Customer lifetime value is a measure of the total value that a customer is expected to generate over their lifetime as a customer. A model that predicts customer lifetime value can help inform marketing and sales strategies.
  3. Return on investment: Return on investment (ROI) is a measure of the profitability of an investment. A model that predicts the likelihood of a customer making a purchase can help optimize marketing and sales efforts and improve ROI.
  4. Net promoter score: Net promoter score (NPS) is a measure of customer satisfaction and loyalty. A model that predicts customer satisfaction can help identify areas for improvement and increase NPS.

Overall, business KPIs can be useful metrics for measuring the impact of a machine learning model on the business and tracking progress toward business objectives. By aligning the model’s performance with relevant business KPIs, you can ensure that the model is providing value to the business and contributing to its success.

Model Bias and Fairness:

Model bias and fairness are important considerations in model observability, as they can impact the model’s performance and the decisions it makes. Model bias refers to the tendency of a model to make certain types of errors more frequently than others, while model fairness refers to the extent to which a model treats different groups of people equally.

To address model bias and fairness in model observability, it is important to continuously monitor the model’s performance and behavior, particularly with respect to any groups that may be disproportionately impacted by the model’s decisions. This can be done through techniques such as tracking performance metrics, visualizing model outputs, or using explainability techniques.

If model bias or fairness issues are identified, there are several approaches that can be taken to address them, including:

  1. Balancing the dataset: Balancing the dataset by ensuring that it is representative of the target population can help reduce the impact of model bias.
  2. Fairness metrics: There are various fairness metrics that can be used to quantify the level of bias or fairness in a machine learning model. These metrics can be used to identify areas of potential bias and to track the model’s performance over time (a minimal sketch follows this list).
  3. Fairness constraints: Fairness constraints can be used to explicitly ensure that the model meets certain fairness criteria, such as equal opportunity or equal treatment. These constraints can be incorporated into the training process to guide the model’s decision-making.
  4. Using bias-aware algorithms: Some machine learning algorithms are specifically designed to mitigate the impact of bias in the data.
  5. Using bias-correction techniques: Techniques such as pre-processing, data augmentation, or counterfactual generation can help correct for bias in the data.
  6. Ensuring transparency and accountability: Ensuring that the model’s decisions and outputs are transparent and that there are clear processes in place for addressing any issues that may arise can help promote fairness and accountability.
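As referenced in the list above, here is a minimal sketch of one common fairness metric, the demographic parity difference (the gap in positive-prediction rates between two groups); the predictions and group labels are illustrative.

```python
# Minimal sketch of the demographic parity difference: the gap in
# positive-prediction rates between two groups. Data is illustrative.
import numpy as np

y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0])
group = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

rate_a = y_pred[group == "A"].mean()
rate_b = y_pred[group == "B"].mean()
parity_gap = abs(rate_a - rate_b)

print(f"Positive rate A={rate_a:.2f}, B={rate_b:.2f}, gap={parity_gap:.2f}")
# A large gap may indicate disparate treatment and warrant deeper investigation.
```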

Diagnosis:

The diagnosis phase involves the following high-level steps and components.

  1. Identify the cause of the data drift: Is it due to changes in the data distribution, or is it due to changes in the system or environment in which the model is being used? Understanding the root cause of the data drift will help you determine the appropriate action to take.
  2. Re-train the model: If the data drift is due to changes in the data distribution, one option is to re-train the model on the updated data. This can help the model adapt to the changed distribution and improve its performance.
  3. Monitor and continuously re-train the model: If the data drift is ongoing and likely to continue, you may want to set up a system to continuously monitor the model and re-train it as needed. This can help ensure that the model stays up-to-date and performs well over time.
  4. Use drift-resistant models: Some models are more resistant to data drift than others. For example, models that are based on robust statistical techniques, such as robust regression or robust clustering, may be less sensitive to data drift. Consider using these types of models if data drift is a concern in your application.
  5. Use data augmentation: Data augmentation involves generating additional training data by applying transformations to existing data. This can help make the model more robust to data drift, as it has seen a wider range of data and is less reliant on any one particular data distribution.
  6. Use domain adaptation techniques: If the data drift is due to changes in the system or environment in which the model is being used, you may be able to use domain adaptation techniques to adjust the model to the new domain. These techniques involve adapting the model to the new domain by training it on a combination of data from the original domain and the new domain.

Model Lineage:

Model lineage is the process of tracking the history of a machine learning model and the data it was trained on. It can be an important aspect of model observability, as it can provide valuable insights into the model’s performance and behavior over time.

Here are some ways in which model lineage can help in model observability:

  1. Understanding model performance: By tracking the history of the model and the data it was trained on, it is possible to understand how the model’s performance has changed over time. This can help identify any issues or trends that may indicate a need for model improvement.
  2. Debugging and troubleshooting: By tracking the model lineage, it is possible to understand the factors that may have contributed to any issues or failures in the model’s performance. This can be useful for debugging and troubleshooting purposes.
  3. Explainability and transparency: By tracking the model lineage, it is possible to provide a clear and transparent record of the model’s development and performance, which can be important for explainability and accountability purposes.

Overall, model lineage is an important aspect of model observability, and it can provide valuable insights into the model’s performance and behavior over time. By tracking the model lineage, it is possible to better understand the model’s performance and take appropriate action to ensure its effectiveness and value over time.
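As a minimal sketch, a lineage record captured at training time might look like the following; the field names, values, and the idea of persisting it to a model registry are assumptions for illustration.

```python
# Minimal sketch of the kind of lineage record that could be logged at training
# time; field names, values, and the storage target are hypothetical placeholders.
import hashlib
import json
from datetime import datetime, timezone

training_data = b"...serialized training dataset or its manifest..."

lineage_record = {
    "model_name": "churn_classifier",
    "model_version": "2022-12-30.1",
    "training_data_hash": hashlib.sha256(training_data).hexdigest(),
    "feature_set": ["tenure", "plan_type", "monthly_spend"],
    "hyperparameters": {"max_depth": 6, "n_estimators": 200},
    "metrics": {"auc": 0.87, "f1": 0.74},
    "trained_at": datetime.now(timezone.utc).isoformat(),
}
print(json.dumps(lineage_record, indent=2))  # persist to a model registry in practice
```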

Model Explainability:

Model explainability refers to the ability to understand and explain the decisions and predictions made by a machine learning model. There are several methods that can be used to improve the explainability of a model:

  1. Feature importance: One common method for explaining a model is to identify the most important features that the model uses to make predictions. This can be done using techniques such as feature importance scores or permutation importance (see the sketch after this list).
  2. Partial dependence plots: Partial dependence plots visualize the relationship between a feature and the model’s prediction while holding other features constant. This can help identify the most important features and how they impact the model’s prediction.
  3. Local interpretable model-agnostic explanations (LIME): LIME is a technique that generates local explanations for individual predictions by perturbing the input data and observing how the prediction changes.
  4. SHapley Additive exPlanations (SHAP): SHAP is a technique that explains the contribution of each feature to the model’s prediction using Shapley values from game theory.
  5. Decision trees and rule lists: Decision trees and rule lists are machine learning models that are inherently explainable, as they generate a set of rules or decision points that can be used to understand the model’s decision-making process.
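As a minimal sketch of the feature-importance approach mentioned in the first item above, the example below uses scikit-learn’s permutation importance on a synthetic dataset; the dataset and model are illustrative.

```python
# Minimal sketch of feature-importance-based explainability using scikit-learn's
# permutation importance; the synthetic dataset and model are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=500, n_features=5, random_state=42)
model = RandomForestClassifier(random_state=42).fit(X, y)

result = permutation_importance(model, X, y, n_repeats=10, random_state=42)
for i, importance in enumerate(result.importances_mean):
    print(f"feature_{i}: {importance:.3f}")
```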

Metrics Store:

The metrics store is a central repository for storing and tracking metric data. It may be implemented using a variety of tools and technologies, such as a time series database, a data lake, or a data warehouse. By storing metrics in a central location, it is possible to easily access and analyze the data to understand the model’s performance and identify any issues or trends that may indicate a need for model improvement.
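As a minimal sketch, a single record written to such a metrics store might look like the following; the schema and field names are assumptions, and the choice of backend (time series database, data lake, or warehouse) is discussed below.

```python
# Minimal sketch of the shape of a record written to a metrics store; the
# schema, field names, and backend are assumptions for illustration.
from datetime import datetime, timezone

metric_record = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "model_name": "churn_classifier",
    "model_version": "2022-12-30.1",
    "metric_name": "auc",
    "metric_value": 0.87,
    "segment": "all_traffic",  # metrics can also be sliced by cohort or feature segment
}
# In practice this record would be appended to the chosen store and queried
# by dashboards and alerting rules.
print(metric_record)
```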

Some common methods for storing metrics include:

  1. Time series databases: Time series databases are specialized databases that are designed to store and analyze metric data over time. These databases are optimized for handling high volumes of data with a high rate of data ingestion, making them well-suited for storing metrics.
  2. Data lakes: A data lake is a centralized repository that allows you to store structured and unstructured data at any scale. Data lakes can be used to store metrics, along with other types of data, and can support a wide range of data processing and analytics tools.
  3. Data warehouses: A data warehouse is a centralized repository for storing and analyzing data from multiple sources. Data warehouses can be used to store metrics, along with other types of data, and can support a wide range of querying and analysis tools.
  4. Cloud-based storage: Cloud-based storage solutions, such as Amazon S3 or Google Cloud Storage, can be used to store metrics and other types of data. These solutions offer a scalable and flexible way to store data and can support a wide range of data processing and analytics tools.

Overall, the choice of metrics storage method will depend on the specific needs and goals of the application, as well as the volume and rate of data ingestion, the types of data being stored, and the tools and technologies being used for analysis and visualization. By carefully considering these factors, it is possible to choose the most appropriate method for storing metrics in model observability.

Model Retraining:

Model retraining is an important part of model observability in machine learning, as it allows you to update and improve the model based on its performance and behavior.

Model Retraining Helps to retain model performance

There are several ways in which model retraining can be integrated into a model observability workflow:

  1. Continuous monitoring: By continuously monitoring the model’s performance and behavior, you can identify any issues or trends that may warrant retraining the model. For example, you may want to retrain the model if its performance degrades significantly or if there are significant changes in the data distribution.
  2. Regular retraining: You can schedule regular retraining of the model at predetermined intervals, such as monthly or quarterly, to ensure that the model is adapting to changing conditions and improving its performance over time.
  3. Adaptive retraining: You can use techniques such as online learning or active learning to continuously update the model as new data becomes available. This can help the model adapt to changing conditions and improve its performance in real-time.

Overall, model retraining is an important part of model observability, as it allows you to update and improve the model based on its performance and behavior. By integrating model retraining into the model observability workflow, you can ensure that the model continues to perform well and provide value over time.
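As a minimal sketch of the continuous-monitoring approach, a simple threshold-based trigger can compare the latest monitored metric against the value recorded at deployment and kick off retraining when the drop is too large; the metric values, threshold, and retraining hook are hypothetical.

```python
# Minimal sketch of a threshold-based retraining trigger driven by monitored
# metrics; values, thresholds, and the retraining hook are hypothetical.
CURRENT_AUC = 0.79        # latest value read from the metrics store
BASELINE_AUC = 0.87       # AUC recorded at the last deployment
MAX_RELATIVE_DROP = 0.05  # retrain if performance drops more than 5%

def should_retrain(current: float, baseline: float, max_drop: float) -> bool:
    return (baseline - current) / baseline > max_drop

if should_retrain(CURRENT_AUC, BASELINE_AUC, MAX_RELATIVE_DROP):
    print("Performance degradation detected: triggering retraining pipeline")
    # e.g. kick off the training workflow via whatever orchestration tool is in use
```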

Online Model Retraining issues:

  • Batched retraining to avoid outliers: An outlier could have an outsized influence on the model in online training, pulling the learned model further away from the target concept. This risk can be mitigated by conducting online updates with batches of observations rather than with single data points (see the sketch after this list).
  • Optimal learning rate: The learning rate could be too small, preventing the model from updating quickly enough in the presence of a large drift. The learning rate could also be too large, causing the model to overshoot the target concept and to continue to perform poorly.
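As a minimal sketch of the batched-update idea above, scikit-learn’s partial_fit can be called with small batches of observations and an explicit constant learning rate; the data, batch size, and learning rate are illustrative assumptions.

```python
# Minimal sketch of batched online updates with scikit-learn's partial_fit,
# mitigating the single-outlier risk described above; data, batch size, and
# learning rate are illustrative.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(seed=0)
model = SGDClassifier(learning_rate="constant", eta0=0.01)  # eta0 is the learning rate

# Initial fit so the model knows the full set of classes.
X_init, y_init = rng.normal(size=(200, 3)), rng.integers(0, 2, size=200)
model.partial_fit(X_init, y_init, classes=np.array([0, 1]))

# Later, apply online updates in small batches instead of single observations.
for _ in range(10):
    X_batch, y_batch = rng.normal(size=(32, 3)), rng.integers(0, 2, size=32)
    model.partial_fit(X_batch, y_batch)
```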

Challenges in Realtime Model Observability:

There are several challenges that can arise in real-time machine learning model observability:

  1. Data volume and complexity: In real-time systems, the volume and complexity of the data being collected can be overwhelming, making it difficult to monitor and analyze the data in real-time.
  2. Data processing and storage: Real-time machine learning systems may generate large amounts of data that need to be processed and stored in real-time. This can be a challenge, particularly if the system lacks the necessary processing and storage resources.
  3. Data quality and representation: Ensuring that the data being collected is of high quality and representative of the target population can be a challenge in real-time systems, as the data may be constantly changing and difficult to control.
  4. Monitoring and analysis tools: Identifying and implementing appropriate tools and techniques for monitoring and analyzing the data in real-time can be a challenge, as the tools and techniques that are suitable for a given application may not be readily available or may require significant customization.
  5. Scalability: Real-time machine learning systems may need to scale to handle large volumes of data and handle high levels of concurrent activity, which can be a challenge.

Overall, real-time machine learning model observability can be a complex and challenging task, requiring careful planning and the use of appropriate tools and techniques to ensure the performance and effectiveness of the model over time.

Conclusion

In conclusion, model monitoring and model observability are important for ensuring that machine learning models continue to perform well and provide value over time. By continuously monitoring and observing the model’s performance and behavior, it is possible to identify and address any issues that may arise and ensure the model’s effectiveness and value.

As discussed above, there are many different approaches and techniques that can be used for model monitoring and observability, including tracking performance metrics, visualizing model outputs, and using explainability techniques. It is important to carefully consider the specific needs and goals of the application when selecting the appropriate approach and tools for model monitoring and observability.

To be effective, model monitoring and observability should be integrated into the overall machine-learning lifecycle and should be ongoing rather than a one-time event. By continuously monitoring and observing the model, it is possible to identify and address any issues that may arise and ensure that the model continues to perform well and provide value over time.

References

Towards Observability for Machine Learning Pipelines

Paul Deepakraj Retinraj is a Software Architect at Salesforce working on Machine Learning, Deep Learning, and Artificial Intelligence. https://www.linkedin.com/in/pauldeepakraj/