Control Charts for Machine Learning Using Python
Introduction
Control charts are a visual mechanism for monitoring a process by tracking independent observations of a quality characteristic across time. The methodology was introduced by the statistician Walter Shewhart in the early 20th century and has found many applications in industry (most notably in manufacturing). The main idea of control charts is to determine whether a process is under statistical control by setting lower and upper bounds (i.e., control limits) based on the probability distribution of the quality characteristic, and then monitoring the process across time using those bounds. Although the technique is somewhat antique, I believe control charts are a valuable methodology for monitoring deployed machine learning models. In this article, I will present a case showing how we can use the most basic control chart to monitor a deployed machine learning model.
Are Control Charts Useful Today?
Some think of control charts as an antique technique only suitable for manufacturing applications. However, I believe recent computational advancements have made control charts more practical than ever. In the early days, most data gathering had to be done manually (i.e., physical data gathering). Today, we have sensors that can automatically gather data about almost any process, as well as virtual processes that don't necessarily have a physical component and whose data lives entirely in the cloud. We can think of a deployed machine learning model as a virtual process. Machine learning models are now being deployed in all kinds of industries, so data scientists and companies must monitor their deployed models. One way to do that is with control charts.
The Case Study: Using Control Charts to Monitor ML Model Performance
Let's say you work as a data scientist at a hospital that wants to implement a system enabling patients and insurance companies to estimate the cost of discharging a patient. Your task is to develop a machine learning model that predicts the discharge cost before the patient is even admitted. You followed the machine learning pipeline for model deployment: you gathered and cleaned the data, modeled the problem, and tested the candidate models on unseen data. After validation, the model is ready to be deployed. However, you know that deploying the model is not the last step of the pipeline: you need to establish a way to monitor its performance.
There are many ways you could do that; however, monitoring a deployed model is always a complicated task. Neptune AI, a start-up focused on experiment tracking and model registry, wrote a helpful article about the topic [1]. One strategy is to use statistical techniques to identify potential distribution shifts in the features used to train the model. Although this strategy is helpful, it may not be enough: what if model performance is affected by features not included in the model? In that case, monitoring only the feature distributions could lead to erroneous conclusions about model performance. To that end, you may also want to monitor prediction errors across time.
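For reference, one common way to implement that feature-monitoring strategy is a two-sample test comparing a feature's distribution at training time against its distribution in production. Here is a minimal sketch using SciPy's Kolmogorov-Smirnov test; the two arrays are hypothetical placeholders standing in for a real feature:

import numpy as np
from scipy import stats

# Hypothetical placeholders: one feature's values at training time vs. in production
rng = np.random.default_rng(42)
train_feature = rng.normal(loc=0.0, scale=1.0, size=1000)
live_feature = rng.normal(loc=0.3, scale=1.0, size=1000)

# Two-sample Kolmogorov-Smirnov test for a difference in distribution
statistic, p_value = stats.ks_2samp(train_feature, live_feature)
if p_value < 0.05:
    print('Possible distribution shift detected for this feature.')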
Monitoring Model Performance
About Control Charts: What Is Our Goal?
Let's say that during the training process, you estimated that the mean squared error was 228 squared dollars. After deployment, you want to ensure that the mean squared error remains relatively constant across different points in time (or at least does not increase). Therefore, you decide to use an Xbar control chart to monitor model performance. This will enable you to establish an acceptance region where the mean squared error will likely lie, given that the mean is hypothesized to be 228 squared dollars. This region can be defined as follows:

$$\bar{x} \pm k\,\frac{\bar{s}}{c_4\sqrt{n}}$$

where $\bar{x}$ is the average of the mean squared errors across time points, $\bar{s}$ the average of the standard deviations for each time point, $c_4$ a bias control constant, $n$ the sample size for each time point (i.e., how many observations are available at each time point), and $k$ a constant that determines the width of the interval. Using these bounds, we can monitor the mean squared error across different points in time. If the mean squared error falls outside those bounds, we say that point is out of control, which may indicate a shift in the distribution of the mean squared error. Your goal is to develop such a chart.
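To make the formula concrete, here is a minimal sketch of how these bounds could be computed with NumPy. The numbers are illustrative placeholders, not values from the case study:

import numpy as np

# Illustrative placeholder values (not from the case study)
xbar = 228.0  # grand average of the per-day mean squared errors
sbar = 35.0   # average of the per-day standard deviations
n = 20        # observations sampled per day
c4 = 0.9869   # bias control constant for n = 20
k = 3         # width of the interval (3-sigma limits)

# Standard error of the per-day mean, then the control limits
sigma_xbar = sbar / (c4 * np.sqrt(n))
ucl = xbar + k * sigma_xbar  # upper control limit
lcl = xbar - k * sigma_xbar  # lower control limit
print(lcl, ucl)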
Implementing an Xbar Control Scheme
As the data scientist working for the hospital, you set out to develop this control chart. Once the model was in production, you gathered data about its performance: a dataset of 20 randomly sampled transactions per day. You used the first 25 days to learn the parameters of the control chart and started monitoring after day 25. In Table 1, you can see the structure of the data gathered.
Data Processing
After gathering and cleaning the data, you started modeling. First, let's import the Python libraries we will need. We will use pandas to read the data, NumPy for numerical computations, and Matplotlib for data visualization. ControlCharts is my own Python module for generating control charts.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from ControlCharts import ShewhartControlModel
With the libraries imported, we can read and process the data. We group by day and then calculate the average and standard deviation of the squared errors for the transactions sampled each day. A total of 20 transactions per day were sampled, so we have a sample size of 20.
# Read data from csv
data = pd.read_csv('Examples/transaction_data.csv')

# Group data by day and take the average and standard deviation of the Squared Error
grouped_data = data[['Day', 'Squared Error']].groupby(by='Day').mean()
grouped_data_std = data[['Day', 'Squared Error']].groupby(by='Day').std()

# Train-Test Split
x_train = grouped_data.iloc[:25, :].values.reshape((25,))
x_test = grouped_data.iloc[25:, :].values.reshape((25,))
Training the Control Chart Model
Once the data is imported and ready to go, we will proceed to estimate the standard deviation and train the control chart model. For that, we define the sample size as n=20 (since there are 20 samples per day) and then look up the bias control constant associated with that sample size.
# Estimate the standard deviation of the per-day mean squared error
qcc = pd.read_csv('BiasControlConstants/quality_control_constants.csv')
n = 20  # sample size: transactions sampled per day
c4 = qcc['c4'].loc[n - 2]  # look up c4 for n = 20 (the table presumably starts at n = 2, hence the offset)
sigma_train = np.average(grouped_data_std.values[:25]) / (np.sqrt(n) * c4)
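As a side note, if you do not have a lookup table at hand, the c4 constant can also be computed directly from its closed-form expression using the gamma function:

import numpy as np
from scipy.special import gamma

def c4_constant(n):
    """Bias control constant c4 for a sample of size n."""
    return np.sqrt(2.0 / (n - 1)) * gamma(n / 2) / gamma((n - 1) / 2)

print(c4_constant(20))  # approximately 0.9869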
Once we know the standard deviation, we can proceed to train the control model and obtain the control bounds. For this, we will use the ShewhartControlModel class from the ControlCharts module.
# Build Shewhart Control Model
spc_model = ShewhartControlModel(k=3)
ucl, lcl = spc_model.fit(x_train, sigma=sigma_train)
ooc = spc_model.predict(x_test)

# Plot the testing period, all the data, and the training period
testing_plot = spc_model.plot(x=x_test, dpi=100)
complete_plot = spc_model.plot(x=grouped_data.values, dpi=100)
train_plot = spc_model.plot(x=x_train, dpi=100)
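For context, ShewhartControlModel comes from my own ControlCharts module, so its exact interface is defined there. Purely as an illustration of the idea behind fit and predict (not the actual implementation), a simplified model might look like this:

import numpy as np

class MinimalShewhartModel:
    """Simplified sketch of a Shewhart control model, not the actual ControlCharts code."""

    def __init__(self, k=3):
        self.k = k  # number of standard errors used for the control limits

    def fit(self, x, sigma):
        # Center line: mean of the training observations;
        # limits: k standard errors above and below the center line
        self.center = np.mean(x)
        self.ucl = self.center + self.k * sigma
        self.lcl = self.center - self.k * sigma
        return self.ucl, self.lcl

    def predict(self, x):
        # Boolean mask flagging observations outside the control limits
        x = np.asarray(x)
        return (x > self.ucl) | (x < self.lcl)

Under this reading, the ooc variable above would be a boolean mask marking which monitored days fall outside the control limits.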
Results
The first step is to check whether model performance is in control during the training period. Looking at Figure 1, we see that it is. Now we can start monitoring the model using the calculated control bounds.
In Figure 2, you track the model across time after day 25. Around day 40, there appears to be a shift in model performance. The next step is to determine what caused that shift and work quickly to ensure it doesn't affect your clients.
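Assuming, as in the simplified sketch earlier, that predict returns a boolean mask, you could list the flagged days directly from the ooc array; the monitoring period starts on day 26, hence the offset:

# Days (1-indexed) in the monitoring period that are out of control
ooc_days = np.where(ooc)[0] + 26
print(f'Out-of-control days: {ooc_days}')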
Conclusions
The purpose of this article was to present an idea. I know there are many challenges in implementing control charts for monitoring machine learning model performance. However, I think this methodology can become a valuable tool if adequately studied. For instance, using control charts to monitor machine learning model performance can enable engineers to develop robust systems that trigger automatic alerts when there is a shift in model performance. Data scientists can then work quickly to correct the issue and minimize potential impacts.
It is important to note that for this methodology to work, analysts should understand how to design and interpret control charts. This article did not elaborate on the theoretical concepts required to do so correctly. If you are planning to use control charts to monitor models in production, I recommend studying their theoretical foundations before any implementation.
Side Note:
I created a modular Python implementation of some basic control charts. There are many implementations available; however, you can find mine, along with the code and simulated data presented in this article, here: https://github.com/fernando-acosta/QualityTools
References
[1] Oladele, S. (2022, August 11). A Comprehensive Guide on How to Monitor Your Models in Production. Neptune AI. https://neptune.ai/blog/how-to-monitor-your-models-in-production-guide