Monitoring ML models with Vertex AI
Have you watched any of the SpaceX rocket launches? I have watched them many times, and each time I felt that I was participating in something amazing. Those rockets are true technological masterpieces.
This article, though, is not about rockets. It is about machine learning models which you have already trained and launched, or are about to launch. I started with rockets because few technological developments generate as much interest as the latest achievements in machine learning. You have probably heard about models such as PaLM, GLaM, LaMDA, Gopher, Stable Diffusion, DALL-E 2, etc., and these are just examples. But it does not have to be Google or OpenAI. I am quite sure that each time your model hits so-called production, you experience emotions similar to those you have when watching a rocket launch. But there are more similarities. In both cases, getting the rocket/model to fly is just the very first step of its journey. And when it comes to this: have you noticed how carefully SpaceX engineers monitor that journey?
There are so many things that can go wrong that such monitoring is absolutely necessary, even though SpaceX pays great attention to testing every single component, both separately and as an assembly, under conditions quite similar to those the rocket will face during its flight.
This reminds me a lot of the ML training process, where ML engineers train their models on training data, assuming that this training dataset resembles the data the model will see on its input after the production launch. And I am not talking about just the very first minutes after the production launch — flight conditions may change at any time, and we need mechanisms that help us detect such changes.
I have absolutely no idea where to start building monitoring mechanisms for rockets — but as already mentioned, this article is not about rockets. Instead, we want to show how to enable monitoring for launched ML models, and for this purpose we will use Vertex AI — Google's machine learning service available on Google Cloud.
Vertex AI Model Monitoring can help to detect the following types of changes in our model’s flight conditions:
1/ Training-serving skew — occurs when the distribution of the input data seen after the production launch differs from the distribution of the data used to train the model. Vertex AI will ask us for the original training dataset so that model monitoring jobs can use it as a reference.
2/ Prediction drift — occurs when the distribution of the input data seen after the production launch changes significantly over time. Here Vertex AI does not need to know anything about the training dataset. Instead, it collects statistics about input data from the prediction requests sent during previous monitoring windows.
Context
In order to show you how to enable model monitoring in Vertex AI, we need an ML model in the first place. We will not dive deep into the training process, but you will get all the details necessary to run this exercise on your side.
Our model will try to predict churn of telco customers. The training data comes from one of the Kaggle datasets: https://www.kaggle.com/datasets/yeanzc/telco-customer-churn-ibm-dataset
We will get to this point soon, but when enabling model monitoring, Vertex AI will ask you to provide a prediction input schema. If you use TensorFlow to train your models, then you are lucky. SavedModel, the format used by the TensorFlow framework to package everything needed to use the model, includes so-called signatures. Signatures can be understood as a specification of what the model expects on its input and what we can expect on its output. Because this specification is an integral part of the model, Vertex AI is able to use it to understand the prediction input schema. When our model is trained with other frameworks, e.g. Sklearn, Vertex AI will need our help and will therefore ask for the prediction input schema. One question you may have, though, is what the expected format and structure of the prediction input schema is. All the details can be found in the official documentation (https://cloud.google.com/vertex-ai/docs/model-monitoring/overview#custom-input-schemas), but to help you better understand this concept, we will train our model using Sklearn and build the corresponding prediction input schema file.
Model training
To train our ML model we will use Vertex AI Workbench. Create a new notebook from the Python 3 workbench image.
Log into Kaggle and download the training dataset:
https://www.kaggle.com/datasets/yeanzc/telco-customer-churn-ibm-dataset
Instantiating a dataframe from the CSV file will help us better understand what kind of data is available to us.
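A minimal sketch of this step (the file name is an assumption — use whatever the Kaggle download is called on your Workbench instance):

```python
import pandas as pd

# File name is an assumption -- adjust it to the name of the downloaded file.
df = pd.read_csv("Telco_customer_churn.csv")

# A quick look at the shape and a few sample rows.
print(df.shape)
df.head()
```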
Every row of this dataset corresponds to a different customer, and every customer is described by a set of attributes like gender, whether they are a senior citizen, payment method, what kind of services they subscribed to (Streaming TV, Internet), etc. We will try to predict the probability of churn given a mix of the available attributes.
Some of the attributes are already numerical (SeniorCitizen, tenure, ...), some are boolean (Dependents, PhoneService), but most of them represent categorical data (e.g. PaymentMethod: Credit card, Electronic check, Mailed check, ...).
Before we continue, we need to convert the non-numerical attributes into numerical ones.
Although you would typically use a technique called one-hot encoding to convert categorical attributes into numerical ones, here we will use label encoding. It works by first building a dictionary of the unique values of an attribute and then indexing every unique value with a numerical value between 0 and the number of unique categories minus 1:
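A sketch of how this can be done with Sklearn's LabelEncoder (which columns are non-numerical is inferred from the dataframe, so treat the loop below as an approximation of the original code):

```python
from sklearn.preprocessing import LabelEncoder

# Encode every non-numerical column with integer labels
# between 0 and (number of unique categories - 1).
encoders = {}
for col in df.select_dtypes(include="object").columns:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col].astype(str))
```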
When all our attributes are numerical, we can check whether there are any correlations between them.
Positive values show that when one attribute increases, the value of the correlated one also increases; negative values indicate the opposite; and the magnitude expresses how strong the relation is. For example, the correlation factor between churn and contract is -0.4, meaning the longer the contract, the smaller the probability of churn.
In fact, what is most interesting for us is how the distinct attributes correlate with our target attribute: Churn.
We want to work with just a subset of attributes, and the selection will be based on the correlation factor. Specifically, we want to use attributes for which the correlation factor is larger than some threshold (in our case the threshold is 0.05).
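One way this selection could look (the target column name Churn and the use of the absolute correlation value are assumptions):

```python
# Correlation of every attribute with the target column.
corr_with_target = df.corr()["Churn"].drop("Churn")

# Keep attributes whose (absolute) correlation exceeds the threshold.
threshold = 0.05
selected_features = corr_with_target[corr_with_target.abs() > threshold].index.tolist()
print(selected_features)
```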
The last step is to normalize all numeric values so that they fall between 0 and 1. Here we use the MinMaxScaler class from Sklearn.
We are ready to concatenate the transformed attributes with the target attribute (churn) to get the training dataset:
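A sketch of the scaling and concatenation steps, reusing the selected_features list from the previous snippet (column names remain assumptions):

```python
from sklearn.preprocessing import MinMaxScaler
import pandas as pd

# Scale the selected attributes into the [0, 1] range.
scaler = MinMaxScaler()
X = pd.DataFrame(scaler.fit_transform(df[selected_features]),
                 columns=selected_features, index=df.index)

# Concatenate the transformed attributes with the target column.
training_df = pd.concat([X, df["Churn"]], axis=1)
```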
Let's split this dataset into an actual training set and a testing set:
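For example (the 80/20 split and the random seed are assumptions):

```python
from sklearn.model_selection import train_test_split

# Hold out 20% of the data for evaluation; the seed keeps the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    training_df[selected_features], training_df["Churn"],
    test_size=0.2, random_state=42)
```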
We will also define an auxiliary function to print ML model evaluation metrics:
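One possible version of such a helper; the metric choice mirrors the ones discussed below (accuracy, precision, area under the ROC curve), and the function name is ours:

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, roc_auc_score)

def print_evaluation_metrics(model, X, y):
    """Print standard classification metrics for a fitted model."""
    predictions = model.predict(X)
    probabilities = model.predict_proba(X)[:, 1]
    print("accuracy :", accuracy_score(y, predictions))
    print("precision:", precision_score(y, predictions))
    print("recall   :", recall_score(y, predictions))
    print("roc auc  :", roc_auc_score(y, probabilities))
```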
We are now ready to run training. Let's use DecisionTreeClassifier:
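A minimal training sketch (the tree hyperparameters are assumptions):

```python
from sklearn.tree import DecisionTreeClassifier

# Train a simple decision tree and evaluate it on the held-out split.
model = DecisionTreeClassifier(max_depth=5, random_state=42)
model.fit(X_train, y_train)

print_evaluation_metrics(model, X_test, y_test)
```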
Taking into account accuracy, precision and the area under the ROC curve, we see that our model is average at best — but for this guide we have no need to build anything better. What is important for us is that we now have an ML model which we can start monitoring.
Let's save our model to Google Cloud Storage (the saved_model_path variable represents the GCS location):
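A sketch of this step (the bucket name is an assumption; the model.joblib file name matches what Vertex AI's prebuilt Sklearn serving containers expect):

```python
import joblib
from google.cloud import storage

# Serialize the trained model locally; the prebuilt Sklearn serving
# containers look for a file called model.joblib (or model.pkl).
joblib.dump(model, "model.joblib")

# saved_model_path is the GCS folder that will hold the artifact
# (bucket and prefix are assumptions).
saved_model_path = "gs://your-bucket/churn-model"
bucket_name, prefix = saved_model_path.replace("gs://", "").split("/", 1)
blob = storage.Client().bucket(bucket_name).blob(f"{prefix}/model.joblib")
blob.upload_from_filename("model.joblib")
```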
We are ready to import that model into the Vertex AI Model Registry:
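With the Vertex AI SDK for Python, the import could look like this (project, region, display name and the exact prebuilt container tag are assumptions — pick the Sklearn version matching your training environment):

```python
from google.cloud import aiplatform

# Project and region are assumptions.
aiplatform.init(project="your-project", location="us-central1")

vertex_model = aiplatform.Model.upload(
    display_name="telco-churn",
    artifact_uri=saved_model_path,  # GCS folder containing model.joblib
    serving_container_image_uri=(
        "us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-0:latest"
    ),
)
```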
There is one thing we want to mention here: we do not need to go through all these steps manually. Just the opposite — we highly recommend utilizing the full potential of Vertex AI as a serverless MLOps platform and automating all the steps of model training and deployment.
Once the model is in the Vertex AI Model Registry, we can deploy it as a REST endpoint:
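If you prefer to script this step, a deployment sketch with the Python SDK might look like the following (machine type and replica counts are assumptions). The next section instead walks through the console flow, where model monitoring can be enabled while the endpoint is being created:

```python
# Create an endpoint and deploy the registered model to it.
endpoint = vertex_model.deploy(
    machine_type="n1-standard-2",
    min_replica_count=1,
    max_replica_count=1,
)
print(endpoint.resource_name)
```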
Model monitoring configuration
When you register your model in the Vertex AI Model Registry, you can deploy it as a REST microservice that handles online predictions. All you need to do is create a Vertex AI endpoint. When creating such an endpoint, Vertex AI will ask whether you want to enable model monitoring. Here are the steps:
1. Enable Model Monitoring on your Vertex AI endpoint:
2. Specify Monitoring window length, Sampling rate and Email for alerts.
Monitoring window length describes how often the monitoring job will be executed. In our example we want the monitoring job to run every hour. The default value is 24 hours.
The alert email is the address to which Vertex AI will send notifications whenever changes in the input data distributions exceed the alert thresholds.
3. Prepare prediction input schema.
Save it as a YAML file and upload it to a Google Cloud Storage bucket. The properties section lists all attributes expected on the model input; for each attribute you define its name and the corresponding data type. What is quite important here is that the order of these attributes matters: it should be aligned with the order of attributes expected by our model.
Once the file is in Google Cloud Storage, you will be asked to specify its location:
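To make this concrete, here is a hedged sketch of what the schema file and the upload could look like for our Sklearn model; the listed attributes, their types and their order are assumptions and must match what your model actually expects (see the documentation linked above for the authoritative format):

```python
from google.cloud import storage

# Attribute names mirror the features selected earlier -- adjust names,
# types and (importantly) their order to your own model's input.
schema_yaml = """\
type: object
properties:
  Contract:
    type: number
  tenure:
    type: number
  MonthlyCharges:
    type: number
required:
- Contract
- tenure
- MonthlyCharges
"""

# Bucket and object names are assumptions.
bucket = storage.Client().bucket("your-bucket")
bucket.blob("monitoring/prediction_input_schema.yaml").upload_from_string(schema_yaml)
```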
4. Select monitoring objective.
In our case it is Prediction drift detection. By default, all input features listed in your prediction input schema are monitored, and alerts are triggered when the distance metric computed for a feature crosses a threshold of 0.3. However, the model monitoring configuration view has a section under Options called Alert thresholds, which allows us to modify both which features are monitored and what the threshold should be for every monitored feature. In our demo we will monitor only MonthlyCharges and set its threshold to 0.03.
Vertex AI expects this configuration to be a valid JSON string with a key:value pair for every feature that needs to be monitored, e.g. {"MonthlyCharges": 0.03}.
When you click the Create button, you should expect a new email in your mailbox:
In order to prove it works, we need to send prediction requests to our Vertex AI endpoint. Of course, we will mock this data, and here is how: we will use our training dataset and replace the data in one of its columns: MonthlyCharges. By replacing we mean that we will inject random values which follow a normal distribution with a mean quite distant from the mean of the values in the reference dataset. The goal is to simulate drift.
Instead of generating entirely random data for predictions, we will create the prediction dataset as a subset of our training dataset and apply some modifications to it. Individual records of the prediction dataset will be sent in distinct HTTP requests to the Vertex AI endpoint: the ML microservice hosting our model:
We will then use the np.random.normal function to generate random values following a normal distribution with a mean of 0.3 and a standard deviation of 0.05. The generated values will replace the values in the MonthlyCharges column.
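A sketch of the drift injection and of the histogram comparison discussed next (starting from the training features is an assumption; any copy of the prediction dataset would do):

```python
import numpy as np
import matplotlib.pyplot as plt

# Build the prediction dataset from a copy of the training features and
# inject drift into MonthlyCharges: values drawn from N(0.3, 0.05).
prediction_df = X_train.copy()
prediction_df["MonthlyCharges"] = np.random.normal(
    loc=0.3, scale=0.05, size=len(prediction_df))

# Compare the training distribution (blue) with the drifted one (orange).
plt.hist(X_train["MonthlyCharges"], bins=50, alpha=0.6, label="training")
plt.hist(prediction_df["MonthlyCharges"], bins=50, alpha=0.6, label="prediction")
plt.legend()
plt.show()
```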
Here is the new distribution (orange) compared with the distribution in our training dataset (blue):
We are ready to send prediction requests to our model. It is deployed as a REST API endpoint, so all we need to do is send HTTP requests to our Vertex AI endpoint.
To authenticate we will need an access token, and here is how we can generate it (we execute this code from Vertex AI Workbench, which runs as a service account; the roles assigned to this service account determine which GCP services we can access and what we can do within them):
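A minimal sketch using the Application Default Credentials of the Workbench service account:

```python
import google.auth
import google.auth.transport.requests

# Exchange the service account's default credentials for a short-lived access token.
credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())
access_token = credentials.token
```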
Then we will inject that token into the Authorization header of our HTTP requests. We will send a sequence of around 6000 requests in a loop every ten minutes. The loop is infinite, but after 1 hour the access token will expire and our Vertex AI endpoint will start returning authentication errors.
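A hedged sketch of the request loop (project, region and endpoint ID are placeholders — copy the real values from the endpoint details page; the payload format assumes the prebuilt Sklearn serving container, which takes each instance as a plain list of feature values):

```python
import time
import requests

# Placeholders -- replace with the real values from the Vertex AI console.
PROJECT_ID = "your-project"
REGION = "us-central1"
ENDPOINT_ID = "1234567890"

url = (f"https://{REGION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}"
       f"/locations/{REGION}/endpoints/{ENDPOINT_ID}:predict")
headers = {"Authorization": f"Bearer {access_token}",
           "Content-Type": "application/json"}

while True:
    # One HTTP request per record of the prediction dataset.
    for _, row in prediction_df.iterrows():
        requests.post(url, headers=headers, json={"instances": [row.tolist()]})
    # Pause before replaying the dataset again.
    time.sleep(600)
```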
There is one more thing worth mentioning. When we enabled model monitoring, we were asked for the sampling rate — the percentage of prediction requests and corresponding responses we want to capture and persist.
Because we specified 100%, all our requests and responses are recorded in a BigQuery table:
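If you want to inspect the logged payloads with the BigQuery client, a query sketch could look like this; the dataset and table names below are placeholders only — the real identifiers are shown in the endpoint's monitoring details:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Placeholder table reference -- substitute the request-response logging
# table that Vertex AI created for your endpoint.
query = """
SELECT *
FROM `your-project.your_request_response_logging_dataset.your_logging_table`
LIMIT 10
"""
for row in client.query(query).result():
    print(dict(row))
```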
According to the defined schedule, Vertex AI will trigger monitoring jobs every hour. These are serverless jobs executed behind the scenes.
The first question you may have is where to look for information about potential feature distribution drift detected by our model monitoring jobs. The answer is the Model monitoring view, which is part of the Vertex AI endpoint details view:
When we click on the MonthlyCharges feature, we get its distribution as captured by the monitoring jobs executed so far (only the last 50 job executions are displayed here):
You may be wondering why our monitoring job did not detect prediction drift here. The answer is rather straightforward: in the prediction drift mode, which we chose, monitoring jobs compare the distributions of features sent as input to the Vertex AI endpoint in two neighbouring monitoring windows. The distribution of features in the training set does not matter here. It would if we chose to monitor training-serving skew instead.
Let's now stop the loop which sends prediction requests and change the distribution of the MonthlyCharges feature once again. This time we will generate random values for MonthlyCharges which follow a normal distribution with a mean of 0.7 (vs 0.3 for the previous set).
Here is the new distribution (orange) compared with the distribution in our training dataset (blue):
We are ready to send prediction requests again:
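This can reuse the earlier snippets almost verbatim; only the mean of the injected distribution changes (and the access token may need to be refreshed):

```python
# Same drift-injection code as before, but with the mean shifted to 0.7.
prediction_df["MonthlyCharges"] = np.random.normal(
    loc=0.7, scale=0.05, size=len(prediction_df))

# Refresh the access token if needed, then re-run the request loop
# from the previous section with the updated prediction_df.
```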
The new monitoring job should detect that the distribution of MonthlyCharges has shifted towards 0.7, which is far more than 0.03 (the threshold we specified when enabling monitoring) away from the previous 0.3, and as a result we can expect to be alerted.
We now need to wait patiently for the next monitoring job run. When it has finished, if we go to the list of Vertex AI endpoints, we will see that there is one new alert for the endpoint hosting our model:
When we drill down and then click Enabled link in Monitoring column:
we will get to the view with a list of features and alerts:
In Monitoring alerts column we have a new alert notification for our MonthlyCharges feature!
We can drill down further to see the list of monitoring jobs executed so far. Distinct executions are represented by their execution timestamps on the right. You can click them to see how the distribution calculated from the data processed by each job changed. Finally, for the latest execution we see this message: Anomaly detected during this job run.
You can also expect an email notification.
This is exactly what we wanted: a native mechanism which helps us detect changes in the conditions in which our model works and notifies us when those changes are larger than the defined thresholds.
This article is authored by Lukasz Olejniczak — Customer Engineer at Google Cloud. The views expressed are those of the authors and don’t necessarily reflect those of Google.
Please clap for this article if you enjoyed reading it. For more about Google Cloud, data science, data engineering, and AI/ML follow me on LinkedIn.