The Model Retraining Bible, part 1

Hrithik Rai Saxena
7 min read · Mar 22, 2023


This will be a three-part series discussing every dimension of model retraining, so that you don’t have to worry about post-deployment model performance decay. The series will be followed by a scenario in which we discuss how model retraining can be implemented in different ways.

Ok, so you have created a production-ready machine learning model. You deploy it to make predictions, and it works just fine in the real world.

What a happy moment: the job is done and it’s time to move on to the next project. But is our work really over?

“Model Deployed, My job is done!”

Let us tackle this issue one bite at a time:

⦁ First, let’s try to understand which problems we are going to face during the productive use of our model.

⦁ Then, let’s clarify what terms like model drift and model retraining really mean.

⦁ Finally, let’s look at some approaches by which this problem can be solved.

The Problem of Model Drift

Traditionally, a machine learning model is optimized to help us map a set of input features to the output targets. Then, this model is deployed in production to make predictions on unseen data.

There are three assumptions that we need to make along the way:

⦁ The future data has to be similar to the training data (past data).

⦁ The distributions of the features and targets have to remain fairly constant.

⦁ There are no features that influence the target variable but are not captured by the model, or, if there are any, they do not change at all.

With this knowledge in mind, consider the fact that the environment where our model is deployed is ever-changing. The predictive performance of a model is sure to degrade over time. Why?

Because trends change over time. Model deployment is not a one-off event but a continuous process, and we need to ensure that our models stay adaptive and tolerant to changes in the data distributions.

These drifts contribute to model decay. To mitigate their effects during deployment, you need additional monitoring infrastructure, oversight, and processes such as automated model retraining.

Types and examples of model drift:

Let's see an example from each category.

Data Drift:

This happens when the characteristics of the input data change, meaning the properties of the independent variables have changed: the underlying distributions of the features have shifted over time. A typical example is a change in customer habits that the model cannot respond to, for instance the change in feature values during the pandemic, driven by seasonal behavior or by the way shopping habits changed during the lockdown.
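A common way to quantify this kind of shift is to compare the training distribution of a feature with its recent production distribution, for example via the Population Stability Index (PSI). The sketch below is a minimal illustration with NumPy; the bucket count and the rule of thumb that a PSI above roughly 0.2 signals drift are illustrative conventions, not fixed standards.

```python
import numpy as np

def population_stability_index(expected, actual, buckets=10):
    """Compare two samples of one feature using the Population Stability Index.

    Bucket edges come from the 'expected' (training) sample so that both
    distributions are binned in the same way.
    """
    edges = np.percentile(expected, np.linspace(0, 100, buckets + 1))

    # Clip both samples into the training range so out-of-range values
    # land in the outermost buckets instead of being dropped.
    expected = np.clip(expected, edges[0], edges[-1])
    actual = np.clip(actual, edges[0], edges[-1])

    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)

    # Guard against empty buckets before taking the logarithm
    expected_pct = np.clip(expected_pct, 1e-6, None)
    actual_pct = np.clip(actual_pct, 1e-6, None)

    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))

# Example: the live feature has shifted upwards compared to the training data
rng = np.random.default_rng(0)
train_feature = rng.normal(loc=0.0, scale=1.0, size=10_000)
live_feature = rng.normal(loc=0.5, scale=1.0, size=2_000)

print(f"PSI = {population_stability_index(train_feature, live_feature):.3f}")
```

Run per feature, this gives a single number you can track over time and alert on.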

Concept Drift:

This occurs when the link between the input variables and the target variable changes over time, which means the properties of our dependent variable have changed. Since the definition of what we want to predict changes, the model provides inaccurate predictions. E.g., spam mail is getting better every day, and we need new boundaries to define what spam is. It used to be easy to spot spam mail, because we only had to look for spammy keywords (lottery, prize, lucky, etc.).

But nowadays, spammers have become so smart that they can fabricate scenarios in which it is hard for a machine learning model, in a traditional setting, to separate spam from legitimate mail. All in all, the definition of spam has changed over time: the relation between the features and the target labels has shifted. A perfectly normal-looking email can now be spam, and we need our model to understand this. This is called concept drift.
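Concept drift usually shows up as a rising error rate even though the input distributions look unchanged. A minimal sketch of one way to catch this, assuming ground-truth labels eventually arrive (e.g., users flagging mail as spam): compare the error rate over a recent window with the baseline error measured at deployment time. The window size and tolerance factor below are illustrative choices, and trigger_retraining is a hypothetical hook.

```python
from collections import deque

class ErrorRateDriftMonitor:
    """Flags concept drift when the recent error rate rises well above a baseline.

    Assumes ground-truth labels arrive (possibly delayed) so errors can be computed.
    """

    def __init__(self, baseline_error: float, window: int = 500, tolerance: float = 1.5):
        self.baseline_error = baseline_error
        self.tolerance = tolerance
        self.errors = deque(maxlen=window)

    def update(self, prediction, label) -> bool:
        self.errors.append(int(prediction != label))
        if len(self.errors) < self.errors.maxlen:
            return False  # not enough feedback collected yet
        recent_error = sum(self.errors) / len(self.errors)
        return recent_error > self.tolerance * self.baseline_error

# Usage with a baseline error of 2% measured right after deployment:
monitor = ErrorRateDriftMonitor(baseline_error=0.02)
# for prediction, label in feedback_stream:      # hypothetical feedback source
#     if monitor.update(prediction, label):
#         trigger_retraining()                   # hypothetical hook
```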

Upstream data changes:

This refers to operational changes in the data pipeline. An example is a feature that is no longer being generated, resulting in missing values. Another example is a change in the unit of measurement (e.g., miles to kilometers).
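Many of these breakages can be caught with simple validation at the entry point of the pipeline: check that the expected columns exist, that missing-value rates stay reasonable, and that values fall inside the ranges seen during training. Below is a minimal pandas sketch; the column names, ranges, and thresholds are made up for illustration.

```python
import pandas as pd

EXPECTED_COLUMNS = {"distance", "duration", "price"}           # illustrative schema
VALUE_RANGES = {"distance": (0, 1_000), "price": (0, 10_000)}  # ranges seen in training
MAX_MISSING_RATE = 0.05

def validate_batch(df: pd.DataFrame) -> list[str]:
    """Return a list of human-readable issues found in an incoming data batch."""
    issues = []

    missing_cols = EXPECTED_COLUMNS - set(df.columns)
    if missing_cols:
        issues.append(f"missing columns: {sorted(missing_cols)}")

    for col in EXPECTED_COLUMNS & set(df.columns):
        missing_rate = df[col].isna().mean()
        if missing_rate > MAX_MISSING_RATE:
            issues.append(f"{col}: {missing_rate:.0%} missing values")

    for col, (low, high) in VALUE_RANGES.items():
        if col in df.columns and not df[col].dropna().between(low, high).all():
            # e.g. a silent switch from miles to kilometers would show up here
            issues.append(f"{col}: values outside the training range [{low}, {high}]")

    return issues
```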

How do we track model drifts?

Now don’t get deceived by this name. Our model is not drifting anywhere; it’s the distribution of the data, or the feature-target relationship, that is drifting.

There are three main approaches to detecting model drift:

Performance Monitoring

This is the most straightforward way to measure model decay: we directly monitor the performance and set a lower threshold that must not be violated (a short sketch follows the list below). For this approach we need:

  1. Access to ground truth labels.
  2. Some test functionality to compare the model output (prediction) to the ground truth and thus compute the model performance.
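A minimal sketch of this idea, assuming we have a batch of recent predictions for which the ground truth has arrived; the accuracy metric and the threshold value are illustrative, and trigger_retraining is a hypothetical hook.

```python
from sklearn.metrics import accuracy_score

PERFORMANCE_THRESHOLD = 0.90  # illustrative lower bound agreed with stakeholders

def performance_ok(y_true, y_pred) -> bool:
    """Return True if the deployed model still meets the agreed threshold."""
    score = accuracy_score(y_true, y_pred)
    print(f"current accuracy: {score:.3f}")
    return score >= PERFORMANCE_THRESHOLD

# if not performance_ok(recent_labels, recent_predictions):   # recent, labeled batch
#     trigger_retraining()                                     # hypothetical hook
```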

Monitor Data Drift

Detect changes in the distribution of the model input (the features).

Detect changes in the distribution of the model output (the predicted labels).
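When ground truth is not yet available, a two-sample test per feature (and on the model's output scores) is a common proxy. Here is a minimal sketch using SciPy's Kolmogorov-Smirnov test; the 0.05 significance level is an illustrative choice.

```python
from scipy.stats import ks_2samp

def distribution_drift(reference, current, alpha=0.05):
    """Two-sample Kolmogorov-Smirnov test: has 'current' drifted away from 'reference'?"""
    statistic, p_value = ks_2samp(reference, current)
    return p_value < alpha, statistic

# Applied per feature column and to the model's predicted scores, e.g.:
# drifted, stat = distribution_drift(train_df["age"], live_df["age"])   # hypothetical columns
```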

Monitor Concept Drift

Detect changes in the way decisions should be made, e.g., in the case of a recommendation system, the preferences of the users change.

We have now seen some ways to detect model drift. In real life, these can serve as triggers signalling that the production model needs to be retrained.

Model Retraining

Let’s finally dive into the topic of model retraining now that you know about model drift and the issues during deployment. Model retraining refers to updating a deployed machine-learning model with new data.

A few aspects have to be taken care of before model deployment. These practices will help us derive a robust model that doesn’t need to be retrained too often. These are:

⦁ Proper assembly of training data from multiple sources to avoid bias.

⦁ Proper feature engineering.

⦁ Comparing performances with different algorithms.

⦁ Good error estimation.

Now consider this point, the key takeaway from this article: model drift strictly refers to the degradation of predictive performance due to changing feature/target distributions. All in all, to tackle this drift you do not need to change anything in your code, only the training data.

This is because when you change the code, say by applying a new type of algorithm or changing the feature space, a completely new model is generated, which needs to be tested again before deployment. In that case, A/B tests can be a good way to measure the impact of the new model.
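As a sketch of what such an A/B test could look like: route part of the traffic to the candidate model, collect a success metric per variant (e.g., click-through), and check whether the difference is statistically significant. The two-proportion z-test below is one simple way to do that; the counts are made up.

```python
from statsmodels.stats.proportion import proportions_ztest

# Made-up outcomes: clicks and impressions per model variant
clicks = [412, 465]             # current model, candidate model
impressions = [20_000, 20_000]

z_stat, p_value = proportions_ztest(count=clicks, nobs=impressions)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
# Only promote the candidate if the improvement is statistically significant (e.g., p < 0.05).
```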

Dealing with old and new data

Data is something that will be generated endlessly. We can have a continuous flow of new data, but that doesn’t mean our model has to be retrained every time it sees new data (as in the case of online and continual learning). We are only interested in those data points that carry new relationships our model hasn’t seen yet. When we combine these new data points with the old ones, our model is better equipped to map the relationships. There are different ways of combining new and old data, which are discussed in a further article; you can find the details here.
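Just to make the idea concrete (the actual combination strategies are covered in the follow-up article), one simple pattern is to keep the recent data in full and blend in a random sample of the historical data so the model does not forget long-term patterns. The fraction below is an arbitrary, illustrative choice.

```python
import pandas as pd

def build_retraining_set(old_data: pd.DataFrame, new_data: pd.DataFrame,
                         old_fraction: float = 0.3, seed: int = 42) -> pd.DataFrame:
    """Keep all new data and blend in a random sample of the historical data."""
    old_sample = old_data.sample(frac=old_fraction, random_state=seed)
    return pd.concat([old_sample, new_data], ignore_index=True)
```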

When to retrain?

As just mentioned, performance degradation is the main reason to perform a retraining process. Retraining can be started by a trigger (e.g., the click-through rate has dropped below 1.91%) or by a schedule (e.g., every Monday at 2 a.m.).
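Both options boil down to a small piece of orchestration logic. A minimal sketch, assuming a click-through-rate metric feed and a hypothetical retraining job; in practice this would live in a scheduler such as a cron job or an orchestration tool.

```python
import datetime

CTR_THRESHOLD = 0.0191                  # trigger: click-through rate from the example above
RETRAIN_WEEKDAY, RETRAIN_HOUR = 0, 2    # schedule: Mondays at 2 a.m.

def should_retrain(current_ctr: float, now: datetime.datetime) -> bool:
    """Combine a metric-based trigger with a fixed schedule."""
    metric_trigger = current_ctr < CTR_THRESHOLD
    scheduled_run = now.weekday() == RETRAIN_WEEKDAY and now.hour == RETRAIN_HOUR
    return metric_trigger or scheduled_run

# if should_retrain(latest_ctr, datetime.datetime.now()):    # latest_ctr: hypothetical metric
#     start_retraining_job()                                  # hypothetical hook
```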

Model performance is always expected to be optimal with the most recent data, but the more often retraining happens, the higher the cost. You can define the ideal schedule by running an offline experiment to derive the expected time it takes for data drift and concept drift to push the model performance below a baseline threshold. Data and model changes, as well as code updates, are another reason to kickstart a model retraining. We will dive deeper into this topic in the next article.

So, this was all on model retraining. See you again in the next article.

Till then, happy learning 😊.
