Model Drift
In this blog I discuss ideas around concept drift and how it manifests in a real-world production setting for a regression model.
A definition of model drift:
Model drift is the loss of a machine learning model's accuracy over time as it operates in production. Every time we deploy a validated model to production, the assessment of its metrics is a point-in-time snapshot, and there is a fair amount of reasoning for not holding on to that assessment for too long.
What causes drift?
#1 Armageddon (out-of-control events):
Let's say we deployed a housing/automotive valuations model, and soon after deployment one of the below events occurs:
1) A COVID lockdown stifling demand.
2) A timber shortage running up against pent-up demand for housing.
3) Chip shortages causing car manufacturers to shut down plants.
4) A tsunami causing manufacturers to shut down factories.
Any of these could cause the error (RMSE/MAE) measured in a post-deployment detection window to begin trending upward. It is imperative that the MLOps strategy have a comprehensive plan to anticipate and mitigate such events impacting model accuracy. In many settings it is hard to explicitly account for features that represent the impact of one of these events. Notably, these events stand apart from seasonal patterns, such as a Sunday shopping trend, which are usually captured by sequential models.
#2 Temporal features: It's possible we accounted for temporal features but failed to measure their impact. One reason could be as simple as under-representation in the training data of a temporally sensitive cohort (vacation rentals in summer, or convertibles in summer).
#3 Poorly automated training workflows: At the time of writing, a number of firms still have poorly automated training and scoring pipelines. This increases the lead time to deploy a model and motivates deploying at a reduced frequency. As messy as that sounds, it happens, specifically in scenarios where automation faces several obstacles.
Why should a data scientist care about model drift?
Ignoring drift can be very expensive. A thoroughly validated model can begin to perform poorly soon after deployment if drift considerations are ignored. If your product is consumer facing, prediction inaccuracy is easy for customers to spot and can cost the business reputation and revenue.
Approach to tackling drift:
Let's discuss a couple of scenarios that warrant contrasting approaches to handling drift.
1) A real-time recommender system: You would want the model to evolve as quickly as possible, adequately weighting recent user behavior against historical data, and show recommendations that convert the customer. Here fully automated deployment, sequential models, and learning on the edge may all be part of the cure for drift.
2) A housing/e-commerce valuations model:
If deploying a new model entails monetary risk to your business, you need a careful mixture of automation and control, i.e., don't go for a fully automated model refresh.
Here are a couple of reasons why you shouldn't go with a fully automated refresh:
- A business stakeholder needs to understand the impact and metrics and sign off; that does not happen in real time.
- Latency to absorb your model's impact: There will be businesses that consume your valuations and may create campaigns based on your predictions. For a valuation model, realtors, dealers, etc. may adjust prices based on what your model has scored. This takes time, and an ML engineer should allow for and explicitly control a stability window during which the scored values remain static.
How to detect drift?
Step 1: Formulate a definition of the drift signal for unseen data that works for your specific problem.
For the above problem of measuring drift in predicted prices, we can evaluate the difference between average list prices and predicted prices over a large cohort of new listings on a weekly basis. In the accompanying chart, the area of the widening shaded region represents the change in drift over time after deployment.
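To make this concrete, here is a minimal pandas sketch of that weekly drift signal. The column names (list_price, predicted_price, listing_week) are assumptions made for illustration; substitute your own schema.

```python
# A minimal sketch of the weekly drift signal, assuming a table of new
# listings with list_price, predicted_price and listing_week columns.
import pandas as pd

def weekly_drift_signal(new_listings: pd.DataFrame) -> pd.Series:
    """Average gap between list price and predicted price, per week."""
    df = new_listings.copy()
    df["price_gap"] = df["list_price"] - df["predicted_price"]
    # Aggregate over the cohort of new listings, week by week.
    return df.groupby("listing_week")["price_gap"].mean()

# Example usage (hypothetical dataframe):
# drift = weekly_drift_signal(new_listings_df)
# drift.plot()  # a widening gap over time is the drift signal
```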
Step2: Let’s take the drift signal defined in the above step and define patterns for confirming /rejecting presence of drift
1) Check for anomalies: The input signal may contain anomalies and may need to be checked for ongoing incidence of anomalies across different cohorts. There are several approaches to detecting anomalies in temporal signals, ranging from Bollinger bands and confidence-interval checks to clustering approaches (DBSCAN), which we will not go into in detail here; a minimal rolling-band sketch follows this list.
2) Check for stationarity: Here is an informal definition of stationarity: if your signal does not periodically revert back to the mean, there is a trend in your drift signal (upward), and you definitely have reason to worry about loss of accuracy. There are statistical tests that can be run on an input drift signal to detect the presence of stationarity, and their results warrant alerting and action; a sketch using one such test follows the stationarity illustration below.
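Before going deeper into stationarity, here is a minimal sketch of the anomaly check from item 1, using a rolling Bollinger-style band on the weekly drift signal. The window size and the 2-sigma threshold are illustrative assumptions, not recommendations.

```python
# A rolling-band (Bollinger-style) anomaly check on the weekly drift signal.
# The window and n_sigma values are illustrative assumptions.
import pandas as pd

def flag_anomalies(drift: pd.Series, window: int = 8, n_sigma: float = 2.0) -> pd.Series:
    """Mark weeks where the drift signal falls outside the rolling band."""
    rolling_mean = drift.rolling(window, min_periods=window).mean()
    rolling_std = drift.rolling(window, min_periods=window).std()
    upper = rolling_mean + n_sigma * rolling_std
    lower = rolling_mean - n_sigma * rolling_std
    return (drift > upper) | (drift < lower)

# Example usage:
# anomalous_weeks = flag_anomalies(drift)
# drift[anomalous_weeks]  # inspect these weeks, per cohort, before trusting a trend
```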
Let's make a segue and go one step deeper to reinforce the notion of stationarity. The two images below tell the following story: an input "Non-stationary Signal" has been stationarized (de-trended), and the residual signal is a "Stationarized Signal".
The above images are an attempt to build partial intuition about stationarity.
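To make the stationarity check concrete, here is a minimal sketch using the Augmented Dickey-Fuller test from statsmodels, with first-order differencing as one simple way to stationarize (de-trend) a trending drift signal. The 0.05 cutoff and the choice of differencing are illustrative assumptions.

```python
# Stationarity check on the drift signal using the Augmented Dickey-Fuller
# test, plus first-order differencing as a simple way to de-trend the signal.
import pandas as pd
from statsmodels.tsa.stattools import adfuller

def is_stationary(signal: pd.Series, alpha: float = 0.05) -> bool:
    """ADF null hypothesis: the series has a unit root (is non-stationary)."""
    adf_stat, p_value, *_ = adfuller(signal.dropna())
    return p_value < alpha  # small p-value -> reject unit root -> stationary

# Example usage:
# if not is_stationary(drift):
#     # the drift signal is trending: alert, investigate, consider retraining
#     detrended = drift.diff().dropna()  # the "stationarized" residual signal
#     print(is_stationary(detrended))
```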
Take Action on Drift:
- Redeploy your model: Have a CI/CD scoring pipeline that maximally automates your deployment and reduces the time to deploy your model.
- Deterministic noise: Sometimes, if your model's output has a lot of consumer reach, it can be too late to act by redeploying a model. We can infuse deterministic noise and smooth the model's predictions on specific cohorts where drift is significant, as sketched below.
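One way to realize that smoothing, as a minimal sketch: blend the newly scored value toward the previously published value on cohorts flagged as drifting, so consumers see a gradual correction rather than a jump. The column names (new_prediction, previous_prediction, cohort), the blending factor, and the drifting-cohort set are assumptions for illustration.

```python
# Blend new predictions with previously published ones on drifting cohorts,
# so the published values change gradually. Column names and alpha are
# illustrative assumptions.
import pandas as pd

def smooth_predictions(scored: pd.DataFrame, drifting_cohorts: set, alpha: float = 0.3) -> pd.Series:
    """Blend new predictions toward the last published values (one smoothing step per scoring cycle)."""
    blended = scored["new_prediction"].copy()
    mask = scored["cohort"].isin(drifting_cohorts)
    blended[mask] = (
        alpha * scored.loc[mask, "new_prediction"]
        + (1 - alpha) * scored.loc[mask, "previous_prediction"]
    )
    return blended

# Example usage:
# scored["published_prediction"] = smooth_predictions(scored, {"convertibles"}, alpha=0.3)
```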
The above are some critical considerations for monitoring and managing drift while productionizing an ML model.