Understanding Model Drift with IBM Watson OpenScale

Manish Bhide
Published in Trusted AI
Nov 4, 2019

Airline pilots carry the enormous responsibility of flying passengers safely from one place to another. To ensure they remain fit to discharge their duties, they undergo multiple tests at periodic intervals: a medical assessment to confirm they are fit to fly, a line check to test their flying skills, simulator practice for emergency procedures, and so on. A similar need exists for AI models. With the widespread adoption of AI in the enterprise, AI models carry the responsibility of ensuring correct business decisions: a loan processing model decides who should get a loan, while a marketing campaign model decides who should be targeted for a campaign. Before these models are deployed to production, enterprises ensure that they have good accuracy and perform well on test data.

However, as with pilots, it is important to perform periodic checks on AI models to ensure that they continue to perform at the expected level. One way to do that is to generate new manually labelled data at periodic intervals and use it to test the model's accuracy. However, generating manually labelled data is an expensive and time-consuming process. Even when such data is available, its volume is always limited and it rarely covers the full variety of data the model receives in production. Testing accuracy on limited manually labelled data therefore does not guarantee that the model is performing accurately on the diverse data it sees in production.

To address this problem, IBM Watson OpenScale has introduced a new Drift Detection feature. It helps enterprises (a) monitor the accuracy of their models in production without having to generate manually labelled data and (b) identify changes in the characteristics of the data received by the model at runtime compared to the training data. Poor model accuracy can often be traced to model input that differs from the original training data. The Drift Detection capability in IBM Watson OpenScale monitors the data received by the model in production and (a) estimates the accuracy of the model (accuracy drift) and (b) checks whether the data is very different from the model's training data (data drift).

Accuracy drift is very helpful to enterprises because it lets them react immediately to a drop in model accuracy before it has a significant impact on business outcomes. In the case of the loan processing model, for example, it ensures that the business does not end up giving loans to the wrong set of customers, thereby avoiding bad loans. Data drift, on the other hand, helps enterprises understand changes in data characteristics at runtime, which might point to a change in the business environment. For example, the loan processing model's training data might have had very few applicants with age < 25. If a marketing campaign is run to attract younger customers, many young people might start applying for loans. This will trigger a data drift alert, pointing to the fact that the model might not be trained to handle such customers. The business would then want to retrain the model to ensure it makes the right decision for people with age < 25.

Configuring Drift Detection in OpenScale

When a customer configures drift detection in OpenScale, they have to specify how much accuracy drift they are willing to tolerate. Drift is measured as the drop in accuracy compared to the model's accuracy at training time. For example, if the model accuracy at training time was 90% and the estimated accuracy at runtime is 80%, the model is said to have drifted by 10%. Depending on the use case, model owners will tolerate different amounts of drift. Hence IBM Watson OpenScale allows the user to specify the tolerable accuracy drift magnitude (called the drift alert threshold) for each model being monitored in OpenScale. If the drift for a model exceeds the specified threshold, OpenScale generates an alert for the user.
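To make the alert condition concrete, here is a minimal Python sketch of the logic described above. It is purely illustrative and does not use the OpenScale API; the 5% threshold and the accuracy values are assumed for the example.

```python
# Illustrative sketch only -- not the OpenScale API.

TRAINING_ACCURACY = 0.90        # accuracy measured at training time
DRIFT_ALERT_THRESHOLD = 0.05    # tolerable drop in accuracy (assumed value)

def accuracy_drift(estimated_runtime_accuracy: float) -> float:
    """Drift is the drop in accuracy relative to training time."""
    return TRAINING_ACCURACY - estimated_runtime_accuracy

def should_alert(estimated_runtime_accuracy: float) -> bool:
    """Raise an alert when the drop exceeds the configured threshold."""
    return accuracy_drift(estimated_runtime_accuracy) > DRIFT_ALERT_THRESHOLD

# Example from the text: 90% at training time, 80% estimated at runtime
# => 10% drift, which exceeds the 5% threshold and triggers an alert.
print(accuracy_drift(0.80))   # ~0.10 (up to floating-point precision)
print(should_alert(0.80))     # True
```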

How does Drift detection work?

To identify accuracy drift, OpenScale needs to understand the behaviour of the customer's model on the training and test data. It analyses that behaviour and builds its own model (called the drift detection model), which predicts whether the customer's model is going to generate an accurate prediction for a given data point. OpenScale runs the drift detection model on the payload data (the data received by the model at runtime). This tells us how many records in the payload the customer's model is likely to have mispredicted. We use this information to estimate the accuracy of the customer's model on the payload data, and compare that estimate with the accuracy of the model at training time to identify the accuracy drift.
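The sketch below illustrates the general idea of a drift detection model (it is not OpenScale's implementation): a second classifier is trained to predict whether the base model gets a record right, and the mean predicted probability of being correct over the payload serves as the accuracy estimate. The names `base_model`, `X_train`, `y_train` and `X_payload` are assumed placeholders.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def train_drift_detection_model(base_model, X_train, y_train):
    """Learn to predict whether base_model classifies a record correctly."""
    base_preds = base_model.predict(X_train)
    is_correct = (base_preds == y_train).astype(int)  # 1 = correct, 0 = error
    # Ideally this would be fit on held-out data to avoid an optimistic signal.
    drift_model = GradientBoostingClassifier()
    drift_model.fit(X_train, is_correct)
    return drift_model

def estimate_runtime_accuracy(drift_model, X_payload):
    """Estimated accuracy = expected fraction of payload records predicted correctly."""
    p_correct = drift_model.predict_proba(X_payload)[:, 1]
    return float(np.mean(p_correct))
```

The estimated runtime accuracy can then be subtracted from the training-time accuracy to obtain the accuracy drift described in the previous section.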

IBM Watson OpenScale identifies data drift by analysing the training data and extracting its key characteristics. At periodic intervals, it then compares the data received by the model against these training data characteristics to detect data drift.
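As a much-simplified illustration of this idea (not OpenScale's actual method), the sketch below summarises per-feature characteristics of the training data and reports the fraction of payload records that fall outside them. `train_df` and `payload_df` are assumed pandas DataFrames with the same feature columns.

```python
import pandas as pd

def training_characteristics(train_df: pd.DataFrame) -> dict:
    """Record simple per-feature characteristics of the training data."""
    chars = {}
    for col in train_df.columns:
        if pd.api.types.is_numeric_dtype(train_df[col]):
            chars[col] = ("range", train_df[col].min(), train_df[col].max())
        else:
            chars[col] = ("values", set(train_df[col].unique()))
    return chars

def data_drift_fraction(payload_df: pd.DataFrame, chars: dict) -> float:
    """Fraction of payload records unlike anything seen in training."""
    drifted = pd.Series(False, index=payload_df.index)
    for col, spec in chars.items():
        if spec[0] == "range":
            _, lo, hi = spec
            drifted |= (payload_df[col] < lo) | (payload_df[col] > hi)
        else:
            drifted |= ~payload_df[col].isin(spec[1])
    return float(drifted.mean())

# e.g. a campaign that attracts many applicants with age < 25 (below the
# training-data minimum) would raise this fraction and flag data drift.
```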

Reporting Drift

Figure 1: Drift in IBM Watson OpenScale

The drift detection GUI is shown in Figure 1. It has two curves: the blue line represents the drop in model accuracy compared to training time, and the green line represents the percentage of data which exhibits data drift. In the above figure, the accuracy drift (at the rightmost point) is 20%, which is 10 percentage points beyond the drift alert threshold. The model accuracy drift is shown as a band in the GUI with a thick centre line. The centre line represents the actual predicted accuracy; however, finding the exact predicted accuracy is a very hard problem, so the GUI also reports a range in which the drift is likely to lie. The model is said to have drifted if the predicted accuracy (the centre of the band) crosses the threshold. The data drift in the above figure is reported as 10%, meaning that 10% of the data received by the model is different from the model's training data.

Figure 2: Drift Drill Down

Once drift has been identified, the user can drill down at a specific point in time to understand the reasons for the accuracy and data drift, as shown in Figure 2. In the above example, OpenScale uses a Venn diagram to show three groups of drifted transactions: the first group contains transactions which contributed to accuracy drift, the second contains transactions which exhibit data drift, and the third contains transactions which exhibit both accuracy and data drift. Clicking on one of the groups lets the user see a summary of the transactions belonging to that group.

Figure 2 shows the state after the user has clicked on the transactions which led to accuracy drift. These transactions are grouped based on the features which played a key role in the drift. For example, the first box lists two features, Profession and State, signifying that the values of these two features played a key role in OpenScale concluding that the model output is likely incorrect. Note that each box represents multiple transactions received by the model; the user can click on a box to see the list of transactions it represents.

Customers can send these transactions for manual labelling and use the resulting labelled data to retrain the model so that its accuracy does not degrade at runtime. Thus IBM Watson OpenScale not only identifies drift but also helps the user understand its root cause and provides the drifted data that can be used to fix it.
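A hedged sketch of this feedback loop is shown below, assuming the labelled drifted transactions and the original training data are available as pandas DataFrames; `train_df`, `labelled_drift_df` and the `label` column are illustrative names, not OpenScale artefacts.

```python
import pandas as pd

def retrain_with_labelled_drift(base_model, train_df, labelled_drift_df,
                                target_col="label"):
    """Retrain the base model on training data augmented with labelled drifted records."""
    augmented = pd.concat([train_df, labelled_drift_df], ignore_index=True)
    X = augmented.drop(columns=[target_col])
    y = augmented[target_col]
    base_model.fit(X, y)  # scikit-learn-style estimator assumed
    return base_model
```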

Summary

In summary, monitoring model accuracy and data characteristics is critical to ensure that a model continues to perform at the expected level once it is deployed to production. The accuracy drift detection capability in IBM Watson OpenScale helps enterprises monitor their models' accuracy at runtime without having to generate manually labelled data, while the data drift detection capability helps them understand changes in data characteristics at runtime. Together, accuracy and data drift detection ensure that enterprises can react immediately to a change in model accuracy or business environment before it has a significant impact on business outcomes.
