Model Behavioural Insights using IBM Watson OpenScale

Manish Bhide
Published in Trusted AI
4 min read, Jun 14, 2019

Employee feedback is a powerful tool that helps employees improve by understanding their strengths and weaknesses. It is said that employees crave feedback because it helps them grow professionally. AI models have similar characteristics: they need feedback to help them identify their strengths and weaknesses and get better. IBM Watson OpenScale helps enterprises understand the behaviour of their AI models and derive hidden insights using Payload Analytics. OpenScale thus helps enterprises identify the strengths and weaknesses of their AI models, which can then be used to improve them. In this blog post, we provide an overview of this capability of OpenScale. Before we get into the details, let's start with some primitives.

What is Model Payload?

As soon as OpenScale is configured to monitor an AI model, it starts collecting the input received by the model and its prediction. This information is called the payload data and represents the history of the model's behaviour over a period of time. For example, consider a model used by a bank which accepts as input the details of a loan application (such as Loan Duration, Credit History, Loan Purpose, Loan Amount, Existing Saving, etc.) and predicts whether the application is Risky or Non Risky. The payload data in this case will contain the information of all the loan applications received by the model and the model's decision for each of these applications. Each model prediction is typically associated with the model's confidence in that prediction. This information is also stored in the payload.
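To make the idea concrete, here is a minimal sketch of what a single payload record for the loan example might look like. The class name, field names, units and values are all illustrative assumptions; this is not the actual OpenScale payload schema.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class PayloadRecord:
    """One scored loan application: the model's inputs plus its output.

    Hypothetical structure for illustration, not the OpenScale schema.
    """
    loan_duration: int       # assumed to be in months
    credit_history: str
    loan_purpose: str
    loan_amount: float
    existing_savings: str
    prediction: str          # "Risky" or "Non Risky"
    confidence: float        # model confidence in the prediction, 0.0 to 1.0


# A made-up record: the feature values the model received and what it returned.
record = PayloadRecord(
    loan_duration=24,
    credit_history="prior_payments_delayed",
    loan_purpose="car_new",
    loan_amount=3500.0,
    existing_savings="less_100",
    prediction="Risky",
    confidence=0.57,
)

print(asdict(record))
```

Storing the inputs, the prediction and the confidence together is what makes the later analysis possible, since every chart discussed below is a view over records of this shape.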

Such payload data is very valuable to the business because it provides a complete record of the model's behaviour over a period of time. For example, if a customer who applied for a loan six months ago comes to the bank and asks why her loan application was rejected, the bank can use OpenScale to look up the historical record in the payload data and use the Model Explainability feature to generate an explanation of why the model rejected the loan.

Payload Analytics

IBM Watson OpenScale also provides a capability to derive model behavioural insights by analysing the payload data. It includes a Chart Builder which can be used to build different kinds of charts. One such chart plots the distribution of the model's predictions, as shown in Figure 1.

Figure 1: Model Prediction Distribution

Consider a bank which historically receives around 40% risky loan applications, which it rejects. If such a bank were to see the chart in Figure 1, it would realise that the model is flagging a much smaller percentage of loan applications as risky. The bank would immediately want to take corrective action, and to do so it would first need to identify the loan applications where the model is likely to make incorrect predictions.
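The comparison described above can be sketched in a few lines of plain Python. This is not OpenScale code: the prediction list is made up, and the tolerance used to decide that the gap is worth investigating is a hypothetical threshold. Only the 40% historical rate comes from the example.

```python
from collections import Counter


def prediction_distribution(predictions):
    """Return the share of each predicted label as a fraction of all predictions."""
    counts = Counter(predictions)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Made-up predictions standing in for the prediction column of the payload data.
predictions = ["Risky"] * 2 + ["Non Risky"] * 8

distribution = prediction_distribution(predictions)

EXPECTED_RISKY_RATE = 0.40  # historical share of risky applications (from the example)
TOLERANCE = 0.10            # hypothetical threshold for raising a flag

risky_rate = distribution.get("Risky", 0.0)
needs_review = abs(risky_rate - EXPECTED_RISKY_RATE) > TOLERANCE

print(f"Risky share: {risky_rate:.0%}, needs review: {needs_review}")
```

A gap between the observed and the expected share does not prove that the model is wrong, since the mix of applicants may have changed, but it is a reasonable trigger for the deeper analysis that follows.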

As mentioned earlier, each prediction is typically associated with the model's confidence in that prediction. When the model's confidence in a prediction is low, that prediction is more likely to be wrong. OpenScale provides a chart which shows the correlation between the model's confidence and its predictions, as shown in Figure 2.

Figure 2: Confidence vs. Model Predictions

Figure 2 shows that there are many loan applications where the model's confidence was between 45% and 59%, which is on the lower side. The bank will be very keen to analyse this data and understand why the model's confidence is low. OpenScale helps address this problem by plotting a feature's values against the confidence, as shown in Figure 3.
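As a rough illustration of the data behind a chart like Figure 2, the sketch below groups predictions into confidence bands and pulls out the records in the 45% to 59% range. The records, the five-point band width and the helper name are assumptions made for this example.

```python
from collections import Counter

# Made-up (prediction, confidence) pairs standing in for payload records.
scored = [
    ("Risky", 0.47),
    ("Non Risky", 0.52),
    ("Non Risky", 0.58),
    ("Non Risky", 0.91),
    ("Risky", 0.88),
]

BAND_WIDTH = 5  # band width in percentage points (an arbitrary choice)


def confidence_band(confidence):
    """Return the lower edge, in whole percent, of the band containing `confidence`."""
    pct = round(confidence * 100)
    return (pct // BAND_WIDTH) * BAND_WIDTH


# Count predictions per (confidence band, predicted label) pair.
band_counts = Counter((confidence_band(c), label) for label, c in scored)

# Records in the low-confidence range highlighted in the text.
LOW, HIGH = 0.45, 0.59
low_confidence = [(label, c) for label, c in scored if LOW <= c <= HIGH]

for (band, label), count in sorted(band_counts.items()):
    print(f"{band}-{band + BAND_WIDTH}%  {label:<9}  {count}")
print(f"{len(low_confidence)} of {len(scored)} predictions fall between {LOW:.0%} and {HIGH:.0%}")
```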

Figure 3: Feature vs. Confidence

Comparing a feature with the confidence helps the bank uncover how that feature affects the confidence. Figure 3 shows one such correlation, between the feature CheckingStatus and the confidence. As one can observe, the model's confidence drops when CheckingStatus has a value of “no_checking”. The bank can use this information to improve the model: extract all the records from the payload table where CheckingStatus is “no_checking”, send them for manual labelling, and retrain the model with this additional manually labelled data. It could also be that the “no_checking” value represents a data quality problem, in which case it would have to be fixed by the data steward.
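The same analysis can be approximated outside the Chart Builder. The sketch below averages the confidence per CheckingStatus value and then selects the records with the weakest value as candidates for manual labelling. The records are made up, the CheckingStatus values other than "no_checking" are illustrative, and the helper function is hypothetical rather than part of OpenScale.

```python
from collections import defaultdict
from statistics import mean

# Made-up payload rows; only the fields needed for this analysis are shown.
payload = [
    {"CheckingStatus": "no_checking", "prediction": "Non Risky", "confidence": 0.52},
    {"CheckingStatus": "no_checking", "prediction": "Risky", "confidence": 0.48},
    {"CheckingStatus": "less_0", "prediction": "Risky", "confidence": 0.86},
    {"CheckingStatus": "0_to_200", "prediction": "Non Risky", "confidence": 0.79},
    {"CheckingStatus": "0_to_200", "prediction": "Non Risky", "confidence": 0.91},
]


def mean_confidence_by(records, feature):
    """Average the model confidence for each distinct value of `feature`."""
    groups = defaultdict(list)
    for row in records:
        groups[row[feature]].append(row["confidence"])
    return {value: mean(values) for value, values in groups.items()}


by_status = mean_confidence_by(payload, "CheckingStatus")

# The feature value where the model is least sure of itself.
weakest_value = min(by_status, key=by_status.get)

# Candidate records to send for manual labelling before retraining.
to_label = [row for row in payload if row["CheckingStatus"] == weakest_value]

print(by_status)
print(f"Send {len(to_label)} records with CheckingStatus={weakest_value!r} for labelling")
```

A low average confidence for one feature value is only a lead. As noted above, it may point to too little training data for that value or to a data quality problem upstream, and the two call for different fixes.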

The above example shows how the Payload Analytics feature of OpenScale can help enterprises identify a problem, get to its root cause and improve the model, thereby resolving the issue. In summary, Payload Analytics is a very useful feature of OpenScale that can be used to understand model behaviour and, if required, take corrective action.
