How to use the Arena for exploration of ML models for credit scoring

Published in ResponsibleML, Sep 7, 2020

To develop predictive models responsibly, we need good methods and tools to explore and examine them. This has led to a growing interest in eXplainable Artificial Intelligence (XAI) and Interpretable Machine Learning (IML) techniques. More and more libraries for Explanatory Model Analysis (EMA) are available as R and Python packages (see shap, lime, dalex). However, using them requires writing code, which slows down feedback and makes the exploration process less effective.

Arena dashboard

Arena is THE tool for interactive exploration of predictive machine learning models. Its browser-based user interface lets you explore models on a desktop, a smartphone, or an iPad. Arena provides a workspace that can be filled with different views showing models from different perspectives (see the Rashomon perspective). In this post we will show you how to use the tool with a random forest model. We will take a trained model and see how XAI techniques help to analyze its behavior from the perspective of one instance: Victoria.

Arena provides three observation-level XAI charts:

Break Down: shows the contribution of every variable to the final prediction
Shapley Values: Break Down contributions averaged over different variable orderings
Ceteris Paribus: answers the question of how the prediction changes if only one variable is changed
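To make the first two charts more concrete, here is a minimal sketch of the idea behind them. It is not Arena's or dalex's implementation, and it does not use the random forest from this post: the `predict` function, the variable names, and the tiny background dataset are all made up for illustration. Break Down fixes the variables to Victoria's values one by one, in a given order, and records how the average prediction moves; Shapley Values average those contributions over every possible order.

```python
from itertools import permutations
from statistics import mean


# Hypothetical toy scoring model, NOT the random forest from the article.
# The record/loan_months interaction makes the variable order matter.
def predict(row):
    score = 0.9
    score -= 0.004 * row["loan_months"]
    score -= 0.0002 * row["amount"]
    score -= row["has_record"] * (0.1 + 0.004 * row["loan_months"])
    return score


# Made-up background data standing in for the training set.
background = [
    {"has_record": 0, "loan_months": 12, "amount": 500},
    {"has_record": 0, "loan_months": 24, "amount": 800},
    {"has_record": 1, "loan_months": 48, "amount": 1200},
    {"has_record": 0, "loan_months": 36, "amount": 1000},
]

# Illustrative stand-in for Victoria's observation.
victoria = {"has_record": 1, "loan_months": 36, "amount": 950}


def mean_prediction(background, fixed):
    """Average prediction with the variables in `fixed` overwritten in every row."""
    return mean(predict({**row, **fixed}) for row in background)


def break_down(instance, background, order):
    """Contribution of each variable when they are fixed one by one in `order`."""
    contributions = {}
    fixed = {}
    previous = mean_prediction(background, fixed)
    for name in order:
        fixed[name] = instance[name]
        current = mean_prediction(background, fixed)
        contributions[name] = current - previous
        previous = current
    return contributions


def shapley_values(instance, background):
    """Break Down contributions averaged over all variable orderings."""
    names = list(instance)
    orders = list(permutations(names))
    totals = {name: 0.0 for name in names}
    for order in orders:
        for name, value in break_down(instance, background, order).items():
            totals[name] += value
    return {name: total / len(orders) for name, total in totals.items()}


if __name__ == "__main__":
    print(break_down(victoria, background, ["has_record", "loan_months", "amount"]))
    print(shapley_values(victoria, background))
```

In both cases the contributions add up to the difference between Victoria's prediction and the average prediction on the background data. Enumerating all orderings only works for a handful of variables; real libraries approximate the average by sampling orderings.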

Let’s meet Victoria.

She is a 32-year-old woman with a spouse, her own house, and quite a good income. Yet Victoria is not able to get credit. We will use Arena to help her.

Overview

Break Down and Shapley Values for Victoria

To succeed, she needs a prediction above the cutoff level (0.5 by default). As you can see on both charts, the leading cause of rejection is her police record. Sadly, that is something we cannot change.

What-if analysis

By clicking on a contribution block in the Break Down chart, we can open the Ceteris Paribus chart for the selected variable. We will check the variables that she is able to change.
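What a Ceteris Paribus chart computes can be sketched in a few lines. This is only an illustration under made-up assumptions: the `predict` function is a hypothetical toy model, not the random forest explored in Arena, so the values it flags will not match the charts in this post. The profile changes one variable over a grid, keeps everything else at Victoria's values, and we then pick the grid points where the prediction exceeds the cutoff.

```python
CUTOFF = 0.5  # the default cutoff level mentioned above


# Hypothetical toy scoring model, NOT the random forest from the article.
def predict(row):
    score = 0.9
    score -= 0.004 * row["loan_months"]
    score -= 0.0002 * row["amount"]
    score -= row["has_record"] * (0.1 + 0.004 * row["loan_months"])
    return score


# Illustrative stand-in for Victoria's observation.
victoria = {"has_record": 1, "loan_months": 36, "amount": 950}


def ceteris_paribus(instance, variable, grid):
    """Return (value, prediction) pairs when only `variable` is changed."""
    return [(value, predict({**instance, variable: value})) for value in grid]


# Profile for the loan time, checked every 6 months.
profile = ceteris_paribus(victoria, "loan_months", range(6, 49, 6))

# Grid points where this single change is enough to exceed the cutoff.
approving = [value for value, score in profile if score > CUTOFF]

if __name__ == "__main__":
    for value, score in profile:
        print(f"loan_months={value:2d} -> prediction={score:.3f}")
    print("values above the cutoff:", approving)
```

Because only one variable moves at a time, a profile says nothing about what happens when several variables change together.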

Multiple Ceteris Paribus profiles for Victoria

I have annotated in red the points where a single change is enough to exceed the cutoff:

shorten loan time from 36 to 30 months
borrow only 750 instead of 950
rent a house

It may seem strange that renting a house would improve the credit score, but this is exactly why data scientists working with real-world data need to understand their models.

But Victoria can improve her situation! For the sake of getting credit, she could separate from her husband and move in with her parents. If she then shortens the loan time by 3 months, her credit will be approved.
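A scenario like this changes several variables at once, which a single Ceteris Paribus profile cannot show. As a rough sketch, we can enumerate combinations of candidate changes and re-score each modified observation. Everything here is hypothetical: the toy `predict` function, its coefficients, and the variable names are invented for illustration and do not come from the model used in this post.

```python
from itertools import combinations

CUTOFF = 0.5


# Hypothetical toy scoring model, NOT the random forest from the article.
def predict(row):
    score = 0.58
    score -= 0.005 * (row["loan_months"] - 24)
    score -= 0.03 if row["marital"] == "married" else 0.0
    score -= 0.03 if row["home"] == "owner" else 0.0
    return score


# Illustrative stand-in for Victoria's observation.
victoria = {"loan_months": 36, "marital": "married", "home": "owner"}

# Candidate changes, one new value per variable.
changes = {"loan_months": 33, "marital": "separated", "home": "parents"}


def scenarios(instance, changes):
    """Yield (changed_variables, prediction) for every non-empty subset of changes."""
    names = list(changes)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            modified = {**instance, **{name: changes[name] for name in subset}}
            yield subset, predict(modified)


# Combinations of changes that push the prediction above the cutoff.
approved = [subset for subset, score in scenarios(victoria, changes) if score > CUTOFF]

if __name__ == "__main__":
    for subset, score in scenarios(victoria, changes):
        print(f"{', '.join(subset):35s} -> {score:.3f}")
```

In this toy setup no single change is enough, but every pair of changes is. The number of combinations doubles with each extra candidate change, so an exhaustive search like this only suits a short list.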

Break Down and Ceteris Paribus for Victoria’s modified observation

About this example

I used the CreditScoring dataset with artificial person names to create a random forest model. The code is available here.