How to use Arena to explore ML models for credit scoring
To develop predictive models responsibly, we need good methods and tools to explore and examine them. This has led to growing interest in eXplainable Artificial Intelligence (XAI) or Interpretable Machine Learning (IML) techniques. More and more libraries for Explanatory Model Analysis (EMA) are available as R and Python packages (see shap, lime, dalex). However, using them requires writing code, which slows down feedback and reduces the effectiveness of the exploration process.
Arena dashboard
Arena is THE tool for interactive exploration of predictive machine learning models. Its browser-based user interface lets you explore models on a desktop, a smartphone or an iPad. Arena provides a workspace that can be filled with different views showing models from different perspectives (see the Rashomon perspective). In this post we will show you how to use it with a random forest model. We will take the trained model and see how XAI techniques let us analyze its behavior from the perspective of one instance: Victoria.
Arena provides three observation-level XAI charts:
- Break Down: shows the contribution of every variable to the final prediction
- Shapley Values: the Break Down contributions averaged over different variable orders
- Ceteris Paribus: shows how the prediction would change if only one variable were changed
Read more about the above methods here.
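To make the first two charts more concrete, here is a minimal Python sketch of Break Down and Shapley values. It is not how Arena or dalex implement them: the scoring function `predict`, its weights, the `BACKGROUND` rows and the field names (loosely mirroring the Records, Time, Amount, Home and Marital columns of CreditScoring) are all hypothetical. Break Down fixes variables one at a time and records how the average prediction over the background data shifts; the Shapley values here are computed exactly by averaging Break Down over every variable order, which is only feasible for a handful of variables.

```python
import itertools
import math
from statistics import mean

# Hypothetical toy scoring model standing in for the post's random forest.
# The weights are invented for illustration, NOT fitted to CreditScoring.
HOME_EFFECT = {"owner": 0.0, "rent": 0.25, "parents": 0.1}
MARITAL_EFFECT = {"married": 0.0, "separated": 0.05}


def predict(x: dict) -> float:
    """Return a score in (0, 1); higher means more likely to be approved."""
    z = 1.8
    z -= 2.0 if x["records"] else 0.0   # police record
    z -= 0.05 * (x["time"] - 36)        # loan time in months
    z -= 0.0015 * (x["amount"] - 950)   # requested amount
    z += HOME_EFFECT[x["home"]]
    z += MARITAL_EFFECT[x["marital"]]
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical background data used to compute average predictions.
BACKGROUND = [
    {"records": False, "time": 12, "amount": 400, "home": "owner", "marital": "married"},
    {"records": False, "time": 24, "amount": 800, "home": "rent", "marital": "separated"},
    {"records": True, "time": 48, "amount": 1500, "home": "parents", "marital": "married"},
    {"records": False, "time": 36, "amount": 600, "home": "rent", "marital": "married"},
]


def mean_prediction(fixed: dict) -> float:
    """Average prediction over BACKGROUND with the given variables fixed."""
    return mean(predict({**row, **fixed}) for row in BACKGROUND)


def break_down(obs: dict, order: list) -> dict:
    """Fix variables one by one in `order`; each contribution is the shift in mean prediction."""
    fixed, contributions = {}, {}
    previous = mean_prediction(fixed)  # baseline: nothing fixed yet
    for name in order:
        fixed[name] = obs[name]
        current = mean_prediction(fixed)
        contributions[name] = current - previous
        previous = current
    return contributions


def shapley_values(obs: dict) -> dict:
    """Exact Shapley values: Break Down contributions averaged over every variable order."""
    orders = list(itertools.permutations(obs))
    totals = dict.fromkeys(obs, 0.0)
    for order in orders:
        for name, value in break_down(obs, list(order)).items():
            totals[name] += value
    return {name: total / len(orders) for name, total in totals.items()}


victoria = {"records": True, "time": 36, "amount": 950, "home": "owner", "marital": "married"}
baseline = mean_prediction({})
bd = break_down(victoria, ["records", "time", "amount", "home", "marital"])
sv = shapley_values(victoria)
print(f"baseline={baseline:.3f}  prediction={predict(victoria):.3f}")
print("Break Down:", {k: round(v, 3) for k, v in bd.items()})
print("Shapley:   ", {k: round(v, 3) for k, v in sv.items()})
```

Both sets of contributions add up to the gap between the prediction for Victoria and the baseline (the mean prediction on the background data); the Shapley version only removes the dependence on the order in which variables are fixed.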
Let’s meet Victoria.
She is a 32-year-old woman with a spouse, her own house and a fairly good income. Yet Victoria is not able to get a loan. We will use Arena to help her.
Overview
To succeed, she needs a prediction above the cutoff level (0.5 by default). As both charts show, the main reason for the rejection is her police record. Sadly, that is something we cannot change.
What-if analysis
Clicking a contribution block in the Break Down chart opens the Ceteris Paribus chart for that variable. We will check the variables that she is able to change.
I have marked in red the points where a single change is enough to push the prediction above the cutoff:
- shorten loan time from 36 to 30 months
- borrow only 750 instead of 950
- rent a house
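The what-if reading of a Ceteris Paribus profile can be sketched in a few lines: vary one variable over a grid, keep every other variable at Victoria's values, and list the grid values whose score clears the cutoff. As in any sketch here, `predict` and its weights are a hypothetical toy model chosen to behave qualitatively like the example above, not the post's random forest, and the grids are arbitrary.

```python
import math

CUTOFF = 0.5  # default cutoff mentioned in the post

# Hypothetical toy scoring model standing in for the random forest.
# The weights are invented for illustration, NOT fitted to CreditScoring.
HOME_EFFECT = {"owner": 0.0, "rent": 0.25, "parents": 0.1}
MARITAL_EFFECT = {"married": 0.0, "separated": 0.05}


def predict(x: dict) -> float:
    """Return a score in (0, 1); higher means more likely to be approved."""
    z = 1.8
    z -= 2.0 if x["records"] else 0.0   # police record
    z -= 0.05 * (x["time"] - 36)        # loan time in months
    z -= 0.0015 * (x["amount"] - 950)   # requested amount
    z += HOME_EFFECT[x["home"]]
    z += MARITAL_EFFECT[x["marital"]]
    return 1.0 / (1.0 + math.exp(-z))


def ceteris_paribus(obs: dict, variable: str, grid) -> list:
    """Score `obs` for each grid value of `variable`, all other variables unchanged."""
    return [(value, predict({**obs, variable: value})) for value in grid]


def approving_values(profile: list, cutoff: float = CUTOFF) -> list:
    """Grid values whose score is above the cutoff."""
    return [value for value, score in profile if score > cutoff]


victoria = {"records": True, "time": 36, "amount": 950, "home": "owner", "marital": "married"}
time_ok = approving_values(ceteris_paribus(victoria, "time", range(12, 61, 3)))
amount_ok = approving_values(ceteris_paribus(victoria, "amount", range(500, 1501, 50)))
home_ok = approving_values(ceteris_paribus(victoria, "home", ["owner", "rent", "parents"]))
print("loan times that approve:", time_ok)
print("amounts that approve:", amount_ok)
print("home types that approve:", home_ok)
```

Because each profile moves only one variable, it answers "what is the smallest single change?", which is exactly how the red points above are read.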
It is strange that renting a house would improve the credit score, but this is exactly why, in the real world, data scientists need to understand their models.
But Victoria can still improve her situation! For the sake of getting the loan, she could separate from her husband and move in with her parents. If she then shortens the loan time by 3 months, her loan will be approved.
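Victoria's final plan changes several variables at once, which no single Ceteris Paribus profile shows. A hedged way to check such combinations is to apply a dictionary of changes to a copy of the observation and rescore it. The toy `predict` function, its invented weights (including the small bonus it gives to `"separated"` and `"parents"`) and the scenario labels are hypothetical; only the 0.5 cutoff and the 3-month shortening come from the post.

```python
import math

CUTOFF = 0.5  # default cutoff mentioned in the post

# Hypothetical toy scoring model standing in for the random forest.
# The weights are invented for illustration, NOT fitted to CreditScoring.
HOME_EFFECT = {"owner": 0.0, "rent": 0.25, "parents": 0.1}
MARITAL_EFFECT = {"married": 0.0, "separated": 0.05}


def predict(x: dict) -> float:
    """Return a score in (0, 1); higher means more likely to be approved."""
    z = 1.8
    z -= 2.0 if x["records"] else 0.0   # police record
    z -= 0.05 * (x["time"] - 36)        # loan time in months
    z -= 0.0015 * (x["amount"] - 950)   # requested amount
    z += HOME_EFFECT[x["home"]]
    z += MARITAL_EFFECT[x["marital"]]
    return 1.0 / (1.0 + math.exp(-z))


victoria = {"records": True, "time": 36, "amount": 950, "home": "owner", "marital": "married"}

# Each scenario changes one or more variables at the same time.
SCENARIOS = {
    "separate": {"marital": "separated"},
    "move to parents": {"home": "parents"},
    "separate + move to parents": {"marital": "separated", "home": "parents"},
    "separate + move to parents + 3 months shorter": {
        "marital": "separated", "home": "parents", "time": victoria["time"] - 3},
}

results = {}
for label, changes in SCENARIOS.items():
    score = predict({**victoria, **changes})  # scores a modified copy; victoria is unchanged
    results[label] = score > CUTOFF
    print(f"{label:<46} score={score:.3f} approved={results[label]}")
```

In this toy model neither move is enough on its own, and neither is the pair; only adding the shorter loan pushes the score over the cutoff.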
About this example
I used the CreditScoring dataset, with artificial names added for the individuals, to train a random forest model. The code is available here.