BASIC XAI

BASIC XAI with DALEX — Part 1: Introduction

Anna Kozak
Oct 18, 2020 · 3 min read

Introduction to model exploration with code examples for R and Python

[Cover image by Anna Kozak]

Hello!

Welcome to “BASIC XAI with DALEX” series.

In this post, we will take a closer look at some algorithms used in explainable artificial intelligence. You will find here an introduction to methods of global and local model exploration. Each description will include a technical introduction, an example analysis, and code in R and Python.

So, shall we start?

First — why should I use XAI?

Nowadays, the quick-and-dirty approach to developing a predictive model is to try a large number of different ML algorithms and choose the single result that maximizes some validation criterion. This often results in complex models called black boxes. Why? Sometimes these flexible algorithms find models with greater predictive power, sometimes they can detect tricky relationships between variables, and sometimes all models perform similarly but the more complex ones are selected more often.

But there is a price to pay in this quick-and-dirty scheme. When we choose complex yet flexible models, we often lose their interpretability. To understand the decisions made by a trained model, algorithms and tools are being developed that help human experts understand how models work. There are plenty of methods developed under the explainable artificial intelligence (XAI) umbrella that can be used to explain or explore complex models.

Second — which to choose: global vs local?

A growing number of tools for explanation are emerging because different stakeholders have different needs.

Global explanations are those that describe model behavior on the whole data set. They allow us to deduce how the model behaves generally, usually, or on average.

Local explanations, on the other hand, refer to a single prediction, i.e., to the specific client, property, or patient on which the model operates. Usually, local explanations show which variables contribute to the model prediction and how.

These differences are shown in the XAI pyramid below. The left part of the pyramid corresponds to the assessment of a single observation and the right part to the whole model. We can ask various questions about the model. On the left are questions related to a specific prediction. On the right are questions about the model in general.

From the top, we start with more general questions that can be answered with a single number or a few numbers, like the predictive performance of the model (which can be summarized with a single number such as AUC or RMSE) or the prediction value for a single observation (a single number). The following levels refer to increasingly specific methods, which we will discuss in this BASIC XAI series.

[Figure: the XAI pyramid. Source: Biecek, P. and Burzykowski, T., Explanatory Model Analysis]

Third — let’s get a model in R and Python

In this example, we will use the apartments dataset (collected in Warsaw, available in the DALEX package for both R and Python). The data set describes 1000 apartments with six variables: surface, floor, no.rooms, construction.year, m2.price, and district. We will create a model that predicts the price per square meter of an apartment, so let's start with a black-box regression model, a random forest. The package that we will use in these examples is DALEX.

Below we have the code in Python and R, which allows us to transform the data, build a model, and create an explainer. The explainer is an object/adapter that wraps the model and provides a uniform structure and interface for model operations.

Code to build model and explainer in Python and R
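The original snippet is embedded as a gist in the post; as a rough equivalent, here is a minimal Python sketch. The one-hot encoding of district and the random forest hyperparameters are my assumptions, not necessarily identical to the original code:

import dalex as dx
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Load the apartments data shipped with the dalex package
# (the Python version uses underscores in column names)
data = dx.datasets.load_apartments()
X = data.drop(columns="m2_price")
y = data["m2_price"]

# One-hot encode the categorical district; pass numeric columns through
preprocess = ColumnTransformer(
    transformers=[("district", OneHotEncoder(), ["district"])],
    remainder="passthrough",
)

# Black-box model: a random forest regressor
model = Pipeline([
    ("preprocess", preprocess),
    ("forest", RandomForestRegressor(n_estimators=100, random_state=42)),
])
model.fit(X, y)

# The explainer wraps the fitted pipeline and gives all DALEX
# explanation methods one uniform interface
explainer = dx.Explainer(model, X, y, label="apartments_rf")

A quick sanity check such as explainer.predict(X.head()) should return five predicted prices per square meter; the R workflow is analogous, with DALEX::explain() wrapping a fitted random forest.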

If you want, you can use the ready-made objects prepared by us; you can find them here.

In the next part, we will learn about a method for global variable importance — Permutational Variable Importance.
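As a small preview (a sketch, assuming the Python explainer built above), permutation-based importance is available through the explainer's model_parts method:

# Global explanation: permutation-based variable importance
# (discussed in detail in the next part of this series)
vi = explainer.model_parts()
print(vi.result)  # data frame with the importance of each variable
vi.plot()         # bar chart visualization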

Many thanks to Przemyslaw Biecek and Jakub Wiśniewski for their support on this blog.

If you are interested in other posts about explainable, fair, and responsible ML, follow #ResponsibleML on Medium.

To see more R-related content, visit https://www.r-bloggers.com.
