Backstage: The Hitchhiker’s Guide to Responsible Machine Learning

Przemyslaw Biecek
Nov 23 · 6 min read

A month ago we released an educational comic on Responsible Machine Learning and Explanatory Model Analysis titled “The Hitchhiker’s Guide to Responsible Machine Learning”. In 52 pages we present methods, tools and good practices for building and verifying predictive models, using covid mortality analysis as the running example. Although the book itself is not long and can be read in two hours, the idea for this comic has matured over the years.

Below I share some of my thoughts on the creation of this (comic) book. The English-language version is available online at https://betaandbit.github.io/RML/. The Polish translation will appear online and in bookstores in January 2022.

The cover of the RML comic book. In the following pages, together with the two main characters Beta and Bit, we will go through several iterations of building a predictive model.

The Experience Economy

Time to action

But what will happen next? Is it possible to plan an experience in which the participant not only listens to the story but also takes part in it? The RPG industry has shown that it is possible. We can read about the adventures of Geralt the Witcher in books, but we can also experience some of these adventures in a computer game. In classic books, exercises are the trigger for such experiences: we not only read about facts, but by doing the exercises we experience the issue being discussed and understand it more deeply.

We decided on a similar solution in the comic book “The Hitchhiker’s Guide…”. Excerpts of the story come with sample code and data that can be run in the R console (and in the future also in Python). This way we don’t have to passively watch the adventures on the pages of the comic, but can look at the data ourselves and try a different model or a different model validation technique.

Tree training example from page 32. The user can change the training parameters, such as depth, to see what kind of model they will get for other settings.
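To give a feel for this kind of experiment, here is a minimal sketch of changing the tree depth and watching what happens. It is not the code from the comic (which is written in R and uses the covid data). It is a small pure-Python illustration under loud assumptions: the data are synthetic, and every name in it (make_data, train_tree, max_depth and so on) is made up for this example.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(rows):
    """Return the (feature, threshold) that lowers weighted Gini the most, or None."""
    best, best_score = None, gini([y for _, y in rows])
    for j in range(len(rows[0][0])):
        for t in sorted({x[j] for x, _ in rows}):
            left = [y for x, y in rows if x[j] <= t]
            right = [y for x, y in rows if x[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (j, t), score
    return best

def train_tree(rows, max_depth):
    """Grow a tree greedily; a leaf is an int (majority class), a node is a tuple."""
    labels = [y for _, y in rows]
    majority = int(2 * sum(labels) >= len(labels))
    split = best_split(rows) if max_depth > 0 else None
    if split is None:
        return majority
    j, t = split
    left = [r for r in rows if r[0][j] <= t]
    right = [r for r in rows if r[0][j] > t]
    return (j, t, train_tree(left, max_depth - 1), train_tree(right, max_depth - 1))

def predict(tree, x):
    """Walk from the root to a leaf and return its class."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree

def accuracy(tree, rows):
    """Fraction of rows whose label the tree predicts correctly."""
    return sum(predict(tree, x) == y for x, y in rows) / len(rows)

def make_data(n, rng):
    """Synthetic data (an assumption of this sketch): label is 1 when x0 + x1 > 1."""
    rows = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        rows.append((x, int(x[0] + x[1] > 1.0)))
    return rows

rng = random.Random(2021)
train = make_data(200, rng)
test = make_data(200, rng)

# Try several depths, as a reader of the comic might do with the tree example.
results = {}
for max_depth in range(0, 6):
    tree = train_tree(train, max_depth)
    results[max_depth] = (accuracy(tree, train), accuracy(tree, test))
    print(max_depth, results[max_depth])
```

With greedy splitting and majority-vote leaves, accuracy on the training data can only stay the same or improve as max_depth grows. Accuracy on the held-out data carries no such guarantee, which is exactly why it is worth checking both.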

In fact, based on these examples we ran a whole hands-on workshop at the UseR 2021 conference. For 3 hours the participants went through the same adventures as Beta and Bit, building, verifying and deploying a predictive model.

The Responsible Machine Learning workshop at UseR 2021 is just starting. I gave it together with Anna Kozak, Hubert Baniecki and Kuba Wisniewski. https://tinyurl.com/RML2021

The Process of Explanatory Model Analysis

In the RML comic we try to disenchant this myth in several ways. First, we show four iterations of building a model, with each iteration producing an increasingly complex but also more effective model. Second, the first model is created before we have access to new data. Often, especially in medicine, a huge amount of domain knowledge is already available to build the first iteration of the model without any raw data. Third, at each step of the modelling process we learn something new about the problem being analysed, and this new knowledge can be used in the next modelling step.

Still have time? Let’s try yet another approach!

Due to its limited size, the comic does not go very deeply into the mathematical details of the individual methods. On even-numbered pages, the intuitions behind each technique are presented. The whole, however, is based on the textbook “Explanatory Model Analysis”, where you can read in detail how the different methods work and what their advantages and disadvantages are.

The methods we show are often referred to as Interpretable Machine Learning or Explainable Artificial Intelligence. Both of these names are, however, in some sense wrong. Not every model is actually interpretable, and our goal is neither to interpret the model nor to interpret the prediction. Similarly, the term explainable causes discussion of XAI methods to shift too often to the psychological basis of explainability, although in reality we rarely want the model to actually explain anything in the way a teacher explains to students or a parent explains to children. Very often our expectation is that the model’s predictions are justified, so that we can question them. Properly naming things helps in understanding them, so in the comic we try to consistently use the term Explanatory Model Analysis to emphasise that we are talking about model understanding, just as Explanatory Data Analysis is about data understanding.

Conceptually, this comic is based on methods and tools from the book Explanatory Model Analysis https://ema.drwhy.ai/

The Story

As it turned out, the model that emerged had many more stakeholders than we expected, because not only were epidemiological services interested in it, but also many outsiders were curious or concerned about the complications and possible death in case of infection.

The covid predictive model is available at https://crs19.pl/

We decided to make the model itself publicly available at https://crs19.pl/. We were surprised to find that it was noticed by major media in Poland and Germany. And if so, what better way to show EMA in action than to base it on a real case where these techniques were used in response to a real need?

In the comic we used the characters Beta and Bit, which I had previously created for data literacy books (most of them are available only in Polish). She is fascinated by maths and statistics; he is a programmer experimenting with machine learning. Together they are a great team to show the value of predictive modelling.

There was not enough space in the comic to show all the models we tested during the actual modelling. In particular, we left out the boosting model with monotonicity constraints and the logistic regression model with cubic splines, both of which also gave very promising results. Well, maybe one day there will be a Part II describing other modelling techniques as well.

Concept notes outlining the structure of the entire comic.
First realisation of the comic; we check if the story wraps up.

Let’s go somewhere.
