The world is rapidly adopting machine learning solutions across ever more sectors, including some heavily regulated ones. At the same time, fear of using algorithms in settings where they could have harmful effects is growing, and regulators have started taking a closer look at how companies and governments apply machine learning solutions and whether those solutions could harm their users.
A particular concern is that these algorithms are 'black box' solutions that make it impossible for anyone to understand how they really work and how their predictions are generated. However, in recent years a number of explainable AI methods such as SHAP (Lundberg & Lee, 2017) have allowed us to explore and better understand the inner workings of these algorithms and their individual predictions, so these models are not really black boxes anymore.
Still, the technical know-how and manual input needed to generate these explanations form a barrier to making them accessible not only to data scientists, but also to other stakeholders: management, staff, external regulators and, finally, customers.
Management and other stakeholders ultimately need to decide whether to go forward with a proposed machine learning solution. To do so, they need to feel confident about what is in the model and how predictions are generated. External regulators need to be able to assess whether the model violates any local or European regulations, such as the GDPR. Under the GDPR, consumers have a right to an explanation whenever an automated solution affects them in some material way. Finally, many real-world machine learning deployments involve human-in-the-loop solutions, where a human decision-maker can choose to overrule the algorithm. In such cases, this person must understand how the model works in order to judge when it should be overruled.
With the explainerdashboard library that I developed over the past year, it is easy to build rich interactive dashboards that allow even non-technical stakeholders to explore the workings and predictions of machine learning models.
With the introduction of SHapley Additive exPlanations (SHAP) (Lundberg & Lee, 2017; Lundberg et al., 2020) it became possible to answer the following question for each feature input for each prediction of almost any machine learning algorithm: “What was the contribution of this feature to the final prediction of the model?” This allows us to calculate the contribution of each feature (column) to the final prediction for each row of data (e.g. a specific customer). The contributions of all features are guaranteed to add up to the final prediction.
Even though the implementation details differ for the various algorithms (e.g. scikit-learn RandomForests, xgboost, lightgbm, PyTorch, TensorFlow, etc), with the shap library, generating SHAP values is now really straightforward for a data scientist:
In this example we used the canonical Kaggle Titanic dataset to predict the likelihood of particular passengers surviving the sinking of the Titanic. So we can now see what predicts the survival of the first passenger in our test set. In this case, the fact that the passenger was a male with a 3rd class ticket, and that the ticket fare was low, seem to have contributed to a low expected chance of survival:
We can also look at the general effect of a feature such as ‘Age’ on predictions in our data set. Here we see that older people generally were predicted to have a lower chance of surviving:
Although the shap library is awesome, and the above implementation is extremely straightforward for an experienced data scientist, it is easy to forget that for most stakeholders inside or outside a company the above will seem like complete black magic. Where do you even type those magic incantations?
Your manager may not know how to cast Python spells, but is probably able to use a browser, so what you can do instead is use the explainerdashboard library to start a model explainer dashboard:
This will start a web server that hosts an interactive dashboard that allows you to perform all the investigations of the shap library and more, simply by clicking around. All you need to do is point your manager to the right URL (e.g. http://123.456.789.1:8050, or wherever you are hosting your dashboard).
The dashboard includes a number of tabs showing different aspects of the model under investigation, such as feature importances, model performance metrics, individual predictions, what if analysis, feature dependence and interactions, and individual decision trees:
You can find some deployed interactive examples of dashboards here.
Feature importances

The first question you would want to answer about any algorithm is: what are the input features, and how important are they to the model? With SHAP values we can calculate the mean absolute SHAP value for each feature, i.e. the average impact (positive or negative) that knowing that feature has on the predictions. Another way of measuring importance is permutation importance: how much worse does the model metric get when you intentionally shuffle the values of a particular column? The more important the feature, the bigger an impact shuffling it should have on model performance.
Model performance

The next question a stakeholder will of course want answered: so how good is the model anyway? As data scientists, we know that there are many ways of measuring the performance of an algorithm, and many ways of displaying that performance. So in the dashboard we are a bit promiscuous and simply display almost all of them. (You can optionally hide any component, as we will see later.)
For example, for the above classification model you can see model metrics, a confusion matrix, a precision plot, a classification plot, an ROC AUC curve, a PR AUC curve, a lift curve, and a cumulative precision curve. And you can adjust the cutoff value for each plot:
And for regression models you can see the goodness-of-fit, inspect residuals, and plot predictions and residuals against particular features:
Individual predictions + ‘What if’ analysis
Under the GDPR, customers have a ‘right to an explanation’ whenever an automated algorithm makes decisions that materially affect them. It thus becomes very important to be able to explain the individual predictions that your model has produced. By selecting an index you get all the feature contributions, both as a waterfall plot and in table form.
You can also edit the feature input on the ‘What if…’ tab to investigate how small changes in the input would change the output:
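The underlying idea of such a 'what if' analysis can be sketched by hand: copy a row, change one input, and re-predict. The model and data below are illustrative assumptions (a toy model where survival probability falls with Age), not the dashboard's internals:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# toy model where survival probability falls with Age (illustrative only)
rng = np.random.default_rng(3)
X = pd.DataFrame({"Age": rng.uniform(1.0, 80.0, 400),
                  "Fare": rng.exponential(30.0, 400)})
y = (rng.random(400) < 1.0 - X["Age"] / 100).astype(int)
model = RandomForestClassifier(n_estimators=40, random_state=0).fit(X, y)

# manual what-if: copy one passenger, tweak a single input, re-predict
passenger = X.iloc[[0]].copy()
before = model.predict_proba(passenger)[0, 1]
passenger["Age"] = 5.0          # what if this passenger had been a child?
after = model.predict_proba(passenger)[0, 1]
```

The dashboard's 'What if…' tab does the same interactively, recomputing the prediction and its SHAP contributions on every edit.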
Feature dependence and interactions

Besides being able to explain individual predictions, you also want to be able to explain in broad contours how feature values impact the predictions of your model. For example: does the predicted likelihood of survival go up or down with age? Is it higher for men than for women? This is called the feature dependence of the model, and it can give a lot of insight into how the model uses specific features to generate its predictions.
You can go one step further however and investigate feature interactions as well. You can decompose the contribution of a feature into both a direct effect and indirect effects. The direct effect is simply “how does knowing the value of this feature directly change the prediction?”. The indirect effects occur when knowing a feature changes how you interpret the value of another feature. So for example, knowing that a passenger was male may have both a direct effect (lower chances of survival for males), but also indirect effects (relatively higher chance for a male in third class). The total contribution is the sum of direct and indirect effects. Looking at the indirect effects (interactions) can often give you valuable insights into the underlying patterns in your data that your model managed to discover and make use of.
Decision trees

Many of the most used algorithms in applied machine learning are based on simple decision trees. Visualizing these decision trees and showing how they are aggregated into a final prediction can decrease the fear of the unknown that sometimes comes with such models. Although these algorithms are often called ‘black boxes’, they are really just collections of simple decision trees!
For example in the case of Random Forests, the algorithm simply takes the average prediction of a number of decision trees:
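You can verify this averaging yourself: each fitted tree lives in the forest's `estimators_` list, and averaging their class probabilities reproduces the forest's own output (the dataset below is an arbitrary synthetic one):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

# average the per-tree class probabilities by hand...
tree_probs = np.stack([tree.predict_proba(X) for tree in model.estimators_])
averaged = tree_probs.mean(axis=0)

# ...which is exactly what the forest's own predict_proba computes
```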
And you can visualize the individual decision trees using the awesome dtreeviz package right inside the dashboard:
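Outside the dashboard, the same visualization can be produced directly with dtreeviz. The call below follows the dtreeviz 1.x API (dtreeviz >= 2.0 uses `dtreeviz.model(...).view()` instead) and also needs the graphviz binary installed, so it is left commented out; extracting a single tree from the forest works with plain scikit-learn:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
model = RandomForestClassifier(n_estimators=10, max_depth=3,
                               random_state=0).fit(data.data, data.target)
tree = model.estimators_[5]   # pull one decision tree out of the forest

# dtreeviz 1.x-style call (check your installed version's API):
# from dtreeviz.trees import dtreeviz
# viz = dtreeviz(tree, data.data, data.target,
#                feature_names=data.feature_names,
#                class_names=list(data.target_names))
# viz.save("tree.svg")
```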
Low-code customizable dashboards
As you can see, the default dashboard is quite fully featured. Maybe it is even a bit too fully featured for some purposes. In order not to overwhelm your stakeholders, it often makes sense to switch off some of the bells and whistles. Luckily, due to the modular nature of the library, this is quite straightforward: you can switch off and hide any tab, component, or toggle by simply passing the appropriate flags to the dashboard.
So for example we can switch off the ‘What if…’ tab, hide the confusion matrix on the model performance tab, and hide the cutoff sliders:
You can even go further and fully design your own dashboard layout. By re-using the various components of the dashboard, you only have to design a layout using dash bootstrap components (dbc), and ExplainerDashboard will figure out the rest:
This gives you the flexibility to show exactly those components that you find most useful, and surround them with further explanatory context.
Hosting multiple dashboards in ExplainerHub
If you have several models in production for which you want explainer dashboards up and running, it is convenient to host them in a single place. You can do that by passing multiple dashboards to an ExplainerHub:
Conclusion

Algorithmic breakthroughs in making models explainable are not in themselves enough. What is needed are implementations that make the usage and functioning of models transparent and understandable to stakeholders at all levels.
Being able to quickly build and deploy an interactive dashboard that shows the workings of machine learning models is an important step towards this goal.
References

Lundberg, S. M., & Lee, S. I. (2017). A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems (pp. 4768–4777).
Lundberg, S. M., Erion, G., Chen, H., DeGrave, A., Prutkin, J. M., Nair, B., … & Lee, S. I. (2020). From local explanations to global understanding with explainable AI for trees. Nature Machine Intelligence, 2(1), 56–67.