Explainability and Trust in Machine Learning

Ross Lewis
IBM Data Science in Practice
5 min read · Nov 1, 2022

The importance of explainable AI and data science

Machine learning is often considered a black box approach to complex use cases. For users and stakeholders to have buy-in for artificial intelligence and machine learning, there needs to be a level of:

· Trust

· Auditability

· Confidence

I will show that many machine learning algorithms offer explainability and can be used to better understand your data. I will use Kaggle’s Ames Housing dataset, which contains the sale price of each home along with features describing the sale. The results give you an intuitive understanding of the data from a machine learning approach rather than a statistical one. You can see the model training and visualizations in this repository.

The Dataset

The Ames dataset has 81 features and 2,930 rows, each describing a housing sale in Ames, Iowa between 2006 and 2010. I’ve whittled the data down to the most basic features, leaving 2 categorical features, 6 boolean features, and 13 numeric features. Those features include the number of rooms, the age of the house, the style of the house, and its overall quality. Below are the features and how they correlate with sale price. The darker red a feature is, the more strongly an increase in that feature correlates with an increase in sale price; the darker blue, the more strongly an increase correlates with a decrease in sale price. For example, overall quality and number of bathrooms correlate positively with sale price, while years since the last remodel correlates negatively.
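If you want to reproduce a figure like this, here is a minimal sketch using pandas and seaborn. It assumes a cleaned DataFrame with the reduced feature set plus a SalePrice column; the file name and variable names are placeholders, not the repository’s exact code.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical file name; assumes the cleaned Ames data with a SalePrice column.
df = pd.read_csv("ames_clean.csv")

# Correlation of every numeric/boolean feature with the sale price
corr = df.corr(numeric_only=True)["SalePrice"].drop("SalePrice").sort_values()

# Diverging palette: red for positive correlations, blue for negative ones
sns.heatmap(corr.to_frame("corr with SalePrice"), annot=True, cmap="coolwarm", center=0)
plt.tight_layout()
plt.show()
```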

Trust

The best way to build trust in your models is to look at the results. If you have an exceptional F1 score and you’re seeing real-world results, you’re in a good place. With a large enough dataset, deep learning often produces these results. But how do we know WHY a model is making the predictions that it is? One of the best ways to find out is with the SHAP (SHapley Additive exPlanations) approach. The SHAP package can tell you how incremental changes in your model’s input data affect the predictions. Before using SHAP, I trained a basic deep learning regression model to predict sale price. In the example below, we can see each feature and the impact it had on the result. The factors with the largest impact were the fact that this house has no fireplace and that it was remodeled only 4 years ago. Without this insight, we would only have the prediction ($181,209) with no justification for why that prediction was given.
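Here is a minimal sketch of that SHAP workflow. It assumes a trained Keras regression model named model and the preprocessed feature DataFrame X; those names are placeholders, not necessarily what the repository uses.

```python
import shap

# Assumed names: `model` is the trained Keras regression model, `X` is the
# preprocessed feature DataFrame used for training.
predict = lambda data: model.predict(data).flatten()  # SHAP expects a 1-D output
background = X.sample(100, random_state=0)            # background data for the explainer

explainer = shap.Explainer(predict, background)
shap_values = explainer(X.iloc[:1])                   # explain a single house

# Waterfall plot: how each feature pushed this prediction above or below the base value
shap.plots.waterfall(shap_values[0])
```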

Furthermore, we can visualize the impact of each variable on the prediction for each row. Here I’ve sorted by the prediction value; each gap below is the impact of a different variable. It’s important to note that these values aren’t causal; they only show each feature’s incremental impact on the prediction.
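One way to produce a plot like this is SHAP’s decision plot. A sketch reusing the explainer from above (again, the variable names are assumptions):

```python
# Reuses `explainer` and `X` from the previous sketch.
sample = X.sample(200, random_state=0)
sv = explainer(sample)

# One line per house, running from the base value to that house's final
# prediction; the horizontal shift at each feature is that feature's
# incremental impact on that row's prediction.
shap.decision_plot(sv.base_values[0], sv.values, sample)
```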

Auditability

For most use cases, decision trees are too rudimentary. Yet, when you need to see the exact “thought process” that led to a recommendation, decision, or prediction, decision trees allow a clear audit path to a decision. While many algorithms, like deep learning, produce a relatively opaque model, a decision tree gives you the exact business rules that lead to a leaf node and the number of samples that ended up in that node.

Let’s zoom in on one of the leaf nodes to see how we came to the price decision. In this case, we see that there are 51 houses with an overall quality of 8, seven or fewer total rooms, and no fireplace. For those houses, the average sale price is $218,470. This method gives an exact account and justification of the prediction: 51 houses matching these criteria sold for an average of $218,470. It’s as easy as that.
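A sketch of how such a tree can be trained and inspected with scikit-learn, assuming prepared feature and target arrays X and y (hypothetical names):

```python
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeRegressor, export_text, plot_tree

# Assumed: X holds the prepared features and y the sale prices.
tree = DecisionTreeRegressor(max_depth=3, random_state=0)
tree.fit(X, y)

# Text form of the exact business rules leading to every leaf
print(export_text(tree, feature_names=list(X.columns)))

# Graphical form, showing the sample count and mean sale price in each node
plot_tree(tree, feature_names=list(X.columns), filled=True)
plt.show()
```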

Confidence

Finally, what are some ways we can increase our confidence in our models? One of the best ways is to understand our models and validate that they match our intuition. I have two examples of this: one through linear regression coefficients and one through random forest feature importance.

One of the most basic statistical models is linear regression. For each feature, it gives us a coefficient that tells us how the target changes as that feature changes. We can use that coefficient to test our intuition. In this example, we see that overall quality has the highest impact on price, followed by total rooms. Both have a positive relationship (as quality goes up, sale price goes up).

Something strange you might notice is that bedrooms have a negative relationship with sale price. Remember that in a linear regression each coefficient is estimated with every other feature held constant. Here, total rooms (TotRms AbvGr) already captures the value of an extra room, so the bedroom coefficient (Bedroom AbvGr) only reflects what a bedroom adds beyond a generic room. In this case, bathrooms add much more to the sale price than bedrooms, but adding either still has a positive net effect, because the value subtracted for a bedroom (Bedroom AbvGr) is much less than the value added for any room (TotRms AbvGr).
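A sketch of how those coefficients can be pulled out with scikit-learn (same assumed X and y). Because each coefficient is expressed in its feature’s own units, standardizing the features first (for example with StandardScaler) makes the magnitudes directly comparable.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Assumed: X holds the prepared features and y the sale prices.
linreg = LinearRegression().fit(X, y)

# Pair each feature with its coefficient: the sign gives the direction of the
# relationship, the magnitude its strength on that feature's own scale.
coefs = pd.Series(linreg.coef_, index=X.columns).sort_values()
print(coefs)
```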

One last model that provides some data understanding is the random forest. After we train a random forest model, we can access the importance of each feature. This is calculated from how much each feature’s splits reduce prediction error across the trees in the forest. Here, again, we can confirm our intuition. We can see that house age, years since last remodel, and number of rooms are the most important features when predicting sale price. If you end up with an important feature that doesn’t match your intuition, that’s a good indicator that you should revisit the exploratory data analysis step.
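A sketch of pulling those importances from a scikit-learn random forest (same assumed X and y):

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Assumed: X holds the prepared features and y the sale prices.
rf = RandomForestRegressor(n_estimators=300, random_state=0)
rf.fit(X, y)

# Impurity-based importances: how much each feature's splits reduce prediction
# error across the forest, averaged over all trees.
importances = pd.Series(rf.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))
```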

IBM uses a diverse set of machine learning algorithms in our research and for our clients. We want to show that machine learning offers a level of explainability in addition to predictive power. In this case, I’ve trained many sale price prediction models on house feature data to show that not only can we make predictions, we can also understand our data through visualizing decision trees, feature importances, coefficient comparisons, SHapley Additive exPlanations, and more.
