How SHAPY are your features?

Gauridargar
2 min read · Sep 17, 2023

Hello fellow Readers,

I am Gauri Dargar, a Data Alchemist and a self-taught learner. In my current project, I've spent a lot of time on the quest for the perfect model, although we all know there isn't one. Still, it seems our managers will keep us on the hunt indefinitely.

One fine day, while navigating the vast online seas, I stumbled upon something truly intriguing. Have you ever wondered how each prediction generated by a model is influenced by individual features?

Enter SHAP, or SHapley Additive exPlanations, a powerful Python package. In our world of Machine Learning models, SHAP analyzes the importance of each feature and how a specific feature impacts a prediction. More precisely, it reveals how each feature contributes to increasing or decreasing a prediction. Introduced by Lundberg and Lee in 2017, SHAP has become an invaluable tool for Data Scientists, providing much-needed interpretability for our models.

Wondering how to use SHAP?

To demonstrate how to implement SHAP, I’ve provided a link to my Kaggle Notebook where I’ve employed SHAP to gain insights into a model’s interpretability. Feel free to explore it and, if you find it informative, consider giving it an upvote.

Now, let’s dive into how SHAP operates.

SHAP calculates the Shapley values for each feature. These features are the pieces of information given to the model to predict the target variable. The term ‘Shapley values’ originates from Game Theory, where they were formulated as a fair way to distribute the payout of a game among its players. Similarly, SHAP divides a model’s prediction among its features.
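
To make that concrete, here is a minimal sketch of computing Shapley values with the shap package. The toy dataset and the gradient-boosting model are placeholders I've picked for illustration, not the ones from my notebook:

```python
# pip install shap scikit-learn
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor

# A toy regression problem standing in for your own data and model
X, y = load_diabetes(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# TreeExplainer computes Shapley values efficiently for tree-based models
explainer = shap.TreeExplainer(model)
shap_values = explainer(X)            # one Shapley value per feature, per row

print(shap_values.shape)              # (n_samples, n_features)
print(shap_values.base_values[0])     # the average (expected) prediction
print(shap_values[0].values)          # how each feature shifts the first prediction
```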

The Shapley value for each feature represents that feature’s contribution to the gap between the prediction and the average prediction. In other words, it reveals how much each feature pushed the prediction above or below the average. One Shapley value is computed for each feature value of an instance, and that value can change when examining another instance. SHAP empowers us to visualize this information through insightful graphics like waterfall plots, force plots, beeswarm (swarm) plots, and more.
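
Continuing the sketch above, shap's built-in plotting helpers generate those per-instance and global views directly from the Explanation object:

```python
import shap

# `shap_values` is the shap.Explanation object from the sketch above
shap.plots.waterfall(shap_values[0])   # how each feature pushes one prediction
                                       # above or below the average prediction
shap.plots.beeswarm(shap_values)       # per-feature impact across all rows
shap.plots.bar(shap_values)            # mean |SHAP value| = global feature importance

# shap.initjs()                        # uncomment in a notebook before force plots
# shap.plots.force(shap_values[0])     # interactive force plot for one prediction
```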

But why bother with all of this? Do we really need to understand how the model makes its predictions?

Well, SHAP offers some compelling advantages:

  1. Debugging: It allows us to closely scrutinize incorrect predictions and identify which features caused the errors (see the sketch further below). It also helps uncover cases where the model performed well on one dataset but poorly on new data.
  2. Explanations: SHAP provides information about the impact and importance of each feature in predictions and the relationships captured by the model.
  3. Data Exploration: When interpreting the model’s predictions, we often discover entirely new relationships.

[Figure: Summary plot for all the features in the dataset]
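
To make the debugging point from the list above concrete, here is a hedged sketch, still reusing the toy model and `shap_values` from the earlier snippets, that pulls out the worst predictions and asks which features drove them:

```python
import numpy as np
import shap

# Continuing the earlier sketch: find the rows the model got most wrong
preds = model.predict(X)
errors = np.abs(preds - y.to_numpy())
worst = np.argsort(errors)[-3:]            # indices of the three largest errors

for i in worst.tolist():
    print(f"Row {i}: predicted {preds[i]:.1f}, actual {y.iloc[i]:.1f}")
    shap.plots.waterfall(shap_values[i])   # which features drove this miss?
```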
