Feature Importance and Visualization of Tree Models

Chinmay Gaikwad

Oct 2, 2021

In the previous article, I illustrated how to build a simple decision tree and visualize it using Python. This is a continuation of the first part, so in case you haven't checked it out, please read it here.

Previously, we built a decision tree to understand whether a particular customer would churn or not from a telecom operator. We used Graphviz to describe the tree’s decision rules to determine potential customer churns. It is by far the simplest tool to visualize tree models.

Notice how the shade of the nodes gets darker as the Gini impurity decreases: lighter nodes have higher Gini impurity than darker ones. The class labels also have different colors. Here, blue refers to 'Not Churn' while orange refers to customer 'Churn'.

Tree Visualization

Although Graphviz is quite convenient, there is also a tool called dtreeviz. It's a Python library for decision tree visualization and model interpretation. dtreeviz currently supports popular frameworks like scikit-learn, XGBoost, Spark MLlib, and LightGBM.

First, we need to install dtreeviz. This can be done via either conda or pip. Detailed installation instructions can be found here.

pip install dtreeviz

Next, let's import dtreeviz into the Jupyter notebook.

We have built a decision tree with a max_depth of 3 levels for easier interpretation.

Yay! dtreeviz plots the tree model with an intuitive set of plots based on the features. It makes it easier to understand how the decision tree decided to split the samples using the significant features. From the above plot we can clearly see that the nodes to the left mostly contain customers who have not churned, while to the right most of the samples belong to the churn class.

We can even highlight the prediction path if we want to quickly check how the tree decides on a particular class. To accomplish this, we need to pass an argument that supplies the feature values of a single observation, which highlights the features the tree uses to traverse the path. More details on prediction paths can be found here.

Feature Importance

Feature importance refers to techniques that assign a score to features based on how significant they are at predicting a target variable. For decision trees, the scores are computed from the weighted decrease in Gini impurity at each split. An easy way to obtain the scores is the feature_importances_ attribute of the trained tree model. To know more about the implementation in scikit-learn, please refer to an illustrative blog post here.

Let's see which features in the dataset are most important in terms of predicting whether a customer would churn or not.
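A minimal sketch of reading the scores from a fitted tree. The data here is synthetic (the article's churn dataset isn't reproduced), constructed so that only the first feature actually drives the target:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: only the first feature determines the label.
rng = np.random.default_rng(0)
X = rng.random((300, 4))
y = (X[:, 0] > 0.5).astype(int)

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# One score per feature; the scores are non-negative and sum to 1.
importances = clf.feature_importances_
print(importances)
```

Because the label depends only on the first feature, its importance score dominates the others.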

Let’s structure this information by turning it into a DataFrame.
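One way to sketch that step, with synthetic stand-in data and hypothetical column names in place of the churn features:

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical names standing in for the churn dataset's columns.
feature_names = ["Contract", "TenurePeriod", "OnlineSecurity", "InternetService"]

rng = np.random.default_rng(0)
X = rng.random((300, len(feature_names)))
y = (X[:, 0] > 0.5).astype(int)  # only "Contract" drives the label here
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Pair each feature with its score and sort, most important first.
fi = (
    pd.DataFrame({"feature": feature_names,
                  "importance": clf.feature_importances_})
    .sort_values("importance", ascending=False)
    .reset_index(drop=True)
)
print(fi)
```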

Now that we have features and their significance numbers we can easily visualize them with Matplotlib or Seaborn.
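For instance, a horizontal bar chart with Matplotlib (synthetic stand-in data and hypothetical feature names again; the headless Agg backend is selected only so the sketch runs anywhere):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this in a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.tree import DecisionTreeClassifier

feature_names = ["Contract", "TenurePeriod", "OnlineSecurity", "InternetService"]
rng = np.random.default_rng(0)
X = rng.random((300, len(feature_names)))
y = (X[:, 0] > 0.5).astype(int)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Sort so the most important feature appears at the top of the chart.
order = np.argsort(clf.feature_importances_)
fig, ax = plt.subplots(figsize=(6, 3))
ax.barh(np.array(feature_names)[order], clf.feature_importances_[order])
ax.set_xlabel("Feature importance")
fig.tight_layout()
fig.savefig("feature_importance.png")
```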

We can see that Contract is an important factor in deciding whether a customer will exit the service or not. OnlineSecurity, TenurePeriod, and InternetService also seem to influence customers' service continuation. On the other hand, TechSupport, Dependents, and SeniorCitizen seem to have less importance in predicting churn according to the given dataset.

Phew, that took a couple of steps, right? Do you want to do this even more concisely? Yellowbrick has you covered! It's a suite of visualization tools that extends the scikit-learn API.

First, we need to install yellowbrick package.

pip install yellowbrick

Next, we just need to import the FeatureImportances visualizer from Yellowbrick and pass it the trained decision tree model.

Voilà! We got the same result, yet it is easier to code and does not require as many processing steps.

We saw multiple techniques to visualize tree models and to compute feature importance for them. These techniques are not only visually appealing but also help us understand what is happening under the hood, which improves model explainability and helps communicate the model's results to business stakeholders.

Thanks for reading! Please check out my work on my GitHub profile and do give it a star if you find it useful!
