Decision Tree — Implementation From Scratch in Python.

To make a decision … ask a tree.

ranga_vamsi
The Startup
9 min read · Jul 14, 2020


Photo by Oliver Roos on Unsplash

Introduction

The Decision Tree algorithm belongs to the family of non-parametric supervised Machine Learning algorithms used for solving both regression and classification tasks. It is commonly referred to as Classification and Regression Tree (CART).

As per Wikipedia, A decision tree is a flowchart-like structure in which each internal node represents a “test” on an attribute (e.g. whether a coin flip comes up heads or tails), each branch represents the outcome of the test, and each leaf node represents a class label (decision taken after computing all attributes). The paths from the root to the leaf represent classification rules.

Basic Flow Design of Decision Tree

The algorithm builds the decision tree by breaking the data down into increasingly homogeneous partitions using recursive binary partitioning. The most discriminative feature is selected first, as the root node, to partition the data into branch nodes by learning decision rules. The final result is a tree with internal (decision) nodes and leaf nodes. Decision nodes have two or more branches representing attribute (or feature) values, each branch represents a decision rule, and each leaf node corresponds to a class label (classification) or a decision value (regression).

How does the tree make decisions?

If the dataset consists of N features, how does the tree decide which attribute to place at the root node or at the internal nodes, and how does it arrive at the leaf nodes?

Attribute Selection Measures

An attribute selection measure is a heuristic for selecting the splitting criterion that partitions the data in the best possible way. Such measures are also known as decision rules or splitting rules. Common selection measures include Information Gain, Gain Ratio, Gini Index, Reduction in Variance, and Chi-Square.

1. Information Gain(IG)

Entropy measures the impurity, disorder, or uncertainty in the data. It is the quantity the decision tree uses to control its split decisions.

Mathematically,

Info(D) = − Σ pi · log₂(pi)    (summing over the m classes Ci in D)

Information Gain is a statistical property that measures how much information a feature gives about the class. It is the decrease in entropy produced by a split: the difference between the entropy before the split and the weighted average entropy after the dataset is split on a given attribute value.

Mathematically,

InfoA(D) = Σ (|Dj|/|D|) · Info(Dj)    (summing over the v partitions of D induced by attribute A)
Gain(A) = Info(D) − InfoA(D)
i.e., Information Gain = entropy(parent) − weighted average of entropy(children)

Where,

  • pi is the probability that an arbitrary tuple in D belongs to class Ci.
  • Info(D) is the mean amount of information required to identify the class of a tuple in D.
  • |Dj|/|D| is the weight of the jth split.
  • v is the number of discrete values in attribute A.
  • InfoA(D) is the expected information required to classify a tuple from D based on the partitioning by attribute A.

The attribute (A) with the highest Information Gain (Gain(A)) is chosen as the splitting attribute.

The ID3 (Iterative Dichotomiser 3) decision tree algorithm uses Information Gain.

2. Gain Ratio

Information Gain is biased towards choosing attributes with many distinct values. Gain Ratio overcomes this problem by normalizing the Information Gain with the Split Information.

Mathematically,

SplitInfoA(D) = − Σ (|Dj|/|D|) · log₂(|Dj|/|D|)    (Split Information for attribute A)
GainRatio(A) = Gain(A) / SplitInfoA(D)    (Gain Ratio for attribute A)

The attribute (A) with the highest Gain Ratio (GainRatio(A)) is chosen as the splitting attribute.

C4.5, an improvement of ID3, uses the Gain ratio.

3. Gini Index

Gini Index is calculated by subtracting the sum of the squared probabilities of each class from one. Mathematically,

Gini(D) = 1 − Σ pi²    where pi is the probability that a tuple in the dataset belongs to class Ci.

The Gini Index performs a binary split for each attribute. It favors larger partitions whereas information gain favors smaller partitions with distinct values.

Mathematically,

GiniA(D) = (|D1|/|D|) · Gini(D1) + (|D2|/|D|) · Gini(D2)    (Gini index of D, if a binary split on attribute A partitions D into D1 and D2)
ΔGini(A) = Gini(D) − GiniA(D)    (reduction in impurity offered by attribute A)

The attribute (A) with the minimum Gini index GiniA(D), i.e. the maximum reduction in impurity ΔGini(A), is chosen as the splitting attribute.

CART (Classification and Regression Tree) uses the Gini method.

Decision Tree Classification algorithm

I would like to walk you through a simple example along with the Python code.

Step 1: We start by importing the dataset and the necessary dependencies

We will be using the weather dataset for the training phase. This dataset includes the features [Outlook, Temp, Humidity, Windy] and the corresponding target variable ‘Play’. We need to predict whether the players will play or not based on the given weather conditions.

Loading Dataset
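A minimal sketch of this step, assuming the data lives in a CSV file named weather.csv (the file name and the pandas-based loading are my assumptions, not necessarily the author's exact code):

```python
import numpy as np
import pandas as pd

# Load the weather dataset; columns: Outlook, Temp, Humidity, Windy, Play.
df = pd.read_csv("weather.csv")   # assumed file name
print(df.head())
```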

Step 2: Build the Decision Tree

We will be using Information Gain as the attribute selection measure for partitioning the dataset. We need to go through each feature/column and check which feature gives the best split.

2.1 Find the attribute with the highest information gain.
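A hedged sketch of how this check could look, treating every feature as categorical for simplicity (entropy and info_gain are illustrative helper names, not the author's exact functions):

```python
import numpy as np
import pandas as pd

df = pd.read_csv("weather.csv")   # the DataFrame from Step 1 (assumed file name)

def entropy(target: pd.Series) -> float:
    """Info(D): impurity of the class distribution in this partition."""
    probs = target.value_counts(normalize=True)
    return float(-(probs * np.log2(probs)).sum())

def info_gain(data: pd.DataFrame, feature: str, target: str = "Play") -> float:
    """Gain(A) = Info(D) - weighted average entropy of the children after splitting on A."""
    parent_entropy = entropy(data[target])
    weighted_child_entropy = sum(
        (len(subset) / len(data)) * entropy(subset[target])
        for _, subset in data.groupby(feature)
    )
    return parent_entropy - weighted_child_entropy

# Compute the gain of every candidate feature and keep the best one.
features = ["Outlook", "Temp", "Humidity", "Windy"]
gains = {feature: info_gain(df, feature) for feature in features}
best_feature = max(gains, key=gains.get)
print(gains, "->", best_feature)
```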

Now, we can see that the attribute Outlook has the highest information gain.

2.2 Make the attribute with the highest information gain as a decision node and split the dataset accordingly.

Now, we make the attribute ‘Outlook’ a decision node and partition the dataset into smaller subsets.
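A possible sketch of this partitioning step, continuing from the previous snippet (partition is an illustrative helper name; the features are still treated as categorical):

```python
def partition(data: pd.DataFrame, feature: str) -> dict:
    """Split the data into one child subset per distinct value of the feature,
    dropping the used feature so it is not reused further down that branch."""
    return {value: subset.drop(columns=[feature])
            for value, subset in data.groupby(feature)}

children = partition(df, best_feature)   # best_feature is "Outlook" here
for value, subset in children.items():
    print(value, "->", list(subset["Play"].unique()))
```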

Since node ‘Child-1’ has all rows with the same target value, we can consider Child-1 a pure node, or leaf node.
Nodes Child-2 and Child-3 are impure since they consist of rows with mixed target values. So, we can either proceed to partition these nodes further until we get pure nodes, or we can specify stopping parameters such as max_depth (the maximum depth to which you want the tree to grow), min_sample_leaf (the minimum number of samples each node should have), and so on.

2.3 Repeat Steps 2.1 and 2.2 recursively until we get leaf nodes.

We can consider a node as a leaf node, when

  • we have all tuples in the node with the same target value, or
  • there are no remaining attributes, or
  • the tree we are building reaches its maximum depth, or
  • the count of the samples is less than the minimum sample count, or

we can also define many other termination criteria for leaf nodes; a sketch of the recursive build step follows below.
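Below is a sketch of that recursive build, reusing the info_gain and partition helpers from Steps 2.1 and 2.2 and the leaf conditions listed above (the max_depth and min_sample_leaf defaults are arbitrary, and the exact implementation is an assumption rather than the author's code):

```python
def build_tree(data, target="Play", depth=0, max_depth=5, min_sample_leaf=1):
    """Recursively grow the tree until one of the leaf conditions above is met."""
    # Leaf conditions: pure node, no attributes left, depth limit, or too few samples.
    if (data[target].nunique() == 1
            or data.shape[1] == 1
            or depth >= max_depth
            or len(data) <= min_sample_leaf):
        return data[target].mode()[0]          # majority class becomes the leaf value

    features = [col for col in data.columns if col != target]
    best = max(features, key=lambda f: info_gain(data, f, target))
    return {best: {value: build_tree(child, target, depth + 1, max_depth, min_sample_leaf)
                   for value, child in partition(data, best).items()}}

tree = build_tree(df)
print(tree)   # nested dictionaries: {feature: {value: subtree or leaf label}}
```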

After Step 2, our decision tree looks like this:

Step 3. Start Making Predictions

Prediction in a decision tree simply means following the path in the constructed tree from the root node to a leaf node, obeying the decision rules at the internal nodes. Once you arrive at the leaf node, return the value of that node.

Suppose we want to predict whether the players will play or not when the weather conditions are [Outlook=Rainy, Temp=35.5, Humidity=Normal, Windy=t].
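A sketch of that lookup, assuming the nested-dictionary tree produced by the build sketch above (the behaviour for branch values never seen in training is my own simplification):

```python
def predict(tree, row: dict):
    """Walk from the root to a leaf, following the branch that matches the row."""
    node = tree
    while isinstance(node, dict):
        feature = next(iter(node))        # the attribute tested at this decision node
        branches = node[feature]
        if row.get(feature) not in branches:
            return None                   # value never seen in training: no rule to follow
        node = branches[row[feature]]
    return node                           # leaf value, e.g. "Yes" or "No"

sample = {"Outlook": "Rainy", "Temp": 35.5, "Humidity": "Normal", "Windy": "t"}
print(predict(tree, sample))
```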

Our decision tree predicted that the players will not play in the given weather conditions.

Now, we have everything we need for this beautiful algorithm. Let’s put everything together!

Assemble Code Branches

Decision Tree Classification
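Since the original listing is embedded as a gist, here is a compact, self-contained sketch that assembles the same pieces into one class; the class and method names are illustrative, not the author's exact API:

```python
import numpy as np
import pandas as pd

class DecisionTreeClassifierScratch:
    """ID3-style classifier assembled from the helpers sketched in Step 2."""

    def __init__(self, max_depth=5, min_sample_leaf=1):
        self.max_depth = max_depth
        self.min_sample_leaf = min_sample_leaf
        self.tree_ = None

    @staticmethod
    def _entropy(target):
        probs = target.value_counts(normalize=True)
        return float(-(probs * np.log2(probs)).sum())

    def _info_gain(self, data, feature, target):
        parent = self._entropy(data[target])
        children = sum((len(g) / len(data)) * self._entropy(g[target])
                       for _, g in data.groupby(feature))
        return parent - children

    def _build(self, data, target, depth):
        # Leaf conditions: pure node, no features left, depth or size limit reached.
        if (data[target].nunique() == 1 or data.shape[1] == 1
                or depth >= self.max_depth or len(data) <= self.min_sample_leaf):
            return data[target].mode()[0]
        features = [c for c in data.columns if c != target]
        best = max(features, key=lambda f: self._info_gain(data, f, target))
        return {best: {v: self._build(g.drop(columns=[best]), target, depth + 1)
                       for v, g in data.groupby(best)}}

    def fit(self, data, target="Play"):
        self.tree_ = self._build(data, target, depth=0)
        return self

    def predict_row(self, row):
        node = self.tree_
        while isinstance(node, dict):
            feature = next(iter(node))
            node = node[feature].get(row.get(feature))   # None if the value was never seen
        return node

# Usage (weather.csv is an assumed file name):
# model = DecisionTreeClassifierScratch().fit(pd.read_csv("weather.csv"))
# print(model.tree_)
# print(model.predict_row({"Outlook": "Rainy", "Temp": 35.5, "Humidity": "Normal", "Windy": "t"}))
```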

Running the assembled code prints the constructed tree and the prediction for the sample weather conditions above.

Decision Tree Regression Algorithm

We can start by initializing a class and importing the dataset and the necessary dependencies.
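A possible skeleton for that setup; the hyper-parameter names mirror the ones mentioned later in this section, but the exact signature and default values are assumptions:

```python
import numpy as np
import pandas as pd

class DecisionTreeRegressorScratch:
    """Skeleton of the from-scratch regressor described in this section."""

    def __init__(self, max_depth_tree=5, min_sample_leaf=5, min_cv=0.1):
        self.max_depth_tree = max_depth_tree    # maximum depth of the tree
        self.min_sample_leaf = min_sample_leaf  # minimum number of samples in each node
        self.min_cv = min_cv                    # minimum coefficient of variation of a node
        self.tree_ = None

# The training data can be any DataFrame with numeric features and a numeric target, e.g.:
# df = pd.read_csv("some_regression_dataset.csv")   # assumed file name
```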

Then we can define the _find_best_split method to show how the split is done:
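A hedged sketch of what such a method could look like; it belongs inside the class initialized above, and the scoring it relies on is detailed in the next two methods:

```python
def _find_best_split(self, data, target):
    """Try every candidate feature and keep the one with the lowest split score."""
    best_feature, best_cutoff, best_score = None, None, float("inf")
    for feature in [col for col in data.columns if col != target]:
        score, cutoff = self._find_feature_split(data, feature, target)
        if score < best_score:              # let the best split win
            best_feature, best_cutoff, best_score = feature, cutoff, score
    return best_feature, best_cutoff, best_score
```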

In the above method, we try every candidate feature to split on and let the best split win.

We use the method _find_feature_split to get the split score and the cutoff value (for numerical columns) for each feature.

We want to split on the feature and cutoff that leave the target with a low standard deviation. The method _find_score returns the weighted sum of the standard deviations of the left split and the right split, and we pick the cutoff value that gives us the minimum score.
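A sketch of those two helpers under the same assumptions (numeric features, standard-deviation scoring); both are methods of the class sketched earlier:

```python
def _find_feature_split(self, data, feature, target):
    """Scan candidate cutoff values for one feature and return (best score, best cutoff)."""
    best_score, best_cutoff = float("inf"), None
    for cutoff in data[feature].unique():
        score = self._find_score(data, feature, target, cutoff)
        if score < best_score:
            best_score, best_cutoff = score, cutoff
    return best_score, best_cutoff

def _find_score(self, data, feature, target, cutoff):
    """Weighted sum of the target's standard deviation in the left and right splits."""
    left = data[data[feature] <= cutoff]
    right = data[data[feature] > cutoff]
    if left.empty or right.empty:
        return float("inf")                 # degenerate split: never chosen
    n = len(data)
    # ddof=0 (population std) avoids NaN scores on single-row splits.
    return ((len(left) / n) * left[target].std(ddof=0)
            + (len(right) / n) * right[target].std(ddof=0))
```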

Let’s look at the actual logic for building a Decision Tree!

After finding the best feature to split the dataset into subsets, we recursively find the best split on each subset until we are left with nodes that cannot be partitioned any further.

Here, we use max_depth_tree (the maximum depth of the tree), min_sample_leaf (the minimum number of samples in each node), and a minimum coefficient of variation as the termination conditions for a node to become a leaf node.
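A sketch of that recursive build under those termination conditions; the dictionary-based node representation and the exact checks are my assumptions rather than the author's implementation:

```python
def _build_tree(self, data, target, depth=0):
    """Recursively split the data until a termination condition makes the node a leaf."""
    mean = data[target].mean()
    std = data[target].std(ddof=0)
    cv = std / abs(mean) if mean != 0 else 0.0       # coefficient of variation
    if (depth >= self.max_depth_tree
            or len(data) <= self.min_sample_leaf
            or cv <= self.min_cv):
        return {"leaf": True, "value": mean}          # a leaf predicts the mean target

    feature, cutoff, score = self._find_best_split(data, target)
    if feature is None:                               # no usable split was found
        return {"leaf": True, "value": mean}

    left = data[data[feature] <= cutoff]
    right = data[data[feature] > cutoff]
    return {"leaf": False, "feature": feature, "cutoff": cutoff,
            "left": self._build_tree(left, target, depth + 1),
            "right": self._build_tree(right, target, depth + 1)}

def fit(self, data, target):
    self.tree_ = self._build_tree(data, target)
    return self
```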

Finally, let's look at the prediction method:
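A sketch of prediction for the dictionary-based tree above: follow the cutoff rules from the root down to a leaf and return that leaf's mean value (method names are illustrative):

```python
def predict_row(self, row):
    """Follow the cutoff rules from the root down to a leaf and return its mean value."""
    node = self.tree_
    while not node["leaf"]:
        branch = "left" if row[node["feature"]] <= node["cutoff"] else "right"
        node = node[branch]
    return node["value"]

def predict(self, data):
    """Predict every row of a feature DataFrame."""
    return np.array([self.predict_row(row) for _, row in data.iterrows()])
```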

Let’s put everything together

Assemble Code Branches

Decision Tree Regressor Algorithm
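A tiny end-to-end usage sketch on synthetic data, assuming the class and methods sketched above have been assembled into one file (the data and numbers are made up purely for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.uniform(0, 10, 200),
                   "x2": rng.uniform(0, 5, 200)})
df["y"] = 3.0 * df["x1"] + rng.normal(0.0, 1.0, 200)   # synthetic numeric target

model = DecisionTreeRegressorScratch(max_depth_tree=4, min_sample_leaf=5, min_cv=0.05)
model.fit(df, target="y")
print(model.predict(df.drop(columns="y").head()))
```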

You can also get the complete code for both classification and regression from my GitHub repo.

Leaf Segment

Had fun implementing a decision tree from scratch? I hope this blog helps you understand how to build a decision tree classifier and regressor from scratch in Python. If anything is unclear, if I missed something, if something was inaccurate, or if you have any other feedback, please let me know. And 💛 this if it was a good read.

Thanks for your time :)

Cheers !!

Happy Learning 😃

