Decision Tree (CART) Algorithm in Machine Learning

Amir Ali
Jul 13, 2018 · 15 min read

This is one of the best introductions to the Decision Tree (CART) algorithm. The author introduces the algorithm with a clear explanation of how decision trees work.


In this third chapter we discuss the decision tree algorithm, also called CART (Classification and Regression Trees). It is a supervised machine learning algorithm used for both classification and regression problems.

This chapter spans 3 parts:

  1. What is a Decision Tree (CART)?

  2. How does a Decision Tree work?

  3. Practical implementation in scikit-learn

3.1: What is a Decision Tree (CART)?

A decision tree is a widely used, effective, non-parametric machine learning modeling technique for regression and classification problems. To find solutions, a decision tree makes sequential, hierarchical decisions about the outcome variable based on the predictor data.

A decision tree builds regression or classification models in the form of a tree structure. It breaks a dataset down into smaller and smaller subsets while, at the same time, an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes.

Decision trees are also among the easiest classification algorithms to understand.

In the decision tree algorithm, we represent the problem as a tree.

Each internal node of the tree corresponds to an attribute.

Each leaf node corresponds to a class label.

To predict a class label for a record, we start at the root of the tree and compare the value of the root attribute with the record's value for that attribute. On the basis of this comparison, we follow the branch corresponding to that value and jump to the next node. We keep comparing the record's attribute values with the internal nodes of the tree until we reach a leaf node.

In building a decision tree we use three quantities (the standard formulas are summarized after this list):

  1. Information Gain:

What are p and n here? To find p and n we look at the class attribute (the outcome), which is binary (0, 1). For p we count the records whose outcome is true (1), and for n we count the records whose outcome is false (0). We will go deeper into the mathematics later; this is just the introduction.

2. Entropy:

Entropy is what we use to grow the tree: it measures how mixed the class values are within each value of an attribute.

3. Gain:

Gain is computed for each attribute of the training set, one by one, and tells us how much that attribute reduces the uncertainty about the class.
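For reference, these three quantities are usually written as follows. This is the standard ID3-style formulation in my own notation, since the article's formula images are not reproduced here; v is the number of values of attribute A, and p_i and n_i are the positive and negative counts within its i-th value.

```latex
I(p, n) = -\frac{p}{p+n}\log_2\frac{p}{p+n} - \frac{n}{p+n}\log_2\frac{n}{p+n}

E(A) = \sum_{i=1}^{v} \frac{p_i + n_i}{p + n}\, I(p_i, n_i)

\mathrm{Gain}(A) = I(p, n) - E(A)
```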

4.2: How Decision tree Work?

4.2.1: For Classification:

This dataset contains information about companies and whether their profit goes up or down. It has four attributes: Age, Competition, Type, and Profit, where Profit is the target attribute. As data scientists, our goal is to build a tree using the decision tree approach and identify the decisions under which a company's profit is maximized and its chance of a loss is minimized. So let's start.

First of all, I find the information gain of the target attribute, whose values are up and down:

p: up = 5

n: down = 5

Information Gain

So I calculate the information gain of the overall dataset, which is equal to 1.
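Plugging p = 5 and n = 5 into the information formula gives (my worked arithmetic, consistent with the article's result):

```latex
I(5, 5) = -\tfrac{5}{10}\log_2\tfrac{5}{10} - \tfrac{5}{10}\log_2\tfrac{5}{10} = 0.5 + 0.5 = 1
```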

After finding the information gain, we calculate the entropy of each attribute.

Entropy (Age)

The Age attribute has three categories, old, mid, and new, and we find the information of each, as shown in the table above.

So Entropy of Age

After finding the information of each (old, mid, new), put the values into the entropy equation below to find the entropy of Age.

So the entropy of Age is 0.4.
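Using the per-category counts implied later in the article (3 old records, all down; 4 mid records, 2 up and 2 down; 3 new records, all up), the weighted entropy works out as follows; the counts themselves are my reconstruction from the text, not a table reproduced from it.

```latex
E(\mathrm{Age}) = \tfrac{3}{10}\,I(0,3) + \tfrac{4}{10}\,I(2,2) + \tfrac{3}{10}\,I(3,0) = 0.3 \cdot 0 + 0.4 \cdot 1 + 0.3 \cdot 0 = 0.4
```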

After finding the entropy we can easily calculate the gain.

Gain (Age):

So the gain of Age is 1 - 0.4 = 0.6. Similarly, we can calculate the gains of the remaining attributes.

Entropy (Competition)

The Competition attribute has two categories, yes and no, and we find the information of each, as shown in the table above.

So Entropy of competition:

After finding the information of each (yes, no), put the values into the entropy equation below to find the entropy of Competition.

So the entropy of Competition is approximately 0.8755.

After finding the entropy we can easily calculate the gain.

Gain (competition):

So the gain of Competition is 0.1245.
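As a consistency check on these numbers (the raw table is not reproduced here, so this only verifies that the entropy and gain agree with each other):

```latex
\mathrm{Gain}(\mathrm{Competition}) = I(5, 5) - E(\mathrm{Competition}) = 1 - 0.8755 \approx 0.1245
```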

Entropy (Type)

The Type attribute has two categories, software and hardware, and we find the information of each, as shown in the table above.

So Entropy of type

After finding the information of each (hardware, software), put the values into the entropy equation below to find the entropy of Type, which comes out to 1.

Gain (type):

So the gain of Type is 1 - 1 = 0.

Now we have calculated the gain of every attribute, and we use this information to build the tree.

Gain of (age, competition, type)

Gain of age = 0.6

Gain of competition = 0.12

Gain of type = 0

The next step is to check which attribute has the maximum gain and make it the root node. In our case Age has the maximum gain, 0.6, compared to the others, so Age becomes the root node.
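As an illustration, here is a small Python sketch that computes information, entropy, and gain for a dataset of this shape. The table itself is not reproduced in the article, so the rows below are a hypothetical reconstruction that merely matches the counts the text describes (5 up / 5 down; old always down; new always up; mid split 2/2), and the function names are mine.

```python
import math
from collections import Counter

def information(labels):
    """I(p, n): entropy of the class distribution."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def entropy_of_attribute(rows, attr, target):
    """E(A): information of each value of `attr`, weighted by its frequency."""
    total = len(rows)
    e = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        e += (len(subset) / total) * information(subset)
    return e

def gain(rows, attr, target):
    """Gain(A) = I(p, n) - E(A)."""
    return information([r[target] for r in rows]) - entropy_of_attribute(rows, attr, target)

# Hypothetical rows, consistent with the counts described in the article.
rows = [
    {"Age": "old", "Competition": "yes", "Type": "software", "Profit": "down"},
    {"Age": "old", "Competition": "no",  "Type": "software", "Profit": "down"},
    {"Age": "old", "Competition": "no",  "Type": "hardware", "Profit": "down"},
    {"Age": "mid", "Competition": "yes", "Type": "software", "Profit": "down"},
    {"Age": "mid", "Competition": "yes", "Type": "hardware", "Profit": "down"},
    {"Age": "mid", "Competition": "no",  "Type": "hardware", "Profit": "up"},
    {"Age": "mid", "Competition": "no",  "Type": "software", "Profit": "up"},
    {"Age": "new", "Competition": "yes", "Type": "software", "Profit": "up"},
    {"Age": "new", "Competition": "no",  "Type": "hardware", "Profit": "up"},
    {"Age": "new", "Competition": "no",  "Type": "software", "Profit": "up"},
]

for attr in ("Age", "Competition", "Type"):
    print(attr, round(gain(rows, attr, "Profit"), 4))
# With this reconstruction: Age 0.6, Competition 0.1245, Type 0.0
```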

Age has three values: old, mid, and new.

For old age, the target attribute always goes down.

So we make a leaf node for old, which is down, on the basis of the information in our dataset.

For new age, the target attribute always goes up.

So we make a leaf node for new, which is up, on the basis of the information in our dataset.

So we have made leaf nodes for old and new; now what about mid?

Mid age has both cases, some records going up and some going down:

So which attribute becomes the child node of mid? To solve this, we take only the mid rows of the dataset, calculate the gain of Competition and Type on that subset, and make the winner the child node of mid.

To calculate the gain and entropy, we follow the same procedure as above.

p: up = 2

n: down = 2

Information Gain

So I calculate the information gain of this subset, which is equal to 1.

After finding the information gain, we calculate the entropy of each attribute.

Entropy (Competition)

In the Competition attribute we have two categories, yes and no, and we find the information of each, as shown in the table above.

So Entropy of competition

After finding the information of each (yes, no), put the values into the entropy equation below to find the entropy of Competition.

So the entropy of Competition is 0.

After finding the entropy we can easily calculate the gain.

Gain (competition):

So the gain of Competition is 1 - 0 = 1.

Entropy (Type):

In the Type attribute we have two categories, software and hardware, and we find the information of each, as shown in the table above.

So Entropy of type

After finding the information of each (hardware, software), put the values into the entropy equation below to find the entropy of Type, which is 1.

Gain (type):

So the gain of Type is 0.

Now, to decide which attribute becomes the child node of mid, check the gains of Competition and Type:

Gain of competition = 1

Gain of type = 0

The gain of Competition is greater than that of Type, so Competition becomes the child node of mid.

So we make leaf nodes under Competition: if Competition is yes, the profit goes down, and if Competition is no, the profit goes up.

So our tree is complete; we built it using the gain of each attribute.
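Written out as nested rules, the finished tree the article describes looks like this (a plain-text sketch, not a figure from the original):

```
Age?
|-- old -> Profit: down
|-- new -> Profit: up
`-- mid -> Competition?
            |-- yes -> Profit: down
            `-- no  -> Profit: up
```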

According to the tree, the company makes a profit in two situations:

1st: Profit goes up if Age is new.

2nd: Profit goes up if Age is mid and there is no competition.

So we use this important information to help the company make a profit.

3.2.2: For Regression:

Dataset:

A decision node (e.g., Outlook) has two or more branches (e.g., Sunny, Overcast, and Rainy), each representing a value of the attribute being tested. A leaf node (e.g., Hours Played) represents a decision on the numerical target. The topmost decision node in a tree, which corresponds to the best predictor, is called the root node. Decision trees can handle both categorical and numerical data.

Standard Deviation

A decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar values (homogeneous). We use the standard deviation to calculate the homogeneity of a numerical sample. If the numerical sample is completely homogeneous, its standard deviation is zero.

a) Standard deviation for one attribute:

· Standard Deviation (S) is used for tree building (branching).

· Coefficient of Variation (CV) is used to decide when to stop branching. We can use the count (n) as well.

· Average (Avg) is the value placed in the leaf nodes.
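The usual formulas for these quantities (standard definitions, written out here because the article's formula images are not reproduced):

```latex
S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}, \qquad
CV = \frac{S}{\bar{x}} \times 100\%, \qquad
\mathrm{Avg} = \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}
```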

b) Standard deviation for two attributes (target and predictor):
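When the sample is split by a predictor attribute X, the standard deviation of the target T is computed per branch and weighted by each branch's probability (again the standard formulation, not the article's image):

```latex
S(T, X) = \sum_{c \in X} P(c)\, S(c)
```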

Standard Deviation Reduction

The standard deviation reduction is based on the decrease in standard deviation after a dataset is split on an attribute. Constructing a decision tree is all about finding the attribute that returns the highest standard deviation reduction (i.e., the most homogeneous branches).

Step 1: The standard deviation of the target is calculated.

Standard deviation (Hours Played) = 9.32

Step 2: The dataset is then split on the different attributes. The standard deviation for each branch is calculated. The resulting standard deviation is subtracted from the standard deviation before the split. The result is the standard deviation reduction.
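In symbols (a standard formulation):

```latex
SDR(T, X) = S(T) - S(T, X)
```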

Similarly, we find the standard deviation for Temp, Humidity, and Windy, as shown in the table below:

Step 3: The attribute with the largest standard deviation reduction is chosen for the decision node.

Step 4a: The dataset is divided based on the values of the selected attribute. This process is run recursively on the non-leaf branches, until all data is processed.

In practice, we need some termination criteria: for example, when the coefficient of variation (CV) of a branch becomes smaller than a certain threshold (e.g., 10%) and/or when too few instances (n) remain in the branch (e.g., 3).

Step 4b: “Overcast” subset does not need any further splitting because its CV (8%) is less than the threshold (10%). The related leaf node gets the average of the “Overcast” subset.

Step 4c: However, the “Sunny” branch has a CV (28%) greater than the threshold (10%), so it needs further splitting. We select “Windy” as the best node after “Outlook” because it has the largest SDR.

Because the number of data points in both branches (FALSE and TRUE) is equal to or less than 3, we stop further branching and assign the average of each branch to the related leaf node.

Step 4d: Moreover, the “Rainy” branch has a CV (22%), which is more than the threshold (10%), so this branch also needs further splitting. We select “Temp” as the best node here because it has the largest SDR.

Because the number of data points in all three branches (Cool, Hot, and Mild) is equal to or less than 3, we stop further branching and assign the average of each branch to the related leaf node.

When more than one instance remains at a leaf node, we calculate the average as the final value for the target.
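A compact Python sketch of this procedure; the thresholds follow the text (CV below 10%, 3 or fewer instances), while the helper names and data layout are mine, not code from the article.

```python
import statistics

CV_THRESHOLD = 10.0   # stop when the coefficient of variation falls below 10%
MIN_INSTANCES = 3     # stop when a branch has 3 or fewer instances

def std(values):
    # population standard deviation, as in the worked example
    return statistics.pstdev(values)

def cv(values):
    # coefficient of variation, in percent
    return 100.0 * std(values) / statistics.mean(values)

def sdr(rows, attr, target):
    """Standard deviation reduction of `target` when splitting on `attr`."""
    total = len(rows)
    before = std([r[target] for r in rows])
    after = 0.0
    for value in {r[attr] for r in rows}:
        branch = [r[target] for r in rows if r[attr] == value]
        after += (len(branch) / total) * std(branch)
    return before - after

def build(rows, attrs, target):
    """Recursively build the regression tree; leaves hold the branch average."""
    values = [r[target] for r in rows]
    if len(rows) <= MIN_INSTANCES or cv(values) < CV_THRESHOLD or not attrs:
        return round(statistics.mean(values), 1)            # leaf node
    best = max(attrs, key=lambda a: sdr(rows, a, target))    # largest SDR wins
    node = {best: {}}
    for value in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == value]
        node[best][value] = build(subset, [a for a in attrs if a != best], target)
    return node
```

Called as build(rows, ["Outlook", "Temp", "Humidity", "Windy"], "Hours Played") on a golf-style dataset like the one the article uses, this follows the splitting procedure described in steps 1 through 4.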

3.3: Practical Implementation in scikit-learn

3.3.1: Classification approach:

Dataset Description:

This dataset is about social network ads. It has 400 instances and 5 attributes: User_ID, Gender, Age, Estimate_Salary, and finally Purchased, which is the target attribute. On the basis of age and salary, a user either purchased via the social network ad or did not, and our challenge is to predict this from the data. To do this we use a decision tree classifier model and predict the target attribute; after prediction we build a confusion matrix, compare the original and predicted results, and then calculate the accuracy.

Part 1: Data Preprocessing

In this first part we work on data preprocessing.

First of all, we import the libraries: numpy for multidimensional arrays, pandas for importing the dataset, and matplotlib for visualizing the results.

After importing the libraries, we use pandas to import the dataset, which is in CSV format, and then define the predictor and predicted (target) attributes using iloc.

In the next lines of code we split our dataset into training and test sets; to do this we import train_test_split from sklearn.model_selection. Here 25% of the whole dataset is used for testing and 75% for training.

The Salary and Age attributes have values on very different scales, some very large and some very small, which can lead the model to underweight one of them. To avoid this problem we apply feature scaling so that both features are standardized.
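A minimal sketch of this preprocessing, assuming the file is named Social_Network_Ads.csv and that Age and Estimate_Salary sit in columns 2 and 3 with Purchased in the last column; the article's own code is shown as screenshots, so these names and positions are my assumptions.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Import the dataset (assumed file name)
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values   # Age, Estimate_Salary (assumed positions)
y = dataset.iloc[:, -1].values       # Purchased (target)

# 75% train / 25% test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Feature scaling so Age and Salary are on comparable scales
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
```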

Part 2: Building the decision tree classifier model:

In this part we build the decision tree classifier model. To do this, first we import the model from scikit-learn.

In this step we initialize our model. Note that I passed the criterion parameter; its purpose is to choose the function that measures the quality of a split. The supported criteria are “gini” for Gini impurity and “entropy” for information gain.

After initializing our decision tree classifier model, we train it using the fit method.
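Continuing the sketch, the model is built and trained like this; criterion='entropy' matches the information-gain discussion above, and random_state is my addition for reproducibility.

```python
from sklearn.tree import DecisionTreeClassifier

# Initialize the classifier with the entropy criterion (information gain)
classifier = DecisionTreeClassifier(criterion='entropy', random_state=0)

# Train the model on the training set
classifier.fit(X_train, y_train)
```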

Part 3: Making the Prediction and Visualizing the result:

In the final step we make the predictions and visualize the results.

For prediction we use the predict method and predict the target values for the test set.

After making the predictions, we build a confusion matrix comparing the predicted results with the original results.

From the confusion matrix we can calculate the accuracy: (62 + 29) / (62 + 29 + 6 + 3) = 0.91.

In the visualization step we plot the test set results; the plot makes it easy to check the predictions and understand the confusion matrix better.
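The last part of the sketch: predict on the test set, build the confusion matrix, compute the accuracy, and plot the test points. The article's screenshots show a decision-boundary plot; the scatter plot below is a simpler stand-in.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, accuracy_score

# Predict the test set results
y_pred = classifier.predict(X_test)

# Confusion matrix: predicted vs. original results
cm = confusion_matrix(y_test, y_pred)
print(cm)                               # in the article's run: 62 and 29 correct, 6 and 3 wrong
print(accuracy_score(y_test, y_pred))   # (62 + 29) / (62 + 29 + 6 + 3) = 0.91

# Visualize the test set, coloring each point by its true class
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap='coolwarm')
plt.xlabel('Age (scaled)')
plt.ylabel('Estimated Salary (scaled)')
plt.title('Decision Tree Classification (test set)')
plt.show()
```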

If you want the dataset and code, you can also check my GitHub profile; click on the link.

3.3.2: Regression approach

Dataset Description:

This dataset contains information about position salaries. As the sample below shows, it has three attributes: position, level of work, and salary according to level. The dataset has 10 instances. This is a regression problem, and we use the decision tree regression model to predict the salary on the basis of level and position.

Part 1: Data Preprocessing:

In this first part we work on data preprocessing.

First of all, we import the libraries: numpy for multidimensional arrays, pandas for importing the dataset, and matplotlib for visualizing the results.

After importing the libraries, we use pandas to import the dataset, which is in CSV format, and then define the predictor and target attributes: X is Level and y is Salary, according to our dataset.

Note: This dataset is very small (only 10 instances), so we do not split it into training and test sets.
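A minimal sketch of this part, assuming the file is named Position_Salaries.csv with Level in the second column and Salary in the third; as before, the file and column layout are my assumptions.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Import the dataset (assumed file name)
dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:2].values   # Level, kept as a 2-D array
y = dataset.iloc[:, 2].values     # Salary

# Only 10 instances, so no train/test split here
```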

Part 2: Building the decision tree Regression model:

In this part we build the decision tree regression model. To do this, first we import the model from scikit-learn.

In this step we initialize our decision tree regressor model.

After initializing the regressor model, we fit the dataset to it.
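Continuing the sketch (random_state is again my addition for reproducibility):

```python
from sklearn.tree import DecisionTreeRegressor

# Initialize the decision tree regressor
regressor = DecisionTreeRegressor(random_state=0)

# Fit the dataset to the model
regressor.fit(X, y)
```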

Part 3: Making the Prediction and Visualizing the result:

In the final step we make the prediction and visualize the result. Note that our target is a real value, which makes this a regression problem, so in the predict method we enter the value 6.5 (a position level) and our model gives a prediction of 150,000.

In the visualization step we plot position level against salary together with the predicted output: the data points (in red) are plotted on the basis of level and salary, and the predicted salary is shown as the blue curve.
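And the prediction and plot, again as a sketch; the fine grid is the usual way to draw the stepwise curve a regression tree produces.

```python
# Predict the salary for position level 6.5
print(regressor.predict([[6.5]]))

# Visualize: red points for the data, blue stepwise curve for the model's predictions
X_grid = np.arange(X.min(), X.max(), 0.01).reshape(-1, 1)
plt.scatter(X, y, color='red')
plt.plot(X_grid, regressor.predict(X_grid), color='blue')
plt.title('Decision Tree Regression')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
```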

If you want the dataset and code, you can also check my GitHub profile; click on the link.

End Notes:

If you liked this article, be sure to click ❤ below to recommend it and if you have any questions, leave a comment and I will do my best to answer.

For being more aware of the world of machine learning, follow me. It’s the best way to find out when I write more articles like this.

You can also follow me on GitHub for the code and dataset, follow me on Twitter, email me directly, or find me on LinkedIn. I'd love to hear from you.

That’s all folks, Have a nice day :)


Written by

Amir Ali

Co-Founder & Chief AI Research Scientist at Wavy AI Research Foundation

Wavy AI Research Foundation

This publication is completely about Machine Learning and Deep Learning articles from beginner to advanced.