Decision Tree with Practical Implementation

Amir Ali
Jul 13, 2018 · 16 min read

In this chapter, we will discuss the decision tree algorithm, also called CART (Classification and Regression Trees). It is a supervised machine learning algorithm used for both classification and regression problems.

This chapter spans three parts:

  1. What is a Decision Tree?
  2. How Does a Decision Tree Work in Classification and Regression?
  3. Practical Implementation of a Decision Tree in Scikit-Learn.

1. What is a Decision Tree (CART)?

A decision tree is a widely used, effective non-parametric machine learning technique for regression and classification problems. To find a solution, a decision tree makes sequential, hierarchical decisions about the outcome variable based on the predictor data.

The decision tree builds regression or classification models in the form of a tree structure. It breaks a dataset down into smaller and smaller subsets while, at the same time, an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes.

The decision tree algorithm is easy to understand compared to many other classification algorithms.

In the decision tree algorithm, we represent our problem as a tree.

Each internal node of the tree corresponds to an attribute.

Each leaf node corresponds to a class label.

To predict a class label for a record, we start from the root of the tree. We compare the value of the root attribute with the record's attribute and, on the basis of that comparison, follow the branch corresponding to that value and jump to the next node. We keep comparing the record's attribute values with the internal nodes of the tree until we reach a leaf node.


Here, what are p and n? To find p and n, we look at the class attribute (the outcome), which is binary (0, 1). For p we count the instances with the true value 1, and for n we count the instances with the false value 0. We will go deeper into the mathematical part below; this is just the introduction.

Entropy is what we use to build the tree, and we compute it from the class counts of an attribute. For a class attribute with p positive and n negative instances, the information is:

I(p, n) = -(p/(p+n))·log2(p/(p+n)) - (n/(p+n))·log2(n/(p+n))

The entropy of a predictor attribute A is the weighted sum of the information of each of its values, and the gain of A is the difference between the two:

E(A) = Σ ((p_i + n_i)/(p + n))·I(p_i, n_i), summed over the values of A

Gain(A) = I(p, n) - E(A)

Gain is computed attribute by attribute over the training set, and the attribute with the highest gain is chosen for the next split.
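
To make these formulas concrete, here is a minimal Python sketch of the information and gain computations; the function names are my own, not from the article:

```python
import math

def entropy(p, n):
    """Information I(p, n) of a class with p positive and n negative instances."""
    total = p + n
    # 0 * log2(0) is taken as 0, so empty counts are skipped.
    return -sum(c / total * math.log2(c / total) for c in (p, n) if c)

def gain(p, n, subsets):
    """Information gain of an attribute, given one (p_i, n_i) pair per value."""
    total = p + n
    weighted = sum((pi + ni) / total * entropy(pi, ni) for pi, ni in subsets)
    return entropy(p, n) - weighted
```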

2. How Does a Decision Tree Work in Classification and Regression?

This dataset has 10 instances and four attributes. The first three attributes are predictors and the last one is the target attribute.

Table 4.1: Dataset

First of all, we compute the information of the target attribute, which is Profit.

p: up = 5

n: down = 5

Information:

I(5, 5) = -(5/10)·log2(5/10) - (5/10)·log2(5/10) = 1

So the information of the overall dataset is equal to 1.

After finding the information of the dataset, we now calculate the entropy of each attribute.

Entropy (Age)

In the Age attribute we have three categorical values: old, mid, and new. We compute the information of each value: old has 0 up and 3 down, so I = 0; mid has 2 up and 2 down, so I = 1; new has 3 up and 0 down, so I = 0.

So Entropy of Age:

After finding the information of each value (old, mid, new), we put the values into the entropy equation to find the entropy of Age:

E(Age) = (3/10)·0 + (4/10)·1 + (3/10)·0 = 0.4

So the entropy of Age is 0.4.

After finding the entropy, we can easily calculate the gain.

Gain (Age):

Gain(Age) = 1 - 0.4 = 0.6

So the gain of Age is 0.6. Similarly, we can calculate the gain of the remaining attributes.

Entropy (Competition)

In the Competition attribute we have two categorical values, yes and no, and we compute the information of each.

After finding the information of each value (yes, no), we put the values into the entropy equation to find the entropy of Competition.

So the entropy of Competition is 0.8754.

After finding the entropy, we can easily calculate the gain.

Gain (Competition):

Gain(Competition) = 1 - E(Competition) ≈ 0.1245

So the gain of Competition is 0.1245.

Entropy (Type)

In the Type attribute we have two categorical values, software and hardware, and we compute the information of each.

After finding the information of each value (hardware, software), we put the values into the entropy equation to find the entropy of Type, which comes out to 1.

Gain (Type):

Gain(Type) = 1 - 1 = 0

So the gain of Type is 0.

Now we have calculated the gain of every attribute, and we use this information to build the tree:

Gain of age = 0.6

Gain of competition = 0.12

Gain of type = 0

The next step is to check which attribute has the maximum gain and make it the root node. In our case Age has the maximum gain (0.6), so Age becomes the root node. A small script that reproduces these numbers follows below.
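
As a check on the arithmetic, here is a small standalone script that reproduces all three gains. The per-value counts for Age are stated in the text below (old: all down, new: all up, mid: two of each); the Competition and Type counts are my inference from the entropies reported above:

```python
import math

def info(p, n):
    """Information I(p, n) of a binary class split."""
    total = p + n
    return -sum(c / total * math.log2(c / total) for c in (p, n) if c)

# Per-value (up, down) counts for each attribute of the 10-instance dataset.
splits = {
    "Age":         [(0, 3), (2, 2), (3, 0)],  # old, mid, new
    "Competition": [(1, 3), (4, 2)],          # yes, no (inferred counts)
    "Type":        [(3, 3), (2, 2)],          # software, hardware (inferred)
}
for name, subsets in splits.items():
    e = sum((p + n) / 10 * info(p, n) for p, n in subsets)
    print(f"Gain({name}) = {info(5, 5) - e:.4f}")
# Gain(Age) = 0.6000, Gain(Competition) = 0.1245, Gain(Type) = 0.0000
```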


Age has three values: old, mid, and new.


For old age, the target attribute always goes down.


So we make a leaf node for old, which is down, on the basis of our dataset.


For new age, the target attribute always goes up.


So we make a leaf node for new age, which is up, on the basis of our dataset.


So we have made leaf nodes for old and new. Now, what about mid?

Mid age has 2 cases:


So what becomes the child node of mid? To solve this problem, we take the subset of the dataset containing only mid, calculate the gain of Competition and Type on it, and make the winner the child node of mid.


To calculate the gain and entropy here, we follow the same procedure as above.

p: up = 2

n: down = 2

Information:

I(2, 2) = -(2/4)·log2(2/4) - (2/4)·log2(2/4) = 1

So the information of this subset is equal to 1.

After finding the information, we calculate the entropy of each attribute.

Entropy (Competition)

In the Competition attribute we have two categorical values, yes and no, and we compute the information of each: within the mid subset, both competition-yes cases go down and both competition-no cases go up, so the information of each value is 0.

Putting these values into the entropy equation gives the entropy of Competition:

E(Competition) = (2/4)·0 + (2/4)·0 = 0

So the entropy of Competition is 0.

After finding the entropy, we can easily calculate the gain.

Gain (Competition):

Gain(Competition) = 1 - 0 = 1

So the gain of Competition is 1.

Entropy (Type):

In the Type attribute we have two categorical values, software and hardware, and we compute the information of each.

After finding the information of each value (hardware, software), we put the values into the entropy equation to find the entropy of Type, which comes out to 1.

Gain (Type):

Gain(Type) = 1 - 1 = 0

So the gain of Type is 0.

Now, to decide which attribute becomes the child node of mid, we check the gains of Competition and Type:

Gain of competition = 1

Gain of type = 0

The gain of Competition is greater than that of Type, so Competition becomes the child node of mid.


So we make leaf nodes under Competition: if Competition is yes, then Profit goes down, and if Competition is no, then Profit goes up.


So our tree is complete; we used the gains of all the attributes to build it. According to our tree, the company makes a profit in only two cases:

1st: Profit goes up if age is new.

2nd: Profit goes up if age is mid and there is no competition.

So we can use this information to help the company make a profit.
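
For reference, here is the finished tree written out as a small text diagram (my own rendering of the figures above):

```
            Age
          /  |  \
       old  mid  new
        |    |    |
      down   |    up
             |
        Competition
          /      \
        yes       no
         |         |
       down       up
```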

Note: If you want this article, check out my academia.edu profile.

Now let's see how a decision tree works in regression.

Dataset:

This dataset has 14 instances and five attributes. The first four attributes are predictors and the last one is the target attribute.

Table 2: Dataset

Standard Deviation

A decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar values (homogeneous). We use the standard deviation to measure the homogeneity of a numerical sample. If the numerical sample is completely homogeneous, its standard deviation is zero.

a) Standard deviation for one attribute:


· Standard deviation (S) is used for tree building (branching).

· Coefficient of variation (CV) is used to decide when to stop branching. We can use the count (n) as well.

· Average (Avg) is the value assigned to the leaf nodes. A minimal sketch of these three quantities follows this list.
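
As a minimal sketch, these three quantities can be computed as follows; the sample values here are hypothetical, not taken from the article's table:

```python
import math

def branch_stats(values):
    """Standard deviation (S), coefficient of variation (CV), and average."""
    n = len(values)
    avg = sum(values) / n
    s = math.sqrt(sum((v - avg) ** 2 for v in values) / n)  # population S
    cv = s / avg * 100  # CV as a percentage, used as a stopping criterion
    return s, cv, avg

hours = [26, 30, 48, 46, 62]  # hypothetical sample of the numeric target
s, cv, avg = branch_stats(hours)
print(f"S = {s:.2f}, CV = {cv:.0f}%, Avg = {avg:.1f}")
```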

b) Standard deviation for two attributes (target and predictor):


Standard Deviation Reduction

The standard deviation reduction is based on the decrease in standard deviation after a dataset is split on an attribute. Constructing a decision tree is all about finding the attribute that returns the highest standard deviation reduction (i.e., the most homogeneous branches).

Step 1: The standard deviation of the target is calculated.

Standard deviation (Hours Played) = 9.32

Step 2: The dataset is then split on the different attributes. The standard deviation of each branch is calculated, and the resulting size-weighted standard deviation is subtracted from the standard deviation before the split. The result is the standard deviation reduction. A sketch of the computation follows below.
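
A minimal sketch of the standard deviation reduction under these definitions; the target values and the three-way split below are hypothetical, not the article's table:

```python
import math

def std(values):
    """Population standard deviation of a branch."""
    avg = sum(values) / len(values)
    return math.sqrt(sum((v - avg) ** 2 for v in values) / len(values))

def sdr(target, branches):
    """S(target) minus the size-weighted sum of the branch standard deviations."""
    weighted = sum(len(b) / len(target) * std(b) for b in branches)
    return std(target) - weighted

target = [25, 30, 46, 45, 52, 23, 43, 35, 38, 46]          # hypothetical
branches = [[25, 30, 46], [45, 52, 23, 43], [35, 38, 46]]  # hypothetical split
print(f"SDR = {sdr(target, branches):.2f}")
```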


Similarly, we find the standard deviation reduction for Temp, Humidity, and Windy, as shown in the table below:


Step 3: The attribute with the largest standard deviation reduction is chosen for the decision node.


Step 4a: The dataset is divided based on the values of the selected attribute. This process is run recursively on the non-leaf branches until all data is processed.


In practice, we need some termination criteria. For example, we stop when the coefficient of variation (CV) for a branch becomes smaller than a certain threshold (e.g., 10%) and/or when too few instances (n) remain in the branch (e.g., 3).

Step 4b: “Overcast” subset does not need any further splitting because its CV (8%) is less than the threshold (10%). The related leaf node gets the average of the “Overcast” subset.


Step 4c: However, the “Sunny” branch has a CV (28%) greater than the threshold (10%), so it needs further splitting. We select “Windy” as the best node after “Outlook” because it has the largest SDR.


Because the number of data points in both branches (FALSE and TRUE) is equal to or less than 3, we stop further branching and assign the average of each branch to the related leaf node.


Step 4d: Moreover, the “Rainy” branch has a CV (22%) which is more than the threshold (10%), so this branch needs further splitting. We select “Temp” as the best node because it has the largest SDR.


Because the number of data points in all three branches (Cool, Hot, and Mild) is equal to or less than 3, we stop further branching and assign the average of each branch to the related leaf node.


When the number of instances at a leaf node is more than one, we take their average as the final value for the target.

Note: If you want this article, check out my academia.edu profile.

3. Practical Implementation of a Decision Tree in Scikit-Learn.

Dataset Description:

This dataset has 400 instances and 5 attributes: User ID, Gender, Age, Estimated Salary, and Purchased, which is the target attribute. The first four attributes (User ID, Gender, Age, Estimated Salary) are predictors and the last attribute is the target.


Part 1: Data Preprocessing:

1.1 Import the Libraries

In this step, we import three libraries for the data preprocessing part. A library is a tool that you can use to do a specific job. First we import the numpy library, used for multidimensional arrays; then the pandas library, used to import the dataset; and last the matplotlib library, used for plotting graphs.
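
The code screenshots did not survive, so here is a minimal reconstruction of the imports this step describes:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
```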


1.2 Import the dataset

In this step, we import the dataset; to do that, we use the pandas library. After importing the dataset, we define our predictor and target attributes. Our predictor attributes are User ID, Gender, Age, and Estimated Salary, as you can see in the sample dataset; we call them ‘X’ here. Purchased is the target attribute, which we call ‘y’.
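
A sketch of this step, assuming the commonly used Social_Network_Ads.csv file layout; the two-dimensional plots later in the article suggest that only the Age and EstimatedSalary columns actually feed the model:

```python
# Columns: User ID, Gender, Age, EstimatedSalary, Purchased.
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values  # Age and EstimatedSalary
y = dataset.iloc[:, 4].values       # Purchased
```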


1.3 Split the dataset for test and train

In this step, we split our dataset into a training set and a test set: 75% of the data goes to training and the remaining 25% to testing.
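
A sketch of the split, continuing the script above; random_state=0 is an illustrative choice for reproducibility, not necessarily the author's:

```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
```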


1.4 Feature Scaling

Feature scaling is a very important part of data preprocessing. If we look at our dataset, the numeric attributes live on very different scales: Age values are low while Estimated Salary values are very high. This can cause issues for our machine learning model, so we put all values on the same scale. There are two common methods for doing this: normalization and standardization (StandardScaler).


Here we use StandardScaler, imported from the sklearn library.
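
A sketch of the scaling step; note that the scaler is fit on the training set only and then applied to both sets:

```python
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
X_train = sc.fit_transform(X_train)  # learn mean/std on the training data
X_test = sc.transform(X_test)        # reuse them on the test data
```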


Part 2: Building the Decision tree classifier model:

In this part, we build our Decision Tree Classifier model using the Scikit-Learn library.

2.1 Import the Libraries

In this step we start building our decision tree model; to do this, we first import the decision tree model from the Scikit-Learn library.
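
The import this step describes is presumably:

```python
from sklearn.tree import DecisionTreeClassifier
```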


2.2 Initialize our Decision Tree model

In this step, we initialize our model. Note that I passed the parameter criterion='entropy'. Its purpose is to choose the function that measures the quality of a split: the supported criteria are “gini” for the Gini impurity and “entropy” for the information gain.
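
A sketch of the initialization; random_state=0 is an illustrative addition for reproducibility:

```python
# criterion='entropy' selects information gain as the split-quality measure.
classifier = DecisionTreeClassifier(criterion='entropy', random_state=0)
```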


2.3 Fitting the Decision Tree Model

In this step, we fit the training data to our model; X_train and y_train are our training data.
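
Continuing the script:

```python
classifier.fit(X_train, y_train)
```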


Part 3: Making the Prediction and Visualizing the result:

In this part, we make predictions on our test set and visualize the results using the matplotlib library.

3.1 Predict the test set Result

In this step, we predict our test set result.
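
The prediction step, continuing the script above:

```python
y_pred = classifier.predict(X_test)
```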


3.2 Confusion Matrix

In this step, we build the confusion matrix of our test set result. To do that, we import confusion_matrix from sklearn.metrics and pass it two parameters: first y_test, the actual test set labels, and second y_pred, the predicted labels.
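
A sketch of this step:

```python
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_pred)
print(cm)
```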


3.3 Accuracy Score

In this step, we calculate the accuracy score on the basis of the actual and predicted test results.
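
A sketch, using scikit-learn's accuracy_score:

```python
from sklearn.metrics import accuracy_score

print(accuracy_score(y_test, y_pred))
```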


3.4 Visualize our Test Set Result

In this step, we visualize our test set result. To do this, we use the matplotlib library; we can see in the graph that only 9 points are mapped incorrectly and the remaining 91 are mapped correctly, according to the model's test set result.
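
A sketch of the visualization along the lines of the standard meshgrid template such plots are usually produced with; the colors and titles are my assumptions:

```python
from matplotlib.colors import ListedColormap

# Paint the decision regions on a dense grid, then overlay the test points.
X_set, y_set = X_test, y_test
X1, X2 = np.meshgrid(
    np.arange(X_set[:, 0].min() - 1, X_set[:, 0].max() + 1, 0.01),
    np.arange(X_set[:, 1].min() - 1, X_set[:, 1].max() + 1, 0.01),
)
plt.contourf(
    X1, X2,
    classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
    alpha=0.75, cmap=ListedColormap(('red', 'green')),
)
for i, label in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == label, 0], X_set[y_set == label, 1],
                color=('red', 'green')[i], label=label)
plt.title('Decision Tree (Test set)')
plt.xlabel('Age (scaled)')
plt.ylabel('Estimated Salary (scaled)')
plt.legend()
plt.show()
```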


If you want the dataset and code, you can also check my Github profile.

Dataset Description:

This dataset contains information about position salaries. It has three attributes, Position, Level, and Salary, and 10 instances. Since our target attribute takes continuous values, this is a regression problem, and we use the decision tree regression model to predict salaries on the basis of the predictor attributes, Position and Level.


Part 1: Data Preprocessing:

1.1 Import the Libraries

In this step, we import the same three libraries as in the classification example: numpy for multidimensional arrays, pandas to import the dataset, and matplotlib to plot the graph.


1.2 Import the dataset

In this step, we import the dataset; to do that, we use the pandas library. After importing the dataset, we define our predictor and target attributes. Our predictor attributes are Position and Level, as you can see in the sample dataset; we call them ‘X’ here. Salary is the target attribute, which we call ‘y’.
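
A sketch, assuming the commonly used Position_Salaries.csv layout; since Level is simply a numeric encoding of Position, it is the only predictor column the model actually needs:

```python
import pandas as pd

dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:2].values  # Level, kept two-dimensional for scikit-learn
y = dataset.iloc[:, 2].values    # Salary
```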


Note: This dataset is very small (only 10 instances), so we do not split it into training and test sets.

Part 2: Building the Decision tree Regression model:

In this part, we build our Decision Tree Regression model using the Scikit-Learn library.

2.1 Import the Libraries

In this step we start building our decision tree model; to do this, we first import the decision tree model from the Scikit-Learn library.
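
The import this step describes is presumably:

```python
from sklearn.tree import DecisionTreeRegressor
```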


2.2 Initialize our Decision Tree model

In this step, we initialize our regressor model.
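
A sketch of the initialization; random_state=0 is an illustrative addition:

```python
regressor = DecisionTreeRegressor(random_state=0)
```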


2.3 Fitting the Decision Tree Regressor Model

In this step, we fit the data to our model.
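
Continuing the script, we fit on the full dataset since there is no train/test split here:

```python
regressor.fit(X, y)
```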


Part 3: Making the Prediction and Visualizing the result:

In this part, we make a prediction and visualize the result using the matplotlib library.

3.1 Predict the Result

Our target takes real values, so for the predict method we enter the value 6.5 (a position level), and our model gives the prediction 150000.
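
A sketch of the prediction call; scikit-learn expects a two-dimensional array of samples, hence the double brackets:

```python
print(regressor.predict([[6.5]]))  # the prediction reported above is 150000
```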


3.2 Visualize Result

In the visualization step, we make a graph of position level against salary, which is our predicted output. We plot the actual points (in red) on the basis of level and salary, and the predicted salary as the blue curve.
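
A sketch of the plot, continuing the script; evaluating the model on a fine grid makes the stepwise shape of a decision tree's predictions visible:

```python
import numpy as np
import matplotlib.pyplot as plt

X_grid = np.arange(X.min(), X.max(), 0.01).reshape(-1, 1)  # dense level grid
plt.scatter(X, y, color='red')                             # actual data points
plt.plot(X_grid, regressor.predict(X_grid), color='blue')  # predicted curve
plt.title('Decision Tree Regression')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
```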


If you want the dataset and code, you can also check my Github profile.

End Notes:

If you liked this article, be sure to click ❤ below to recommend it and if you have any questions, leave a comment and I will do my best to answer.

To stay up to date with the world of machine learning, follow me. It's the best way to find out when I write more articles like this.

You can also follow me on Github for the code & dataset, follow me on Academia.edu for this article, reach out on Twitter, email me directly, or find me on LinkedIn. I'd love to hear from you.

That’s all folks, Have a nice day :)
