Decision Tree: Construction in ML

Apr 12 · 8 min read

In daily life, we make an enormous number of decisions from the start of the day to its end, and we often use decision trees without realizing it. Most of us have studied decision trees in school or college, so the basic shape is familiar: a tree-like structure that starts at a root node and ends in leaf nodes. Many other ML algorithms are built on decision trees.

Before getting into this, if you want to know about the Data Science methodology of problem-solving, visit Introduction to Data Science. Then dive into the topic.

What makes a decision tree such an important and useful algorithm? The answer lies in the way it makes decisions. Let us study it in more detail, so that our model makes the right prediction on regression or classification data.

A decision tree is one of the most popular tools for classification and prediction. It is a flowchart-like tree structure in which each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (terminal node) holds a class label.

Decision tree and its terms

Decision trees are also used in many companies before launching a product, while maintaining its standards, and when updating or discontinuing it. Much like break-even analysis, they help save time and money and avoid major losses. If you want to know how companies use them, read the article on the Harvard Business Review page.

How does a tree make accurate predictions? How is a split made, and which feature goes at the top? What happens internally? It works much as a human does. People prioritize so that their decision-making becomes clear. When you have to choose between attending a meeting, attending a function, or staying where you are, which do you choose? You weigh what matters most, given your interests and the importance of the situation. That’s it. A decision tree does the same thing, so let's go through some terminology and build an understanding in ML terms.

The first node (the root node) splits the data on one feature. Depending on whether the split condition is satisfied, the data may be split further on another feature at a decision node, or the tree makes its final decision. The same happens at every decision node. The end nodes are called terminal nodes or leaf nodes, and each one hangs off a parent decision node.
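To make the terminology concrete, here is a minimal sketch of a hand-built tree. The class names (`Leaf`, `DecisionNode`), the features (`importance`, `hours_free`), and the thresholds are all hypothetical and chosen only for illustration; a real tree would learn them from data.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    """Terminal (leaf) node: holds the final decision."""
    label: str


@dataclass
class DecisionNode:
    """Root or internal node: tests one feature against a threshold."""
    feature: str
    threshold: float
    left: Union["DecisionNode", Leaf]   # branch taken when value <= threshold
    right: Union["DecisionNode", Leaf]  # branch taken when value > threshold


def predict(node, sample):
    """Walk from the root to a leaf, following one branch per test."""
    while isinstance(node, DecisionNode):
        if sample[node.feature] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.label


# Hypothetical tree: the root tests "importance"; a decision node tests "hours_free".
tree = DecisionNode(
    feature="importance", threshold=5,
    left=Leaf("stay"),
    right=DecisionNode(
        feature="hours_free", threshold=2,
        left=Leaf("attend meeting"),
        right=Leaf("attend function"),
    ),
)

print(predict(tree, {"importance": 8, "hours_free": 1}))
```

Each call to `predict` follows exactly one path from the root to a leaf, which is why a single prediction is easy to trace by hand.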

A few important metrics determine how a node is split. They are:

  1. Entropy: It measures the impurity, or randomness, in the dataset. Suppose you pick a ball from a bunch of balls of the same color (let's say green), or one pearl from a basket of pearls. You get the same thing no matter how many times you pick, so the entropy is zero. When the balls have different colors, the probability of drawing any particular color drops below 1. If there are 20% yellow, 10% white, 30% blue, and 40% green balls, the probability of drawing a green ball is 0.4. As the number of choices grows and the colors become more evenly mixed, the entropy gets higher, which means the outcome of a draw becomes harder to predict. Mathematically, Shannon’s entropy of a set S with c classes, where p_i is the proportion of class i, is calculated using the formula below:
$H(S) = -\sum_{i=1}^{c} p_i \log_2 p_i$
Entropy-The impurity measure
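As a quick check of the ball example, the sketch below computes Shannon entropy, the sum over classes of -p log2(p), for a list of labels. The function name `entropy` and the two toy lists are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


# All balls share one color: no uncertainty at all.
all_green = ["green"] * 10

# 20% yellow, 10% white, 30% blue, 40% green, as in the example above.
mixed = ["yellow"] * 2 + ["white"] * 1 + ["blue"] * 3 + ["green"] * 4

print(entropy(all_green))        # zero entropy for a pure set
print(round(entropy(mixed), 3))  # roughly 1.846 bits
```

The pure set scores zero, and the mixed set scores well above it, which matches the intuition that more variety means more uncertainty.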

After calculating the entropy, information gain is calculated for each feature to find out how much that feature tells us about the target variable. The feature with the highest information gain becomes the root node, and the resulting subsets are split further in the same way so that prediction accuracy increases. Information gain is the difference between the entropy before the split and the weighted average entropy after splitting the dataset on the given attribute, where each subset is weighted by the fraction of samples it contains: $Gain(A) = Info(D) - Info_A(D)$, with $Info_A(D) = \sum_j \frac{|D_j|}{|D|} Info(D_j)$.

Information gain with example
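The calculation can be sketched in a few lines: entropy before the split minus the size-weighted entropy of the groups after it. The toy columns `outlook`, `windy`, and `play` are hypothetical data invented for this example.

```python
import math
from collections import Counter


def entropy(labels):
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy before the split minus the weighted average entropy after it."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    after = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - after


# Hypothetical toy data: six days, two candidate features, one target.
outlook = ["sunny", "sunny", "rainy", "rainy", "sunny", "rainy"]
windy = ["yes", "no", "yes", "no", "yes", "no"]
play = ["no", "no", "yes", "yes", "no", "yes"]

print("outlook:", round(information_gain(outlook, play), 3))
print("windy:  ", round(information_gain(windy, play), 3))
```

Here `outlook` separates the target perfectly while `windy` barely helps, so `outlook` would be chosen for the root.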

Information gain is biased toward attributes with many outcomes. It means it prefers the attribute with a large number of distinct values. For instance, a unique identifier such as customer_ID puts every record in its own pure partition, so the entropy after the split, Info_A(D), is zero. This maximizes the information gain and creates a useless partitioning.

C4.5, an improvement of ID3, uses an extension to information gain known as the gain ratio. Gain ratio handles the issue of bias by normalizing the information gain using Split Info.

Gain ratio calculation
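The sketch below illustrates the normalization, taking gain ratio as information gain divided by split info, where split info is the entropy of the attribute's own value distribution. The `customer_id`, `outlook`, and `play` columns are hypothetical toy data.

```python
import math
from collections import Counter


def entropy(values):
    total = len(values)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(values).values())


def information_gain(feature_values, labels):
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    after = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - after


def gain_ratio(feature_values, labels):
    """Information gain normalized by the split info of the attribute."""
    split_info = entropy(feature_values)  # entropy of the attribute's own values
    if split_info == 0:                   # attribute has a single value: no split
        return 0.0
    return information_gain(feature_values, labels) / split_info


customer_id = [1, 2, 3, 4, 5, 6]  # unique per row, useless for prediction
outlook = ["sunny", "sunny", "rainy", "rainy", "sunny", "rainy"]
play = ["no", "no", "yes", "yes", "no", "yes"]

for name, column in (("customer_id", customer_id), ("outlook", outlook)):
    print(name, round(information_gain(column, play), 3), round(gain_ratio(column, play), 3))
```

Both attributes reach the same information gain on this data, but the identifier is penalized by its large split info, so the gain ratio prefers `outlook`.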

2. Gini: Gini impurity measures how often a randomly chosen element from the set would be incorrectly labeled if it were labeled randomly according to the distribution of labels in the subset. For class proportions p_i it equals $1 - \sum_i p_i^2$. Like entropy, the Gini index is a criterion for scoring candidate splits. It measures the impurity of a node, and in CART it is used to evaluate binary splits.

Example for Gini
Gini impurity
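A minimal sketch of Gini impurity, computed as one minus the sum of squared class proportions. The function name `gini` and the toy lists are assumptions for illustration; the mixed list reuses the ball proportions from the entropy example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


all_green = ["green"] * 10
mixed = ["yellow"] * 2 + ["white"] * 1 + ["blue"] * 3 + ["green"] * 4

print(gini(all_green))        # a pure node has zero impurity
print(round(gini(mixed), 2))  # about 0.7 for the four-color mix
```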

In the case of a discrete-valued attribute, the subset of values that gives the minimum Gini index for that attribute is selected as the splitting subset. In the case of continuous-valued attributes, the strategy is to treat the midpoint between each pair of adjacent sorted values as a possible split-point and to choose the point with the smallest Gini index as the splitting point.
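For a continuous attribute, the search can be sketched as follows. It assumes the common convention of testing the midpoint between adjacent sorted values and scoring each candidate by the size-weighted Gini impurity of the two sides. The `ages` and `buys` columns are hypothetical.

```python
from collections import Counter


def gini(labels):
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted Gini) of the best binary split."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_threshold, best_score = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical values cannot be separated by a threshold
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


ages = [22, 25, 30, 35, 40, 50]
buys = ["no", "no", "no", "yes", "yes", "yes"]

print(best_split(ages, buys))
```

On this toy data the threshold halfway between 30 and 35 separates the two classes completely, so its weighted Gini is zero.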

The name is shared with the Gini coefficient from economics, which is a different measure: it is equal to half of the relative mean difference and is often used to measure income inequality. In that context, the Gini index is the Gini coefficient expressed as a percentage, that is, multiplied by 100. It should not be confused with the Gini impurity used to split tree nodes.

Range of gini and entropy

Note that the plot includes a scaled version of the entropy (entropy/2) to emphasize that the Gini index is an intermediate measure between entropy and the classification error.

  • The Gini criterion is slightly cheaper to compute because it avoids the logarithm that entropy requires.
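The relationship between the three impurity measures can be checked numerically for a two-class node, where p is the proportion of one class. This is a small sketch; the function names are assumptions for illustration.

```python
import math


def binary_entropy(p):
    """Entropy (in bits) of a two-class node with class proportion p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def binary_gini(p):
    return 1.0 - p ** 2 - (1 - p) ** 2


def classification_error(p):
    return 1.0 - max(p, 1 - p)


print("p    entropy/2  gini   error")
for p in (0.0, 0.1, 0.3, 0.5):
    print(
        p,
        round(binary_entropy(p) / 2, 3),
        round(binary_gini(p), 3),
        round(classification_error(p), 3),
    )
```

For a two-class node, entropy ranges from 0 to 1 and Gini from 0 to 0.5. For mixed nodes short of a 50/50 split, the Gini value sits between the classification error and the halved entropy.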

Let’s look at some of the decision tree algorithms.

1. Iterative Dichotomiser 3 (ID3): This algorithm selects the splitting attribute by calculating information gain. Information gain is recalculated recursively at each level of the tree.

2. C4.5: This algorithm is the modification of the ID3 algorithm. It uses information gain or gain ratio for selecting the best attribute. It can handle both continuous and missing attribute values.

3. CART (Classification and Regression Tree): This algorithm can produce classification as well as regression trees. In a classification tree, the target variable is categorical, and the tree predicts a class. In a regression tree, the target variable is continuous, and the tree predicts its value.
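To show how the recursion in ID3 plays out, here is a simplified sketch for categorical features only. It is not a full implementation (no pruning, no handling of unseen values), and the toy rows and the nested-dict tree format are assumptions made for this example.

```python
import math
from collections import Counter


def entropy(labels):
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    after = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - after


def id3(rows, labels, features):
    """Return a nested dict {feature: {value: subtree}} or a class label."""
    if len(set(labels)) == 1:  # pure node: stop with a leaf
        return labels[0]
    if not features:           # nothing left to split on: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    remaining = [f for f in features if f != best]
    tree = {best: {}}
    for value in set(row[best] for row in rows):
        subset = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows = [r for r, _ in subset]
        sub_labels = [l for _, l in subset]
        tree[best][value] = id3(sub_rows, sub_labels, remaining)
    return tree


# Hypothetical toy data.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "play", "play", "stay", "stay"]

tree = id3(rows, labels, ["outlook", "windy"])
print(tree)
```

On this data `outlook` has the higher information gain and becomes the root; only the rainy branch needs a second split on `windy`.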

  • In Python, decision trees are available through the sklearn library, which uses an optimized version of CART. Its classifier lets you choose the criterion that measures the quality of a split, including Gini and entropy. To use other decision tree algorithms, you have to go to other libraries, such as chefboost. You can refer to this GitHub for more details on implementing them.
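A minimal sketch of the sklearn usage, assuming scikit-learn is installed. The Iris dataset, the 70/30 split, and the `random_state` values are choices made for this example, not something prescribed by the library.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Small built-in dataset, used here only for illustration.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

scores = {}
for criterion in ("gini", "entropy"):
    model = DecisionTreeClassifier(criterion=criterion, random_state=42)
    model.fit(X_train, y_train)
    scores[criterion] = model.score(X_test, y_test)  # accuracy on held-out data
    print(criterion, round(scores[criterion], 3))
```

On a dataset this small the two criteria usually give very similar trees, so the choice between them tends to matter less than the other hyperparameters.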

There are pros and cons for this algorithm too:

Pros:

  1. It forces the algorithm to take into consideration all the possible outcomes of a decision and traces each path to a conclusion.

Cons:

  1. Decision trees are very sensitive to their hyperparameters. Unfortunately, the output of a decision tree can vary drastically if the hyperparameters are poorly tuned.
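The sensitivity to hyperparameters can be seen by varying a single setting. The sketch below, which assumes scikit-learn is installed, changes only `max_depth` on the Iris dataset; the dataset, split, and depth values are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

results = {}
for depth in (1, 3, None):  # None lets the tree grow until its leaves are pure
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    results[depth] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print("max_depth =", depth, "train/test accuracy:", results[depth])
```

A depth-1 tree has only two leaves, so it cannot tell three classes apart and underfits. An unrestricted tree fits the training data most closely, which is where overfitting tends to appear on noisier datasets.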

To learn more about the parameters and practical use, visit the official sklearn site.

Yes, there is a lot to the decision tree, yet it is called a white-box model because of its simplicity: its calculations are easy to follow and its decisions are easy to understand, unlike other, more complex models. Neural networks are called black-box models because of their complexity and the sheer volume of calculations, which cannot be easily traced and interpreted.

Success is not final, Failure is not fatal, It is the courage to continue that counts.

Make your life a masterpiece, imagine no limitations on what you can be, have or do

If you have learned something about decision trees and decision-making, do like this post, show some support, and share it with someone who may find it useful. If you have any questions or suggestions, put them in the comment box and I will do my best to answer. Stay safe and bring light to the world.🤍

Geek Culture
