Decision Tree — What, Why, Where?
While learning Machine Learning concepts and algorithms, I found it really difficult to get all the information in one place. The decision tree is one of the most important and simplest concepts in Machine Learning, yet it can be difficult to understand at first. But once you master it, it will work wonders for you. So, I collected notes on the what, why, and where of the algorithm and thought I would share them with you all.
Disclaimer: This article will clear up your theoretical concepts about decision trees. To build a strong foundation in decision trees, at the end of this article I will mention a few worthwhile projects in DTs, so keep going!
Take a pen and a notepad with you and go slow!! Happy learning!!
What is a Decision Tree?
It is a supervised learning method that classifies or labels objects by asking a series of questions. Some common terms used in this algorithm are:
- Root node : The starting node of the tree; it represents the entire population or sample to be analyzed.
- Decision node : An internal node with exactly one incoming edge and two or more outgoing edges. It is called so because it represents a decision to be made, the outcome of which branches off into further nodes.
- Terminal/Leaf node : A node that holds a class label, i.e., the decision reached after evaluating all relevant attributes.
- Parent & child node : Relative terms common to every tree method. The successor of a node is called its child node, whereas the predecessor of a child node is called its parent node.
- Branch/Sub-tree : The part of the tree that holds the outcome of a decision taken at a decision node.
- Pruning : Removing sub-nodes or non-critical sections of the decision tree in order to reduce its size.
- Splitting : Again a common term in the context of trees, also known as node splitting; a node is split into multiple sub-nodes, repeatedly, until the leaf nodes are reached.
In this algorithm, we perform recursive binary splitting, which makes it extremely efficient. In simple words, at each question the answer chosen (that is, a particular branch) cuts the number of remaining options roughly in half, so the options narrow down very quickly even among a very large number of classes. This property of tree-based methods supports predictive models with good accuracy, stability, and ease of interpretation.
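The "cut options in half" intuition can be sketched with a little arithmetic: if each binary question halves the candidate set, distinguishing among n classes needs only about log2(n) questions. The function below is an illustrative helper, not part of any library.

```python
import math

def questions_needed(n_classes: int) -> int:
    """Roughly how many yes/no questions are needed to single out
    one class out of n_classes, assuming each question halves the
    remaining candidates (an idealized binary split)."""
    return math.ceil(math.log2(n_classes))

print(questions_needed(2))       # 1
print(questions_needed(1000))    # 10
print(questions_needed(10**6))   # 20
```

Even a million classes collapse to a single answer in about twenty questions, which is why tree traversal stays fast as the number of classes grows.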
Why do we need Decision Tree?
There are two main reasons why we need a decision tree in machine learning:
- to predict a continuous value for a given item;
- to classify the item by assigning it to its most likely class (for example, identifying the species of a given bird or animal).
The above reasons divide decision trees into 2 types:
- DecisionTreeClassifier() — based on the 2nd reason (to classify); used when the response variable takes 2 or more categorical values.
- DecisionTreeRegressor() — based on the 1st reason (to predict); used when the response variable is continuous.
(We will be learning in-depth about Decision Tree types in Part2 of Decision Trees.)
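A minimal sketch of both tree types, using scikit-learn (the library these class names come from). The tiny dataset here is made up purely for illustration: the class and target follow the first feature.

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Toy data: four items with two binary features each.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]

# Classification: categorical response variable.
y_class = ["cat", "cat", "dog", "dog"]
clf = DecisionTreeClassifier(max_depth=2).fit(X, y_class)
print(clf.predict([[1, 1]]))   # the most likely class label

# Regression: continuous response variable.
y_cont = [1.0, 1.2, 3.4, 3.5]
reg = DecisionTreeRegressor(max_depth=2).fit(X, y_cont)
print(reg.predict([[1, 1]]))   # a predicted continuous value
```

Note that the API is identical for both: fit on features and a response, then predict. Only the type of the response variable changes.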
Where to apply Decision Tree?
There are numerous scenarios where you can apply the decision tree algorithm. A few of them are listed below:
- Categorization problems where attributes or features are systematically checked to determine a final category.
- When the user has an objective he/she is trying to achieve (e.g., maximize profit, optimize cost, etc.).
- A situation where there is uncertainty about which outcome will actually happen.
- Evaluation of brand expansion opportunities for a business using historical sales data.
- In operations research, specifically in decision analysis, to help determine the most promising and feasible strategy.
(e.g., the Akinator game on Android — it guesses a real or fictional character through a series of questions; your answer to each question selects the decision path leading to an appropriate class label.)
- Other application domains include manufacturing, production, biotech, astronomy, pharmacology, and many more.
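The Akinator example above is, at heart, just a walk down a decision tree. Here is a toy sketch of that idea in plain Python; the questions, answers, and characters are all made up for illustration.

```python
# A hand-built question tree: internal nodes are dicts holding a
# question plus "yes"/"no" branches; leaves are class labels.
tree = {
    "question": "Is the character fictional?",
    "yes": {
        "question": "Can the character fly?",
        "yes": "Superman",
        "no": "Sherlock Holmes",
    },
    "no": "Albert Einstein",
}

def classify(node, answers):
    """Walk the tree using a dict mapping each question to 'yes'/'no',
    until a leaf (a class label) is reached."""
    while isinstance(node, dict):
        node = node[answers[node["question"]]]
    return node

print(classify(tree, {
    "Is the character fictional?": "yes",
    "Can the character fly?": "no",
}))  # Sherlock Holmes
```

Each answer discards one whole branch of the tree, which is exactly the "narrowing down by half" behavior described earlier.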
Bonus: As mentioned at the beginning of the article, I am listing below a few worthwhile hands-on projects that one should try in order to gain expertise in decision tree implementation.
- A simple yet interesting Guessing game like Akinator. Read more about it here.
- Clinical Decision Analysis (it allows decision-makers to apply evidence-based medicine to make objective clinical decisions when faced with complex situations). Read more about it here.
Of course, this is not everything about decision trees; this article only gives you a strong basic understanding of DTs. If you loved this blog, do give a CLAP, and tell me in the comments if you want Part 2 of Decision Trees, where I will cover the most common questions asked in the data science community about DTs, along with other concepts such as ensemble methods, the various splitting algorithms used in DTs, and tree pruning.
Resources: Google, Wikipedia, ScienceDirect
Thank you for reading until the end!