A Simple Guide to Decision Tree

Jocelyn D'Souza
3 min read · Mar 8, 2018


Decision trees are one of the most intuitive families of algorithms, and they are easy to understand.

Basic Decision Tree

What is a Decision Tree?

As the name suggests, a decision tree is a tree that helps us make decisions. Decision trees can handle non-linearity in data: they can separate data in complex ways while using only computationally simple steps.

Is the above data linearly separable?

No. This is where Decision Trees are very useful.
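To make this concrete, here is a minimal sketch. It assumes an XOR-style toy dataset (four made-up points, not the data from the figure) and a hand-written two-level tree. The names `tree_predict` and `best_single_threshold_accuracy` are only illustrative. No single threshold on one feature gets more than half of these points right, while two nested tests classify all of them.

```python
# Four XOR-style points: the label is 1 when exactly one coordinate is 1.
# This toy data is an assumption made for illustration.
points = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def tree_predict(x):
    """Hand-written tree: the root tests x[0], each child tests x[1]."""
    if x[0] <= 0.5:
        return 1 if x[1] > 0.5 else 0
    return 0 if x[1] > 0.5 else 1


def best_single_threshold_accuracy(data):
    """Best accuracy that one threshold on one feature can reach."""
    best = 0.0
    for feature in (0, 1):
        for threshold in (-0.5, 0.5, 1.5):
            for label_above in (0, 1):
                correct = sum(
                    (label_above if x[feature] > threshold else 1 - label_above) == y
                    for x, y in data
                )
                best = max(best, correct / len(data))
    return best


tree_accuracy = sum(tree_predict(x) == y for x, y in points) / len(points)
print("two-level tree accuracy:", tree_accuracy)
print("best single-threshold accuracy:", best_single_threshold_accuracy(points))
```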

Types of Decision Trees

There are two types of decision trees, based on the type of target (output) variable:

  • Categorical Variable Decision Tree: A decision tree with a categorical target variable, such as YES/NO, 0/1, or True/False, is called a Categorical Variable Decision Tree. (CLASSIFICATION)
  • Continuous Variable Decision Tree: A decision tree with a continuous target variable (a range of continuous numbers, e.g. temperature) is called a Continuous Variable Decision Tree. (REGRESSION)
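One place where the two types differ is in what a leaf predicts. The sketch below assumes the common convention of a majority vote for classification and the mean for regression. The post itself does not specify this, and the function names are hypothetical.

```python
from collections import Counter
from statistics import mean


def classification_leaf(labels):
    """Predict the most common class among the samples in a leaf."""
    return Counter(labels).most_common(1)[0][0]


def regression_leaf(values):
    """Predict the mean target value among the samples in a leaf."""
    return mean(values)


# Categorical target: the leaf answers with a class.
print(classification_leaf(["YES", "NO", "YES"]))

# Continuous target (e.g. temperature): the leaf answers with a number.
print(regression_leaf([20.0, 22.0, 24.0]))
```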

Decision Tree Terminology

We need to understand a few basic terms used with decision trees:

  • Root Node: It represents the entire population or sample, which is then divided into two or more homogeneous sets.
  • Splitting: The process of dividing a node into two or more sub-nodes.
  • Parent Node: A node that is divided into sub-nodes is called a parent node.
  • Child Node: The sub-nodes created from a parent node are called child nodes.
  • Decision Node: When a sub-node splits into further sub-nodes, it is called a decision node.
  • Leaf/Terminal Node: A node that does not split is called a leaf or terminal node.
  • Branch/Sub-Tree: A sub-section of the entire tree.
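These terms map naturally onto a small data structure. The following is only a sketch: the `Node` class and the helper names are made up for illustration and do not come from any library.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self):
        """A leaf (terminal) node has no children, so it does not split."""
        return not self.children


def count_leaves(node):
    """Count the leaf nodes in the sub-tree rooted at `node`."""
    if node.is_leaf():
        return 1
    return sum(count_leaves(child) for child in node.children)


def splitting_nodes(node):
    """Names of all nodes that split: the root plus every decision node."""
    if node.is_leaf():
        return []
    names = [node.name]
    for child in node.children:
        names.extend(splitting_nodes(child))
    return names


# The root is the parent of "decision A" and "leaf 3".
# "decision A" is a child of the root and a parent of "leaf 1" and "leaf 2".
root = Node("root", [
    Node("decision A", [Node("leaf 1"), Node("leaf 2")]),
    Node("leaf 3"),
])

branch = root.children[0]  # a branch / sub-tree of the whole tree
print(count_leaves(root), count_leaves(branch), splitting_nodes(root))
```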

Now that we have learned the terminology, let’s dive a little deeper and understand how exactly a decision tree works.

Splits in a Decision Tree

  • A decision tree breaks a dataset down into smaller and smaller subsets, while the associated tree is developed incrementally.
  • The final result is a tree with decision nodes and leaf nodes.
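The two points above can be sketched as a short recursive procedure. This is a simplified illustration rather than a full implementation. It assumes a single numeric feature and binary splits, uses the Gini index to pick each threshold, and stops when a subset is pure or cannot be split. The dataset and all function names are made up.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(rows):
    """Return (threshold, weighted_gini) of the best split, or None."""
    values = sorted({x for x, _ in rows})
    best = None
    # Candidate thresholds are midpoints between neighbouring feature values.
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in rows if x <= threshold]
        right = [y for x, y in rows if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


def build(rows):
    """Recursively split `rows` into smaller subsets until they are pure."""
    labels = [y for _, y in rows]
    split = best_threshold(rows)
    if len(set(labels)) == 1 or split is None:
        # Leaf node: predict the majority class of this subset.
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    threshold = split[0]
    # Decision node: keep splitting each smaller subset.
    return {
        "threshold": threshold,
        "left": build([r for r in rows if r[0] <= threshold]),
        "right": build([r for r in rows if r[0] > threshold]),
    }


def predict(tree, x):
    """Walk from the root down to a leaf and return its class."""
    while "leaf" not in tree:
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree["leaf"]


rows = [(1, "NO"), (2, "NO"), (3, "YES"), (4, "YES"), (5, "NO"), (6, "NO")]
tree = build(rows)
print(tree)
```

Each call to `build` works on a smaller subset than its parent, and the nested dictionaries it returns are the decision nodes and leaf nodes of the finished tree.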

Wait! How does a tree decide where to split?

Decision trees can use several different criteria to decide how to split a node into two or more sub-nodes.

Splits are commonly based on:

  1. Gini index
  2. Entropy
  3. Chi-Squared
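As a rough preview, the three measures can be computed for one candidate split as follows. This is a sketch under assumptions: the labels are made up, the function names are hypothetical, and `chi_squared` is the standard Pearson statistic, which may differ in detail from the variant a particular tree algorithm uses.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini index (impurity): 0.0 for a pure node, higher when mixed."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Entropy in bits: 0 for a pure node, 1.0 for a 50/50 binary node."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def chi_squared(parent, children):
    """Pearson chi-squared of the child class counts against the counts
    expected if every child had the parent's class proportions."""
    n = len(parent)
    parent_counts = Counter(parent)
    total = 0.0
    for child in children:
        child_counts = Counter(child)
        for cls, parent_count in parent_counts.items():
            expected = len(child) * parent_count / n
            total += (child_counts[cls] - expected) ** 2 / expected
    return total


# A parent node with 5 YES and 5 NO, split into two children.
parent = ["YES"] * 5 + ["NO"] * 5
left = ["YES"] * 4 + ["NO"]
right = ["YES"] + ["NO"] * 4

print("gini:", gini(parent), "->", gini(left), gini(right))
print("entropy:", entropy(parent), "->", entropy(left), entropy(right))
print("chi-squared:", chi_squared(parent, [left, right]))
```

For Gini and entropy, a good split gives children with lower values than the parent. For chi-squared, a larger value means the children differ more from the parent.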

We will cover these in a future post, where I will go through the different splits in Python to give you a better understanding of how exactly the algorithm splits a node.

I hope you now have an intuition for how a decision tree algorithm works.

Thanks for Reading! :) ❤

Reference

  1. Analytics Vidhya Blog
