Decision Trees and Splitting Functions (Gini, Information Gain and Variance Reduction)

Subha Ganapathi
Published in Nerd For Tech
4 min read · Mar 17, 2021

Pre-requisites:

Use the data set from the blog post "Implementing Decision Trees in R — Classification Problem" and run the R script given there. In the script, look for the line text(output.tree,pretty=0).

Add the following line of code right after it:

fancyRpartPlot(output.tree)

For this to work, install the following packages first: RColorBrewer, rattle and rpart.plot.

This draws a more detailed Decision Tree plot, as shown below. We will use this plot to better understand how the packages used for building Decision Trees work.

Fig. A Decision Tree plot

Crux of a Decision Tree Algorithm:

· We begin with a root node.

· At each node, we ask a True/False question, which is generally based upon one of the features of the data.

· In response to the True/False question, we partition the data into two subsets.

· The subsets give rise to two child nodes.
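The steps above can be sketched in a few lines of R. This is only an illustration of a single True/False question at one node, not how any particular package implements it. The toy data frame, the helper name split_node and its parameters feature and threshold are all made up for this example.

```r
# Toy data; the column names and values are made up for illustration.
toy <- data.frame(
  age    = c(22, 35, 47, 51, 29, 63),
  bought = c("no", "no", "yes", "yes", "no", "yes")
)

# Ask one True/False question about a single feature ("is feature < threshold?")
# and partition the rows into two subsets, one per answer.
# split_node, feature and threshold are hypothetical names for this sketch.
split_node <- function(data, feature, threshold) {
  answer <- data[[feature]] < threshold          # TRUE or FALSE for every row
  list(
    true_child  = data[answer, , drop = FALSE],  # rows that answered TRUE
    false_child = data[!answer, , drop = FALSE]  # rows that answered FALSE
  )
}

# The root node holds all rows; the question "age < 40?" gives two child nodes.
children <- split_node(toy, "age", 40)
nrow(children$true_child)
nrow(children$false_child)
```

Each child can then be split again in the same way, which is how the tree grows.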

Common questions that come to mind:

· How does the algorithm decide on the first node (Root Node)?

· What questions are to be asked to create a split?
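One way to approach both questions, assuming a classification tree that scores splits with Gini impurity (one of the splitting functions named in the title), is to try every candidate question and keep the one whose child nodes are the least mixed. The sketch below is a simplified illustration on made-up data, not the internal code of any R package. The toy data frame and the helpers gini and split_gini are hypothetical names for this example.

```r
# Toy data; the column names and values are made up for illustration.
toy <- data.frame(
  age    = c(22, 35, 47, 51, 29, 63),
  income = c(30, 80, 45, 90, 60, 40),
  bought = c("no", "no", "yes", "yes", "no", "yes")
)

# Gini impurity of a set of class labels: 1 - sum of squared class proportions.
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Weighted Gini impurity of the two child nodes created by "feature < threshold?".
split_gini <- function(data, feature, threshold, target) {
  answer <- data[[feature]] < threshold
  left   <- data[[target]][answer]
  right  <- data[[target]][!answer]
  n <- nrow(data)
  (length(left) / n) * gini(left) + (length(right) / n) * gini(right)
}

# Candidate questions: for each feature, thresholds halfway between
# consecutive sorted values, so neither child node is ever empty.
candidates <- do.call(rbind, lapply(c("age", "income"), function(f) {
  v <- sort(unique(toy[[f]]))
  thresholds <- (head(v, -1) + tail(v, -1)) / 2
  data.frame(
    feature   = f,
    threshold = thresholds,
    impurity  = sapply(thresholds, function(t) split_gini(toy, f, t, "bought"))
  )
}))

# The question with the lowest weighted impurity becomes the root node.
best <- candidates[which.min(candidates$impurity), ]
best
```

On this toy data the winning question is "age < 41?", which separates the two classes completely and so has a weighted impurity of 0.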

Subha Ganapathi
Data Engineer, Visualization & Analytics consultant.