Decision Trees and Splitting Functions (Gini, Information Gain and Variance Reduction)
Prerequisites:
Use the data set from the blog post "Implementing Decision Trees in R - Classification Problem" and run its R script. In the script, find the line text(output.tree,pretty=0).
Add the following line right after it:
fancyRpartPlot(output.tree)
First install the following packages: RColorBrewer, rattle and rpart.plot (fancyRpartPlot() is provided by rattle).
This draws a more detailed plot of the decision tree, as shown below. We will use these plots to understand how the packages that build decision trees actually work.
Crux of a Decision Tree Algorithm:
· We begin with a root node.
· At each node, we ask a True/False question, usually based on a single feature of the data.
· Based on the answer, we partition the data into two subsets.
· The subsets give rise to two child nodes, and the same process is repeated at each child node.
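To make the split step concrete, here is a minimal R sketch. The helper split_node() is hypothetical (it is not part of rpart or any other package), and it uses R's built-in iris data rather than the data set from the earlier post. It asks one True/False question of the form "feature < threshold" and returns the two child subsets; it assumes the feature has no missing values.

```r
# Hypothetical helper: ask one True/False question about a single feature
# ("is feature < threshold?") and partition the rows into two subsets.
# Assumes the feature column has no missing values.
split_node <- function(data, feature, threshold) {
  answer <- data[[feature]] < threshold          # TRUE/FALSE for every row
  list(
    question = sprintf("%s < %s", feature, format(threshold)),
    left     = data[answer, , drop = FALSE],     # rows answering TRUE
    right    = data[!answer, , drop = FALSE]     # rows answering FALSE
  )
}

# Example on R's built-in iris data (not the data set from the earlier post)
node <- split_node(iris, "Petal.Length", 2.45)
cat(node$question, "\n")
cat("left child:", nrow(node$left), "rows; right child:", nrow(node$right), "rows\n")
print(table(node$left$Species))
```

On iris this question sends the 50 setosa rows to the left child and the remaining 100 rows to the right child, so the left child is already pure.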
Common questions that come to mind:
· How does the algorithm choose the first node (the root node)?
· Which questions should be asked to create a split?
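One common way to answer both questions is sketched below, under stated assumptions. For every numeric feature, each midpoint between consecutive distinct values is treated as a candidate question "x < t". Each question is scored by the weighted Gini impurity of the two children it produces, and the lowest-scoring question becomes the root. Gini is one of the splitting functions named in the title. The helper names (gini(), candidate_thresholds(), split_gini(), best_split()) are hypothetical, and this exhaustive search is an illustration, not rpart's actual implementation.

```r
# Gini impurity of a set of class labels: 1 - sum(p_k^2)
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Candidate questions for one numeric feature: "x < thr" for every midpoint
# between consecutive distinct sorted values (so both children are non-empty).
candidate_thresholds <- function(x) {
  v <- sort(unique(x))
  (head(v, -1) + tail(v, -1)) / 2
}

# Weighted Gini impurity of the two children produced by "x < thr".
split_gini <- function(x, y, thr) {
  left  <- y[x < thr]
  right <- y[x >= thr]
  n <- length(y)
  length(left) / n * gini(left) + length(right) / n * gini(right)
}

# Score every candidate question on every listed feature and keep the one
# with the lowest weighted impurity (ties keep the first one found).
best_split <- function(data, features, target) {
  best <- list(feature = NA_character_, threshold = NA_real_, impurity = Inf)
  for (f in features) {
    for (thr in candidate_thresholds(data[[f]])) {
      imp <- split_gini(data[[f]], data[[target]], thr)
      if (imp < best$impurity) {
        best <- list(feature = f, threshold = thr, impurity = imp)
      }
    }
  }
  best
}

features <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
root <- best_split(iris, features, "Species")
print(root)
```

On iris the root impurity is 2/3 (three equally sized classes), and the chosen question is Petal.Length < 2.45, which lowers the weighted impurity to 1/3. Petal.Width < 0.8 produces exactly the same partition and ties; the strict comparison keeps whichever is found first.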