An essential guide to classification and regression trees in R

Classification and regression trees trace their roots to CHAID (Chi-Square Automatic Interaction Detector), proposed by Kass in 1980. To gain deeper insight into classification and regression trees, it helps to explore CHAID a little further. According to Ritschard, CHAID was itself derived from THAID (Theta AID), proposed by Morgan and Messenger in 1973. Classification and regression trees, also called recursive partitioning trees, have been used extensively in predictive analytics. Although both kinds of tree can be leveraged for many purposes, the inventors of THAID and CHAID specifically wanted to capture non-linear effects on the response variable and interactions between the predictor variables. CHAID builds non-binary trees, meaning a node can split into more than two branches. CHAID algorithms have been heavily used in SPSS since as early as 2001. Belson in 1959 proposed a method for predicting the outcomes of a second group from observations on a first group. Morgan in 1963 proposed AID (Automatic Interaction Detector), a binary regression tree for quantitative outcomes. The roots of decision trees therefore lie in statistics; the method did not spread widely into computer science until the AID program was released in 1971.

There are two fundamental differences between classification and regression trees. A classification tree splits the response variable into classes, most simply Yes or No, which can also be coded numerically as 1 or 0. When the target category can take multiple values, the C4.5 algorithm is leveraged; for simple binary splits, the CART algorithm is used. This is why a classification tree is applied when a categorical outcome is needed. Regression trees are leveraged when the response variable is continuous or numeric rather than categorical, for example prices, quantities, or other data involving amounts.
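This distinction shows up directly in R. As a minimal sketch, the rpart package (one CART implementation in R, assumed installed here) fits either kind of tree with the same function, switching on the response type via the `method` argument; the built-in iris and mtcars datasets stand in for real data:

```r
# Minimal sketch with the rpart package (a CART implementation in R).
library(rpart)

# Classification tree: categorical response (iris species).
class_tree <- rpart(Species ~ ., data = iris, method = "class")

# Regression tree: continuous response (miles per gallon).
reg_tree <- rpart(mpg ~ ., data = mtcars, method = "anova")

print(class_tree)
print(reg_tree)
```

If `method` is omitted, rpart infers it from the response: a factor yields a classification tree and a numeric vector a regression tree.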

Regression and classification trees are machine-learning methods for building prediction models from datasets. The data is split into multiple blocks recursively, and a prediction model is fit to each such partition; the resulting partitions can be represented as a graphical decision tree. The primary difference between classification and regression decision trees is that classification trees are built for dependent variables taking unordered (categorical) values, while regression trees handle ordered, continuous values. In a classification decision tree, for a training dataset td with m observations, a class variable Cl, and predictor variables Z1, …, Zn, the objective is to build a model that predicts the value of Cl for new Z values; the Z space is partitioned into multiple blocks. The initial algorithm built in the early stages of classification decision trees was THAID. Classification decision tree algorithms offer several features, such as pruning, unbiased splits, branch/split types, user-specified priors and costs, variable ranking, missing-value handling, and bagging and ensembles.
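The objective just described, predicting Cl for new Z values from a recursively partitioned training set, can be sketched in R. The rpart package is my assumed implementation here (the text itself names no package), with Species as Cl and the four iris measurements as Z:

```r
library(rpart)

# Train a classification tree: Cl = Species, predictors Z = the four
# iris measurements.
fit <- rpart(Species ~ ., data = iris, method = "class")

# Predict the class for "new" Z values (three rows reused here purely
# for illustration).
new_z <- iris[c(1, 51, 101), 1:4]
pred <- predict(fit, new_z, type = "class")
print(pred)

# printcp(fit) shows the complexity-parameter table used for pruning,
# one of the features listed above.
printcp(fit)
```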

Historically, the regression-tree algorithm AID was invented well before THAID, the classification-tree algorithm. In a regression decision tree, the variable Cl takes ordered rather than unordered values. Regression trees also share most of the features of classification trees, and they offer three primary advantages: a) unbiased splits; b) a single regression model fit at each node; c) because regression-tree algorithms work from residuals, there are few limitations on the node models, including general least squares.
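Advantage (b), a single model fit per node, can be checked directly: in a CART-style regression tree each leaf predicts one constant, the mean response of the training rows that land in it. A sketch using rpart (again my assumed implementation):

```r
library(rpart)

fit <- rpart(mpg ~ wt + hp, data = mtcars, method = "anova")

# fit$where gives the leaf each training observation falls into.
leaf <- fit$where
leaf_means <- tapply(mtcars$mpg, leaf, mean)

# The tree's predictions are exactly those per-leaf means.
all.equal(unname(leaf_means[as.character(leaf)]),
          unname(predict(fit)))
```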

The decision to carry an umbrella involves several factors: whether it is a sunny or a rainy day, the wish to keep a suit or dress from getting wet, and the inconvenience of carrying the umbrella if it does not rain. I illustrated the decision tree diagram in PowerPoint for the decision to take an umbrella or not. The decision tree flow shows that, if an individual decides not to carry an umbrella and it turns out to be a sunny day, the best payoff value is 1.00. The advantage of taking the umbrella is staying dry in the rain; the disadvantage is the inconvenience of carrying it without knowing whether it will rain. If it rains and the dress gets wet, the payoff can be equated to 0.00. The probability of sunshine is denoted pyoff.

The payoffs for taking or not taking the umbrella can be represented as

a) Take an umbrella — 0.8pyoff + 0.8(1-pyoff) = 0.8

b) Not to take an umbrella — 1.0pyoff + 0(1-pyoff) = pyoff.

In conclusion, the umbrella should be taken whenever pyoff < 0.80, because the expected payoff of leaving it behind (pyoff) is then lower than the constant payoff of 0.80 for carrying it.
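The expected-payoff comparison above can be checked with a few lines of R (a small sketch; the 0.8, 1.0, and 0.0 payoffs and the symbol pyoff are exactly the values assumed in the example):

```r
# Expected payoffs as functions of pyoff, the probability of sunshine.
payoff_take  <- function(pyoff) 0.8 * pyoff + 0.8 * (1 - pyoff)  # always 0.8
payoff_leave <- function(pyoff) 1.0 * pyoff + 0.0 * (1 - pyoff)  # equals pyoff

# Take the umbrella whenever its expected payoff is the higher one.
should_take <- function(pyoff) payoff_take(pyoff) > payoff_leave(pyoff)

should_take(0.5)  # TRUE: with only a 50% chance of sun, take it
should_take(0.9)  # FALSE: sunshine is likely enough to leave it behind
```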

Both classification and regression decision trees have dependent variables and predictor variables. The predictor variables can mix ordinal and nominal scales, and the dependent variable is either quantitative or qualitative. Classification decision trees have categorical dependent variables, whereas regression decision trees have quantitative ones. Regression trees parallel regression and ANOVA modeling, while classification trees parallel discriminant analysis.

Four primary advantages of decision trees

i) Decision trees are available in several predictive analytics tools, such as RapidMiner. In an earlier review, I described the features available in both classification and regression trees. Most predictive analytics tools perform feature selection and variable screening autonomously.

ii) Unlike many algorithms that require extensive data preparation before analysis, decision trees do not demand herculean preparation efforts from users of the data. Fitting a regression model or computing its coefficients usually requires transforming the data to the scale of the model; decision trees need no such transformations, because the tree structure remains unchanged under them. Statistical methods also tend to fail when there are missing or null values, but missing values do not impede the partitioning of the data when generating decision trees.
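The missing-value point can be demonstrated concretely. In this sketch, rpart (my assumed example implementation) grows a tree even after one predictor has values knocked out, routing those rows with surrogate splits rather than discarding them:

```r
library(rpart)

set.seed(42)
iris_na <- iris
# Knock out 15 values (10%) of one predictor at random.
iris_na$Petal.Length[sample(nrow(iris_na), 15)] <- NA

# The tree still fits; rows with NA are handled via surrogate splits.
fit <- rpart(Species ~ ., data = iris_na, method = "class")
pred <- predict(fit, iris_na, type = "class")
head(pred, 5)
```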

iii) Linear regression models fit poorly when the relationships between variables are nonlinear. Decision trees make no linearity assumption about the data.
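This can be illustrated on a synthetic nonlinear signal: a straight-line model cannot track y = sin(x), while a regression tree (here via rpart, my choice of implementation) approximates it piecewise. A sketch:

```r
library(rpart)

set.seed(1)
x <- runif(200, 0, 2 * pi)
y <- sin(x) + rnorm(200, sd = 0.1)
d <- data.frame(x, y)

lin  <- lm(y ~ x, data = d)     # forced to a straight line
tree <- rpart(y ~ x, data = d)  # piecewise-constant approximation

# Mean squared error of each fit: the tree's is far smaller here.
c(lm = mean(resid(lin)^2), tree = mean((y - predict(tree, d))^2))
```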

iv) Another advantage of decision trees is model simplification and reduced complexity. The visual diagram gives executives a simple explanation that is easy to interpret.


Clemen, R. T., & Reilly, T. (2001). Making Hard Decisions with Decision Tools Suite (1st ed.). Boston, MA: Thomson/Duxbury.

Deshpande, B. (2011). 2 main differences between classification and regression trees. Retrieved March 6, 2016, from

Deshpande, B. (2011). 4 key advantages of using decision trees for predictive analytics. Retrieved February 29, 2016, from

Loh, W. (2011, January 6). Classification and regression trees. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1, 14–23.

Ritschard, G. (2010). CHAID and Earlier Supervised Tree Methods. Retrieved March 6, 2016, from

UNESCO (n.d.). Classification and Regression Trees. Retrieved March 1, 2016, from
