Decision Tree Regression

- A quickie: All you need to know about Decision Tree Regression as a beginner

Jayesh
3 min read · Aug 7, 2020


You might have come across Linear Regression, Multiple Regression, and Polynomial Regression models. Machine Learning offers many other regression models as well, one of which is Decision Tree Regression. The decision tree model is very good at handling tabular data with numerical features, or with categorical features that have fewer than hundreds of categories. Unlike linear models, decision trees can capture non-linear interactions between the features and the target.

Here x1 and x2 are independent variables and y is the dependent variable. For example, if we want to predict housing costs in different localities, we may take x1 as the number of bedrooms and x2 as the age of the house in years. These two independent variables are assumed to affect the price, "y".

The following scatter plot shows x1 and x2 in two dimensions, with y (say, price) as a third dimension. In Decision Tree Regression, we first sub-divide the plot into several regions, forming the so-called leaves of the tree.

Decision Tree Regression

The number of splits is decided by the algorithm, taking into account how much information each split adds. The resulting decision tree looks as follows:
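To make the split-selection idea concrete, here is a minimal sketch (not scikit-learn's actual implementation) of how a regression tree scores candidate splits on one feature: it picks the threshold that minimizes the weighted variance of the target values in the two resulting groups. The toy data below is made up for illustration.

```python
# Hedged sketch: choose the split threshold that minimizes the
# weighted variance of the target in the two resulting groups.

def variance(ys):
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys) / len(ys)

def best_split(xs, ys):
    """Return (threshold, weighted_variance) of the best split on one feature."""
    best = (None, float("inf"))
    for threshold in sorted(set(xs))[1:]:  # candidate cut points
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        score = (len(left) * variance(left) + len(right) * variance(right)) / len(ys)
        if score < best[1]:
            best = (threshold, score)
    return best

# Toy data: y jumps sharply once x reaches 4, so the best cut is at x = 4.
xs = [1, 2, 3, 4, 5, 6]
ys = [10.0, 11.0, 10.5, 50.0, 51.0, 49.5]
print(best_split(xs, ys)[0])  # 4
```

A real tree applies this search recursively inside each group until a stopping criterion (such as a maximum depth or a minimum leaf size) is reached.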

Decision Tree

The average of the target values in each split is computed and assigned to that split, which is called a terminal leaf of the decision tree. By dividing the data into relevant splits, the algorithm can more accurately predict the value of the dependent variable.
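As a quick sketch of this averaging step (with made-up toy data and a hand-picked split point, not a learned one), each terminal leaf simply predicts the mean of the training targets that fell into it:

```python
# Hedged sketch: a one-split "tree" whose two terminal leaves
# each predict the mean of the targets on their side of the cut.

def leaf_means(xs, ys, threshold):
    left = [y for x, y in zip(xs, ys) if x < threshold]
    right = [y for x, y in zip(xs, ys) if x >= threshold]
    return sum(left) / len(left), sum(right) / len(right)

def predict(x, threshold, means):
    return means[0] if x < threshold else means[1]

xs = [1, 2, 3, 4, 5, 6]
ys = [10.0, 12.0, 11.0, 50.0, 52.0, 51.0]
means = leaf_means(xs, ys, threshold=4)
print(means)                   # (11.0, 51.0)
print(predict(2.5, 4, means))  # 11.0
```

This is why a decision tree's prediction curve is piecewise constant: every point inside a leaf gets the same value.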

As a practical application, predicting the salary of an employee can be done in Python. Taking Position Level as the independent variable and Salary as the dependent variable, we can predict the salary of any employee given their position level using a Decision Tree Regression model.
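A minimal sketch of this in Python using scikit-learn's `DecisionTreeRegressor`; the position levels and salaries below are made-up illustrative numbers, not a real dataset:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Position level (1 = junior ... 10 = CEO) vs. annual salary (toy numbers).
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]])
y = np.array([45_000, 50_000, 60_000, 80_000, 110_000,
              150_000, 200_000, 300_000, 500_000, 1_000_000])

model = DecisionTreeRegressor(random_state=0)
model.fit(X, y)

# With default settings the tree grows until every leaf is pure, so it
# reproduces the training salaries exactly at the training points.
print(model.predict([[6]]))  # [150000.]
```

Note that a fully grown tree memorizes the training data; in practice you would tune parameters such as `max_depth` or `min_samples_leaf` to avoid overfitting.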

The resulting plot shows a constant predicted value for all the points that fall in a particular split. Interestingly, when a number of such decision trees are trained together and their predictions are averaged, the result is what is called a Random Forest Regression model.
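The same toy salary data can illustrate the ensemble idea with scikit-learn's `RandomForestRegressor` (again, the numbers are made up): many trees are trained on bootstrap samples of the data, and their predictions are averaged.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Same toy data: position level vs. annual salary.
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]])
y = np.array([45_000, 50_000, 60_000, 80_000, 110_000,
              150_000, 200_000, 300_000, 500_000, 1_000_000])

# 100 trees, each fit on a bootstrap sample; the forest's prediction
# is the average of the individual tree predictions.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, y)

print(forest.predict([[6.5]]))
```

Because the prediction is an average over many trees, the hard jumps of a single tree are smoothed out, and the model is usually less prone to overfitting.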

Written By:

Jayesh Kumar

3rd year, ECE

