Published in

Analytics Vidhya

How does the mathematics behind the Decision tree work for Regression problems? Math Time!!!

Decision tree — Part 2

The previous article discussed how nodes are split for classification problems. Now, let's look at regression problems, where the target values are continuous.

Reduction of Variance

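The variance of a node is calculated as:

var(X,D) = Σ (xᵢ – x̄)² / (n – 1)

where X = a label of the feature, xᵢ = the target values of the instances with that label, x̄ = their mean, n = the number of instances with that label, and D = the total number of observations.

Each label's variance is multiplied by its weight, n/D, and the weighted values are summed to give the variance of the feature.

The feature that gives the lowest weighted variance is chosen for the split.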
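The weighting described above can be sketched in a few lines of Python. This is a minimal illustration, not a library implementation: the function name `weighted_variance` and the toy data are assumptions made for the example, and `statistics.variance` is used because it divides by n – 1, as in the worked example in this article.

```python
from statistics import variance


def weighted_variance(groups):
    """Sum of per-label sample variances, each weighted by n / D.

    `groups` maps each label of a feature to the list of target values
    observed for that label. Every label needs at least two values,
    because the sample variance divides by n - 1.
    """
    total = sum(len(values) for values in groups.values())  # D
    return sum(
        (len(values) / total) * variance(values)  # (n / D) * var
        for values in groups.values()
    )


# Hypothetical toy data, not taken from the article.
example = {"a": [1, 2, 3], "b": [10, 20]}
print(weighted_variance(example))
```

The feature with the smallest returned value would be the one chosen for the split.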

Let's look at an example for better understanding:

The dataset has a continuous target variable (Hours_played) and categorical feature variables (Outlook and Humidity).

• Outlook column is considered:

For weights:

1. Sunny = 3/10 = 0.3
2. Overcast = 3/10 = 0.3
3. Rainy = 4/10 = 0.4

For variance:

1. Sunny

Mean value of Hours_played = (6 + 10 + 20)/3 = 12

Apply the variance formula:

var(X,D) = [(6–12)² + (10–12)² + (20–12)²] / (3–1) = 52

Then it is multiplied by the weight: 52 x 0.3 = 15.6

2. Overcast

Mean value = (8 + 3 + 12)/3 = 7.67

var(X,D) = [(8–7.67)² + (3–7.67)² + (12–7.67)²] / (3–1) = 20.33

After multiplying by the weight: 20.33 x 0.3 ≈ 6.1

3. Rainy

Mean value = (2 + 14 + 19 + 8)/4 = 10.75

var(X,D) =
[(2–10.75)² + (14–10.75)² + (19–10.75)² + (8–10.75)²] / (4–1) = 54.25

After multiplying by the weight: 54.25 x 0.4 = 21.7

The sum of all values is the variance of the column:
15.6 + 6.1 + 21.7 = 43.4
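The Outlook calculation can be checked with a short script. This is a sketch that assumes the Hours_played values are grouped exactly as in the steps above. The variable names are illustrative. Because it works from unrounded means, its output may differ a little from figures rounded by hand.

```python
from statistics import mean, variance

# Hours_played grouped by Outlook label, as used in the steps above.
hours_by_outlook = {
    "Sunny": [6, 10, 20],
    "Overcast": [8, 3, 12],
    "Rainy": [2, 14, 19, 8],
}

total = sum(len(hours) for hours in hours_by_outlook.values())  # D = 10

outlook_variance = 0.0
for label, hours in hours_by_outlook.items():
    weight = len(hours) / total   # n / D
    var = variance(hours)         # sample variance, divides by n - 1
    outlook_variance += weight * var
    print(f"{label}: mean={mean(hours):.2f} variance={var:.2f} "
          f"weight={weight:.1f} weighted={weight * var:.2f}")

print(f"Outlook weighted variance: {outlook_variance:.2f}")
```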

• Humidity column is considered and its variance is calculated in a similar fashion:

Variance of Humidity = 13.16 (High, weight 4/10) + 27.78 (Low, weight 6/10) = 40.94

Comparison of Variance values:

Humidity has the lower weighted variance, so it is chosen for the split.
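The comparison can also be done programmatically. The sketch below assumes the ten rows are as pieced together from the values quoted in this article; the row order and the helper name `weighted_variance` are assumptions. It scores both features and picks the one with the lowest weighted variance.

```python
from collections import defaultdict
from statistics import variance

# (Outlook, Humidity, Hours_played) rows, pieced together from the
# values quoted in the article; the row order is an assumption.
rows = [
    ("Sunny", "High", 6), ("Sunny", "Low", 10), ("Sunny", "Low", 20),
    ("Overcast", "High", 8), ("Overcast", "High", 12), ("Overcast", "Low", 3),
    ("Rainy", "High", 19), ("Rainy", "Low", 2), ("Rainy", "Low", 14),
    ("Rainy", "Low", 8),
]
FEATURES = {"Outlook": 0, "Humidity": 1}  # column index of each feature


def weighted_variance(column):
    """Weighted sum of sample variances of Hours_played per label."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row[2])
    return sum(len(g) / len(rows) * variance(g) for g in groups.values())


scores = {name: weighted_variance(col) for name, col in FEATURES.items()}
best = min(scores, key=scores.get)  # lowest weighted variance wins

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
print("Split on:", best)
```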

The decision tree is constructed with Humidity as the first split and Outlook as the second. Each leaf predicts the mean number of hours played for the rows that reach it:

( Humidity = High & Outlook = Sunny ) = 6

( Humidity = High & Outlook = Overcast ) = (8+12)/2 = 10

( Humidity = High & Outlook = Rainy ) = 19

( Humidity = Low & Outlook = Sunny ) = (10+20)/2 = 15

( Humidity = Low & Outlook = Overcast ) = 3

( Humidity = Low & Outlook = Rainy ) = (2+14+8)/3 = 8
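The leaf values are simply group means, which a small script can reproduce. As before, this is only a sketch: the rows are pieced together from the values quoted in this article, and `predict` is a hypothetical helper name rather than part of any library.

```python
from collections import defaultdict
from statistics import mean

# (Outlook, Humidity, Hours_played) rows, pieced together from the
# values quoted in the article; the row order is an assumption.
rows = [
    ("Sunny", "High", 6), ("Sunny", "Low", 10), ("Sunny", "Low", 20),
    ("Overcast", "High", 8), ("Overcast", "High", 12), ("Overcast", "Low", 3),
    ("Rainy", "High", 19), ("Rainy", "Low", 2), ("Rainy", "Low", 14),
    ("Rainy", "Low", 8),
]

# Group the target by (Humidity, Outlook): one group per leaf of the tree.
leaves = defaultdict(list)
for outlook, humidity, hours in rows:
    leaves[(humidity, outlook)].append(hours)

# Each leaf predicts the mean of the target values that reach it.
predictions = {leaf: mean(values) for leaf, values in leaves.items()}


def predict(humidity, outlook):
    """Return the predicted Hours_played for one combination."""
    return predictions[(humidity, outlook)]


for (humidity, outlook), value in sorted(predictions.items()):
    print(f"Humidity={humidity}, Outlook={outlook}: {value}")
```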


Varsha C Bendre

AI | Machine Learning | Mathematics | Physics