Binary Decision Trees
Regression Trees
Introduction
Binary decision trees are a supervised
machine-learning technique that operates by subjecting attributes to a series of binary (yes/no) decisions. Each decision leads to one of two possibilities: either another decision or a prediction. An example of a trained tree will help cement the idea; you will learn how training works after understanding the result of training. Decision Trees
are used for both Regression
and Classification
machine-learning problems. Classification And Regression Tree
(CART
) analysis is an umbrella term that refers to both processes.
Read dataset.
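A minimal sketch of this step, assuming the red wine quality data is saved locally as winequality-red.csv; the file name and the ";" separator are assumptions (they match the UCI distribution of the dataset), not something stated in the original.

```python
import pandas as pd

# Load the red wine quality dataset (assumed local file; the UCI copy is ';'-separated)
df = pd.read_csv("winequality-red.csv", sep=";")

# Inspect columns, dtypes, and non-null counts (produces the summary shown below)
df.info()
```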
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1599 entries, 0 to 1598
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 fixed acidity 1599 non-null float64
1 volatile acidity 1599 non-null float64
2 citric acid 1599 non-null float64
3 residual sugar 1599 non-null float64
4 chlorides 1599 non-null float64
5 free sulfur dioxide 1599 non-null float64
6 total sulfur dioxide 1599 non-null float64
7 density 1599 non-null float64
8 pH 1599 non-null float64
9 sulphates 1599 non-null float64
10 alcohol 1599 non-null float64
11 quality 1599 non-null int64
dtypes: float64(11), int64(1)
memory usage: 150.0 KB
Tip: We are not going to evaluate the algorithm; we only aim to illustrate how it works. So we will use the whole dataset without splitting it into training and testing sets.
Train the model
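A sketch of the training step using scikit-learn's DecisionTreeRegressor; the shallow max_depth used here is an assumed value chosen only to keep the plotted tree readable, not a setting taken from the original.

```python
from sklearn.tree import DecisionTreeRegressor

# Features: all physicochemical columns; target: the 'quality' score
X = df.drop(columns=["quality"])
y = df["quality"]

# Fit a regression tree on the full dataset (no train/test split, per the tip above);
# max_depth=2 is an assumed value used purely to keep the diagram small
model = DecisionTreeRegressor(max_depth=2, random_state=0)
model.fit(X, y)
```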
Visualize the tree.
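One way to draw the trained tree, using scikit-learn's plot_tree together with matplotlib; the figure size is an arbitrary choice.

```python
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

# Draw the trained tree; each box is a node showing its split question,
# the MSE, the number of samples, and the predicted value
plt.figure(figsize=(12, 6))
plot_tree(model, feature_names=X.columns, filled=True)
plt.show()
```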
The figure above shows the series of decisions produced as an outcome of training on the wine quality data. The block diagram of the trained tree contains a number of boxes, which are called nodes in decision-tree parlance. There are two types of nodes: a node can either pose a yes/no question of the data
, or it can be a terminal
node that assigns a prediction
to the examples that end up in it. Terminal
nodes are often referred to as leaf nodes
. They are the nodes at the bottom of the figure that have no branches or further decision nodes
below them.
How Does a Binary Decision Tree Generate Predictions?
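To make the mechanics concrete, here is a small example reusing the model and X defined above: each row is routed through the tree's yes/no questions until it lands in a leaf, and the leaf's value is returned as the prediction.

```python
# Take the first example and route it through the tree
sample = X.iloc[[0]]

# The prediction is the mean target value of the training examples
# that ended up in the same leaf as this sample
print(model.predict(sample))
```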
How to Determine the Split Point
Using Variance Reduction (Mean Squared Error)
How is the split point determined? The process is to try every possible split point and keep the best one, as follows (see the code sketch after this list).
- As per the tree plot above, column no. 11 (index 10, "alcohol") has the most significant influence on our target. Why? This will be illustrated below.
- The value (10.525) in the root node of the graph above represents the threshold at which the minimum
Mean Squared Error (MSE)
is achieved, as follows:
1. Sort the feature values in ascending order.
2. Take the average of the first two points of the feature and define it as the candidate threshold.
3. Compute the corresponding average output values: one for the target values below the threshold and one for the target values above it. These two averages are the two predicted values.
4. Calculate the Mean Squared Error of each side with reference to its average (mean) value.
5. Repeat steps 2-4, taking the new candidate threshold as the average of the 2nd and 3rd feature points, and so on until the last two points of the feature are reached.
6. The split point is the threshold in the feature that achieves the
least MSE
for our target.
- So, divide the dataset around the
split
value as below to get the d1
and d2
datasets:
- Notice that the
max
value of X[10] "alcohol" in d1 is 10.50
and the min
value of "alcohol" in d2 is 10.55.
- Calculating the average of these two values gives
10.525
, which represents the split point of the root
node.
- Repeat for the rest of the non-terminal nodes until a leaf node is reached.
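A minimal sketch of the split-point search described above, written with NumPy and pandas; best_split is a hypothetical helper name used only for illustration, not part of any library.

```python
import numpy as np

def best_split(feature, target):
    """Return (threshold, mse) of the split that minimises MSE for one feature."""
    order = np.argsort(feature)                 # step 1: sort feature values ascending
    x, y = feature[order], target[order]
    best_thr, best_mse = None, np.inf
    for i in range(len(x) - 1):
        thr = (x[i] + x[i + 1]) / 2.0           # step 2: candidate threshold = midpoint
        left, right = y[x <= thr], y[x > thr]
        if len(left) == 0 or len(right) == 0:
            continue
        # steps 3-4: predict each side by its mean and measure the squared error
        mse = (np.sum((left - left.mean()) ** 2) +
               np.sum((right - right.mean()) ** 2)) / len(y)
        if mse < best_mse:                      # step 6: keep the threshold with least MSE
            best_thr, best_mse = thr, mse
    return best_thr, best_mse

# Applied to the 'alcohol' column, this search should land at the 10.525 root split
thr, mse = best_split(df["alcohol"].to_numpy(), df["quality"].to_numpy())
print(thr, mse)
```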
How to Determine the Most Significant Feature (Attribute) to Split On?
- The algorithm repeats the above steps for each feature.
- It gets the best
Mean Squared Error
and split
point for each feature.
- The most significant feature is then the one with the lowest MSE (a sketch follows below).
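Continuing the same sketch, the feature chosen for the root is simply the one whose best split yields the lowest MSE; this reuses the hypothetical best_split helper defined above.

```python
# Evaluate every candidate feature and keep the one with the lowest MSE at its best split
results = {col: best_split(df[col].to_numpy(), df["quality"].to_numpy())
           for col in df.columns if col != "quality"}
best_feature = min(results, key=lambda col: results[col][1])
print(best_feature, results[best_feature])  # expected to be 'alcohol' with threshold ~10.525
```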
Determination of Split-Point Graphically
Now let's split the dataset with reference to the split point.
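As a small illustration, reusing df and the 10.525 threshold found above, splitting the data around the root split point gives the d1 and d2 subsets.

```python
# d1: examples answering "yes" to alcohol <= 10.525; d2: the rest
d1 = df[df["alcohol"] <= 10.525]
d2 = df[df["alcohol"] > 10.525]
print(len(d1), len(d2))
```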
Tip: When the model fits the training data perfectly, it probably means it is overfit and will not
perform well on new data.
How to Prevent Overfitting
- The simplest way is to split a node only when there is enough data to justify the split.
- Typically, the minimum number of observations required to allow a split is
20
(see the sketch below).
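In scikit-learn this rule of thumb corresponds to the min_samples_split parameter; a brief sketch, reusing X and y from above and the value 20 suggested by the tip.

```python
from sklearn.tree import DecisionTreeRegressor

# Refuse to split any node holding fewer than 20 observations,
# which limits how finely the tree can carve up the training data
pruned_model = DecisionTreeRegressor(min_samples_split=20, random_state=0)
pruned_model.fit(X, y)
```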