Decision Tree Boosting Techniques compared

From Random Forest to LightGBM

Published in DataSeries

Aug 27, 2020 · 6 min read

Decision Trees are popular Machine Learning algorithms used for both regression and classification tasks. Their popularity comes mainly from their interpretability. A trained tree is a sequence of simple if/else questions on the input features, which mimics the way humans make decisions.
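The following is a simplified, pure-Python sketch of how a classification tree chooses a single decision. It tries every threshold on one feature and keeps the one whose two child nodes have the lowest weighted Gini impurity. The names `gini` and `best_split` are hypothetical helpers, not a library API. Real implementations, such as scikit-learn's, run this search over all features and recursively on each child, and they differ in details such as where thresholds are placed.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(xs, ys):
    """Return (threshold, weighted impurity) of the best 'x <= threshold' split on one feature."""
    best = (None, float("inf"))
    n = len(xs)
    # Skip the largest value: splitting there would leave the right child empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        # Weight each child's impurity by the fraction of samples it receives.
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = ["a", "a", "a", "b", "b", "b"]
print(best_split(xs, ys))  # (3.0, 0.0): the rule "x <= 3.0" separates the classes perfectly
```

Growing a full tree amounts to applying this split search recursively to each child until the nodes are pure or a depth limit is reached.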

However, this interpretability comes at a price in terms of prediction accuracy. A shallow tree tends to underfit, while a deep one tends to overfit its training data. To overcome this caveat, several techniques have been developed that build strong, robust models by combining many ‘weak’ ones. These techniques are known as ‘ensemble’ methods (I discussed some of them in my previous article).
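To illustrate how an ensemble turns weak learners into a strong one, here is a minimal pure-Python sketch of gradient boosting, the idea behind three of the four algorithms compared below. It assumes a regression task with squared-error loss and uses depth-1 trees (stumps) as the weak learners. Each new stump is fit to the residuals that the current ensemble still gets wrong, and its output is shrunk by a learning rate. The functions `fit_stump` and `gradient_boost` are hypothetical names for illustration and do not mirror any library's API.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree: the threshold that minimizes squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    # Each leaf predicts the mean residual of the samples that fall into it.
    return lambda x: left_mean if x <= t else right_mean


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Squared-loss gradient boosting: each stump fits the current residuals."""
    base = sum(ys) / len(ys)  # start from a constant prediction (the mean)
    stumps = []

    def predict(x):
        return base + learning_rate * sum(stump(x) for stump in stumps)

    history = []  # training MSE after each boosting round
    for _ in range(n_rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
        history.append(sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(ys))
    return predict, history


xs = [i / 10 for i in range(50)]
ys = [x * x for x in xs]
predict, history = gradient_boost(xs, ys)
print(f"training MSE after 1 round: {history[0]:.3f}, after {len(history)} rounds: {history[-1]:.3f}")
```

No single stump can represent a curve like `x * x`, but the sum of many small, shrunken corrections can. Random Forest takes a different route: it averages many deep trees trained independently on bootstrap samples instead of fitting them sequentially to residuals.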

In this article, I’m going to dwell on four ensemble techniques, all using Decision Trees as base learners, and compare their performance in terms of accuracy and training time. Strictly speaking, Random Forest is a bagging method rather than a boosting one, while the other three are boosting methods. The four algorithms I’m going to use are:

Random Forest
Gradient Boosting
XGBoost
LightGBM
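Here is a rough sketch of how such a comparison could be set up with scikit-learn. This is not the author's code. Because the dataset code in this article is truncated, the `make_blobs` arguments after `centers=3` (`n_features=2`, `random_state=1`), the 70/30 train/test split, and `n_estimators=100` are my own assumptions. XGBoost and LightGBM are included only if the `xgboost` and `lightgbm` packages are installed. Training times depend heavily on hardware and library versions, so no numbers are shown here.

```python
import time

from sklearn.datasets import make_blobs
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def build_models():
    """Return the models to compare; XGBoost and LightGBM are optional extras."""
    models = {
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
        "Gradient Boosting": GradientBoostingClassifier(n_estimators=100, random_state=0),
    }
    try:
        from xgboost import XGBClassifier
        models["XGBoost"] = XGBClassifier(n_estimators=100)
    except ImportError:
        pass  # xgboost not installed: skip it
    try:
        from lightgbm import LGBMClassifier
        models["LightGBM"] = LGBMClassifier(n_estimators=100, verbose=-1)
    except ImportError:
        pass  # lightgbm not installed: skip it
    return models


def compare(X, y, models):
    """Fit each model and return {name: (test accuracy, training time in seconds)}."""
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
    results = {}
    for name, model in models.items():
        start = time.perf_counter()
        model.fit(X_train, y_train)  # only the fit is timed, not prediction
        elapsed = time.perf_counter() - start
        results[name] = (accuracy_score(y_test, model.predict(X_test)), elapsed)
    return results


# Assumed parameters: 3 classes, 2 features, fixed seed for reproducibility.
X, y = make_blobs(n_samples=10000, centers=3, n_features=2, random_state=1)
results = compare(X, y, build_models())
for name, (acc, secs) in results.items():
    print(f"{name:18s} accuracy={acc:.3f} train_time={secs:.2f}s")
```

Timing only the `fit` call keeps the comparison focused on training cost. For more reliable numbers, average over several runs.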

To compare these methods’ performance, I generated a synthetic dataset as follows:

from sklearn.datasets import make_blobs
from matplotlib import pyplot
from pandas import DataFrame
# generate 2d classification dataset
X, y = make_blobs(n_samples=10000, centers=3…

Written by Valentina Alto