Decision Tree Boosting Techniques Compared

From Random Forest to LightGBM

Valentina Alto
Published in DataSeries
Aug 27, 2020


Decision Trees are popular Machine Learning algorithms used for both regression and classification tasks. Their popularity mainly stems from their interpretability and ease of representation, as they mimic the way humans make decisions.

However, this interpretability comes at a price in terms of prediction accuracy. To overcome this limitation, several techniques have been developed with the goal of building strong, robust models out of 'weak' ones. These techniques are known as 'ensemble' methods (I discussed some of them in my previous article here).

In this article, I'm going to examine four different ensemble techniques, all using Decision Trees as base learners, with the aim of comparing their performance in terms of accuracy and training time. The four algorithms I'm going to use are listed below, followed by a quick sketch of how each can be instantiated:

  • Random Forest
  • Gradient Boosting
  • XGBoost
  • LightGBM
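
As a minimal sketch, the four models can be instantiated from scikit-learn, the xgboost package, and the lightgbm package. The default hyperparameters here are an illustrative assumption, not necessarily the settings used in the comparison:

from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

# all four models with default hyperparameters (assumed for illustration)
models = {
    "Random Forest": RandomForestClassifier(),
    "Gradient Boosting": GradientBoostingClassifier(),
    "XGBoost": XGBClassifier(),
    "LightGBM": LGBMClassifier(),
}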

To compare the performance of these methods, I initialized an artificial dataset as follows:

from sklearn.datasets import make_blobs
from matplotlib import pyplot
from pandas import DataFrame
# generate 2d classification dataset
# (truncated in the original; n_features and random_state below are assumed)
X, y = make_blobs(n_samples=10000, centers=3, n_features=2, random_state=42)
# scatter plot of the three classes, dots colored by class value
df = DataFrame(dict(x=X[:, 0], y=X[:, 1], label=y))
df.plot.scatter(x='x', y='y', c='label', colormap='viridis')
pyplot.show()
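
From here, a minimal way to compare the four models is to time each fit on a hold-out split and score accuracy on the test set. This is only a sketch: the split parameters (test_size, random_state) below are my assumptions:

from time import perf_counter
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# hold-out split (test_size and random_state assumed, not taken from the article)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

for name, model in models.items():
    start = perf_counter()
    model.fit(X_train, y_train)  # measure training time
    elapsed = perf_counter() - start
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: accuracy={acc:.4f}, training time={elapsed:.2f}s")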
