Decision Tree Boosting Techniques Compared
From Random Forest to LightGBM
Decision Trees are popular Machine Learning algorithms used for both regression and classification tasks. Their popularity stems mainly from their interpretability and representability, as they mimic the way humans make decisions.
However, this interpretability comes at a price in terms of prediction accuracy. To overcome this limitation, several techniques have been developed with the goal of building strong and robust models out of ‘poor’ (weak) ones. These techniques are known as ‘ensemble’ methods (I discussed some of them in my previous article here).
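As a rough illustration of that accuracy gap, the sketch below fits a single decision tree and an ensemble of trees (a Random Forest) on the same synthetic data and prints the test accuracy of each. It is not part of the experiment described in this article: the dataset (`make_classification`), the train/test split and every parameter value are assumptions chosen only for the demonstration.

```python
# Minimal sketch: a single decision tree versus an ensemble of trees.
# The dataset and all parameter values below are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification problem with 10 informative features
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=20, n_informative=10, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0
)

# One fully grown tree (the interpretable but weaker model)
single_tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# An ensemble of 100 trees trained on bootstrap samples
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

tree_acc = single_tree.score(X_te, y_te)
forest_acc = forest.score(X_te, y_te)
print(f"single tree accuracy:   {tree_acc:.3f}")
print(f"random forest accuracy: {forest_acc:.3f}")
```

On data like this the ensemble usually scores noticeably higher than the single tree, at the cost of no longer being readable as one set of if/else rules.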
In this article, I’m going to focus on four ensemble techniques, all of which use a Decision Tree as base learner, and compare their performance in terms of accuracy and training time. The four algorithms I’m going to use are:
- Random Forest
- Gradient Boosting
- XGBoost
- LightGBM
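One way such a comparison could be set up is sketched below: a small helper times the `fit` call and then scores the model on held-out data. The helper name `fit_and_score`, the toy data and all parameter values are hypothetical and only serve the illustration. `XGBClassifier` and `LGBMClassifier` come from the separate `xgboost` and `lightgbm` packages, so the sketch simply skips them if they cannot be loaded.

```python
# Minimal sketch of an accuracy / training-time comparison.
# fit_and_score, the toy data and all parameter values are assumptions.
import time

from sklearn.datasets import make_blobs
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def fit_and_score(model, X_train, X_test, y_train, y_test):
    """Fit the model and return (test accuracy, training time in seconds)."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    elapsed = time.perf_counter() - start
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return accuracy, elapsed


models = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

# Optional third-party libraries: skipped if missing or unable to load
try:
    from xgboost import XGBClassifier
    models["XGBoost"] = XGBClassifier()
except (ImportError, OSError):
    pass

try:
    from lightgbm import LGBMClassifier
    models["LightGBM"] = LGBMClassifier()
except (ImportError, OSError):
    pass

# Small stand-in dataset so the sketch runs on its own
X_sketch, y_sketch = make_blobs(
    n_samples=1000, centers=3, n_features=2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X_sketch, y_sketch, test_size=0.3, random_state=0
)

results = {
    name: fit_and_score(model, X_train, X_test, y_train, y_test)
    for name, model in models.items()
}
for name, (accuracy, seconds) in results.items():
    print(f"{name}: accuracy={accuracy:.3f}, training time={seconds:.3f}s")
```

Timing only the `fit` call keeps prediction cost out of the measurement. A single run is noisy, so averaging over several runs gives a fairer picture of training time.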
To compare the performance of these methods, I generated an artificial dataset as follows:
from sklearn.datasets import make_blobs
from matplotlib import pyplot
from pandas import DataFrame
# generate 2d classification dataset
X, y = make_blobs(n_samples=10000, centers=3…