The Math Behind XGBoost
Building XGBoost from Scratch Using Python
Introduction
Gradient Boosting is a powerful and versatile machine learning technique that has gained immense popularity in solving complex predictive tasks. This method stands out for its effectiveness in handling a variety of data types, robustness against overfitting, and exceptional performance in both regression and classification tasks. The core idea behind Gradient Boosting is to sequentially build an ensemble of weak learners — typically decision trees — to create a model that outperforms any of the individual learners.
Gradient Boosting Framework
Let’s first outline the high-level framework, and then dive into each of these steps:
1. Initialization
- Start with a Base Model: Typically, the model starts with a simple prediction for all instances, like the mean (for regression) or mode (for classification) of the target variable.
- Initial Prediction: This initial prediction serves as the starting point, and the algorithm will iteratively improve upon it.
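As a concrete sketch of this initialization step, the snippet below computes a constant base prediction for a regression target. Under squared-error loss, the mean minimizes the loss, which is why it is the usual starting point. The function name and toy data here are illustrative, not part of any library.

```python
import numpy as np

def initial_prediction(y):
    """Return the constant base prediction for a regression target.

    For squared-error loss, the target mean minimizes the loss,
    so every instance starts from the same value.
    """
    return np.mean(y)

# Toy target values (illustrative only)
y = np.array([3.0, 5.0, 7.0, 9.0])

# Constant starting prediction F0 for all instances
F0 = initial_prediction(y)
```

Subsequent boosting rounds will refine this single constant by fitting weak learners to its residual errors.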
2. Iterative Improvement
- Sequentially Add Weak Learners: The core idea is to add weak learners (usually decision trees)…