Unveiling Mathematics behind XGBoost

This article is for readers who found the original paper difficult to grasp. Follow me to the end, and I assure you that you will at least get a sense of what is happening underneath this revolutionary machine learning model. Let's get started.

Curious case of John and Emily

Gradient Boosting

Many implementations of Gradient Boosting follow approach 1 to minimize the objective function. At every iteration, we fit a base learner to the negative gradient of the loss function, multiply its prediction by a constant, and add the result to the value from the previous iteration.

Source: Wikipedia
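The iteration described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the loss is squared error (so the negative gradient is simply the residual), the base learner is a one-split stump on a single feature, and the constant is a fixed `step` rather than one found by line search. The names `fit_stump`, `gradient_boost`, `n_rounds` and `step` are made up for this example and do not come from any library.

```python
import numpy as np

def fit_stump(x, target):
    """Fit a one-split regression stump to 1-D inputs by least squares."""
    best = None
    for threshold in np.unique(x)[:-1]:
        left = x <= threshold
        left_value, right_value = target[left].mean(), target[~left].mean()
        error = np.sum((target - np.where(left, left_value, right_value)) ** 2)
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    _, threshold, left_value, right_value = best
    return lambda q: np.where(q <= threshold, left_value, right_value)

def gradient_boost(x, y, n_rounds=50, step=0.1):
    """Boost stumps on squared loss L = 0.5 * (y - F)^2 (assumed loss)."""
    base = y.mean()                              # initial constant prediction
    prediction = np.full_like(y, base, dtype=float)
    learners = []
    for _ in range(n_rounds):
        negative_gradient = y - prediction       # -dL/dF for squared loss
        learner = fit_stump(x, negative_gradient)
        # Multiply the learner's output by a constant and add it
        # to the value from the previous iteration.
        prediction = prediction + step * learner(x)
        learners.append(learner)
    return lambda q: base + step * sum(h(q) for h in learners)
```

Calling `gradient_boost(x, y)` returns a function that predicts for new inputs. With a different loss, only the line computing `negative_gradient` would change.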


XGBoost was the result of research by Tianqi Chen, a Ph.D. student at the University of Washington. XGBoost is an additive ensemble model that is composed of several base learners.
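To make "additive" concrete, here is the idea written as an equation. The notation follows the XGBoost paper as far as I recall it: K is the number of base learners (trees), f_k is the k-th tree, and the superscript (t) marks the boosting round.

```latex
\hat{y}_i = \sum_{k=1}^{K} f_k(x_i),
\qquad
\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)
```

The first expression says the final prediction is the sum of all the trees' outputs. The second says the model is built one tree at a time, each round adding a new tree to the prediction from the previous round.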

Bullet Points to Remember

  1. While Gradient Boosting follows negative gradients to optimize the loss function, XGBoost uses a Taylor expansion to approximate the value of the loss function for different base learners. Here is the best video on the internet that explains Taylor expansion.
  2. XGBoost doesn’t explore all possible tree structures but builds a tree greedily.
  3. XGBoost’s regularization term penalizes building complex trees with many leaf nodes.
  4. XGBoost has several other tricks up its sleeve, like column subsampling, shrinkage, splitting criteria, etc. I highly recommend reading the original paper here.
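The first three points can be tied together in a small sketch. In the paper, the second-order Taylor expansion reduces the loss to per-example gradients g and second derivatives h. A leaf's best weight is then -G / (H + lambda), and a candidate split is scored by how much it improves the leaf scores G^2 / (H + lambda), minus a penalty gamma for adding a leaf. The code below is an illustration under assumptions, not XGBoost's actual implementation: it searches one feature only, and `leaf_score`, `best_split` and `leaf_weight` are hypothetical helper names.

```python
import numpy as np

def leaf_score(G, H, lam):
    """Structure score of a leaf from summed gradients G and Hessians H."""
    return G * G / (H + lam)

def best_split(x, g, h, lam=1.0, gamma=0.0):
    """Greedy scan over thresholds of one feature; returns (gain, threshold)."""
    order = np.argsort(x)
    x, g, h = x[order], g[order], h[order]
    G, H = g.sum(), h.sum()
    G_left = H_left = 0.0
    best_gain, best_threshold = 0.0, None
    for i in range(len(x) - 1):
        G_left += g[i]
        H_left += h[i]
        if x[i] == x[i + 1]:
            continue  # cannot split between equal feature values
        gain = 0.5 * (leaf_score(G_left, H_left, lam)
                      + leaf_score(G - G_left, H - H_left, lam)
                      - leaf_score(G, H, lam)) - gamma
        if gain > best_gain:  # keep only splits that beat not splitting
            best_gain, best_threshold = gain, 0.5 * (x[i] + x[i + 1])
    return best_gain, best_threshold

def leaf_weight(g, h, lam=1.0):
    """Optimal leaf output under the second-order approximation."""
    return -g.sum() / (h.sum() + lam)
```

For squared error loss, g is the current prediction minus the label and h is 1 for every example. Raising `gamma` makes every split look worse, so a large enough value returns a threshold of `None` and the tree stops growing: that is the regularization from point 3 at work.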

Data Scientist at Symantec. I love learning and implementing Machine Learning.
