XGBoost Algorithm: The New King

Aditya Kumar
5 min read · Jun 19, 2020


The new King of Machine Learning algorithms taking over the world…

The era of regression modeling is over. The old king has passed. Long live the new King with a crazy name: XGBoost, or Extreme Gradient Boosting!

What is XGBoost?

XGBoost is a decision-tree-based ensemble Machine Learning algorithm that uses a gradient boosting framework. In prediction problems involving unstructured data (images, text, etc.), artificial neural networks tend to outperform all other algorithms. However, when it comes to small-to-medium structured (tabular) data, decision-tree-based algorithms are considered best-in-class right now. Please see the chart below for the evolution of tree-based algorithms over the years.
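
To make this concrete, here is a minimal quick-start sketch that trains an XGBoost classifier on a small structured dataset, assuming the xgboost and scikit-learn packages are installed; the dataset choice and parameter values are purely illustrative.

```python
# Minimal XGBoost quick-start on a small structured (tabular) dataset.
# Assumes `pip install xgboost scikit-learn`; parameter values are illustrative.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# A gradient-boosted ensemble of decision trees.
model = XGBClassifier(
    n_estimators=200,     # number of boosted trees
    max_depth=4,          # depth of each individual tree
    learning_rate=0.1,    # shrinkage applied to each tree's contribution
)
model.fit(X_train, y_train)

print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```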

Evolution of XGBoost Algorithm from Decision Trees

The XGBoost algorithm was developed as a research project at the University of Washington. Since its introduction, the algorithm has been credited not only with winning numerous Kaggle competitions but also with driving several cutting-edge industry applications. As a result, there is a strong community of data scientists contributing to the XGBoost open-source project, with nearly 350 contributors and about 3,600 commits on GitHub. The algorithm differentiates itself in the following ways:

  1. Flexibility: It supports a wide range of applications, including regression, classification, ranking, and user-defined prediction problems (a short sketch follows this list).
  2. Portability: Runs smoothly on Windows, Linux, and OS X.
  3. Languages: Supports all major programming languages including C++, Python, R, Java, and Scala.
  4. Cloud Integration: Supports AWS, Azure, and works well with Flink, Spark, and other ecosystems.
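
As a rough illustration of point 1, the sketch below uses the same library for a regression and a classification problem simply by switching the estimator and objective; the synthetic data and settings are assumptions for demonstration only.

```python
# One library, different problem types: the objective changes, the API does not.
# Synthetic data and settings are illustrative only.
from sklearn.datasets import make_regression, make_classification
from xgboost import XGBRegressor, XGBClassifier

# Regression: squared-error objective under the hood.
X_reg, y_reg = make_regression(n_samples=500, n_features=10, random_state=0)
reg = XGBRegressor(objective="reg:squarederror", n_estimators=100)
reg.fit(X_reg, y_reg)

# Classification: logistic objective under the hood.
X_clf, y_clf = make_classification(n_samples=500, n_features=10, random_state=0)
clf = XGBClassifier(objective="binary:logistic", n_estimators=100)
clf.fit(X_clf, y_clf)

# Ranking problems are handled in the same spirit via XGBRanker ("rank:*" objectives).
print(reg.predict(X_reg[:3]))
print(clf.predict_proba(X_clf[:3])[:, 1])
```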

Build an intuition for XGBoost

Decision trees are easy-to-visualize and fairly interpretable algorithms, but building intuition for the next generation of tree-based algorithms can be a bit tricky. Let's use a simple analogy to better understand their evolution.

Imagine that you are a hiring manager interviewing several candidates with excellent qualifications. Each step of the evolution of tree-based algorithms can be viewed as a version of the interview process.

  1. Decision Tree: Every hiring manager has a set of criteria such as education level, number of years of experience, and interview performance. A decision tree is analogous to a hiring manager interviewing candidates based on his or her own criteria.
  2. Bagging: Now imagine that instead of a single interviewer, there is an interview panel where each interviewer has a vote. Bagging, or bootstrap aggregating, combines the inputs from all interviewers into a final decision through a democratic voting process.
  3. Random Forest: A bagging-based algorithm with a key difference: only a subset of features is selected at random. In other words, every interviewer will only test the interviewee on certain randomly selected qualifications (e.g. a technical interview for testing programming skills and a behavioral interview for evaluating non-technical skills).
  4. Gradient Boosting: A special case of boosting where errors are minimized by a gradient descent algorithm, much like a strategy consulting firm using a sequence of case interviews to progressively weed out less qualified candidates.
  5. XGBoost: Think of XGBoost as gradient boosting on 'steroids'. It is a perfect combination of software and hardware optimization techniques that yields superior results using fewer computing resources in the shortest amount of time. (A side-by-side sketch of these five steps follows this list.)
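
Here is that evolution as a side-by-side sketch, assuming xgboost and scikit-learn are installed; the synthetic dataset and default settings are illustrative, not a benchmark.

```python
# Each step of the 'interview' analogy as an actual estimator, compared with
# 5-fold cross-validation on the same synthetic dataset (illustrative only).
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (BaggingClassifier, RandomForestClassifier,
                              GradientBoostingClassifier)
from xgboost import XGBClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

models = {
    "Decision Tree":     DecisionTreeClassifier(random_state=0),
    "Bagging":           BaggingClassifier(random_state=0),
    "Random Forest":     RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "XGBoost":           XGBClassifier(random_state=0),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name:>17}: {scores.mean():.3f}")
```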

Why does XGBoost perform so well?

XGBoost and Gradient Boosting Machines are both ensemble tree methods that apply the principle of boosting weak learners using gradient descent architecture. However, XGBoost improves upon the base GBM framework through systems optimization and algorithmic enhancements.

XGBoost optimizes standard GBM algorithm

System Optimization:

  1. Parallelization: XGBoost approaches the process of sequential tree building using a parallelized implementation. This is possible due to the interchangeable nature of the loops used for building base learners: the outer loop enumerates the leaf nodes of a tree, while the inner loop calculates the features. This nesting limits parallelization because the outer loop cannot start until the inner loop has finished. To improve run time, the order of the loops is therefore interchanged, using a global scan of all instances for initialization and parallel threads for sorting. This switch improves algorithmic performance by offsetting any parallelization overheads in computation.
  2. Tree Pruning: The stopping criterion for tree splitting within the GBM framework is greedy in nature and depends on the negative loss criterion at the point of the split. XGBoost instead grows trees to the specified 'max_depth' and then prunes them backward. This depth-first approach improves computational performance significantly.
  3. Hardware Optimization: The algorithm is designed to make efficient use of hardware resources. This is accomplished through cache awareness, allocating internal buffers in each thread to store gradient statistics. (A short sketch of the corresponding parameters follows this list.)
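
In practice, these system-level ideas surface as a few parameters in the library. The sketch below, using the scikit-learn wrapper, only shows where those knobs live; the values chosen are assumptions, not recommendations.

```python
# System-optimization knobs exposed through the scikit-learn wrapper
# (values are illustrative, not recommendations).
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

X, y = make_classification(n_samples=5000, n_features=30, random_state=0)

model = XGBClassifier(
    n_jobs=-1,            # parallelization: use all available CPU threads
    max_depth=6,          # depth-first tree growth, then backward pruning
    tree_method="hist",   # cache-friendly histogram-based split finding
    n_estimators=300,
)
model.fit(X, y)
print("Trained an ensemble of", model.n_estimators, "trees")
```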

Algorithmic Enhancements:

  1. Regularization: It penalizes more complex models through both LASSO (L1) and Ridge (L2) regularization to prevent overfitting.
  2. Weighted Quantile Sketch: XGBoost employs the distributed weighted Quantile Sketch algorithm to effectively find the optimal split points among weighted datasets.
  3. Cross-validation: The algorithm comes with a built-in cross-validation method at each iteration, removing the need to explicitly program this search or to specify the exact number of boosting iterations required in a single run. (A sketch of these settings follows this list.)
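
Here is a short sketch of those enhancements using the native API, assuming xgboost is installed; the regularization strengths, fold count, and early-stopping window are arbitrary example values.

```python
# L1/L2 regularization plus built-in cross-validation with early stopping
# (regularization strengths and fold count are arbitrary examples).
import xgboost as xgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "objective": "binary:logistic",
    "max_depth": 4,
    "alpha": 0.5,    # L1 (LASSO) penalty on leaf weights ('reg_alpha' in the sklearn wrapper)
    "lambda": 1.0,   # L2 (Ridge) penalty on leaf weights ('reg_lambda' in the sklearn wrapper)
    "eval_metric": "logloss",
}

# xgb.cv runs k-fold cross-validation at every boosting round and can stop early,
# so the exact number of boosting rounds does not have to be fixed in advance.
cv_results = xgb.cv(
    params, dtrain,
    num_boost_round=500, nfold=5,
    early_stopping_rounds=20, seed=0,
)
print("Boosting rounds kept after early stopping:", len(cv_results))
```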

So should we use just XGBoost all the time?

As Data Scientists, we must test all possible algorithms on the data at hand to identify the champion algorithm. Picking the right algorithm is not enough, though: we must also choose the right configuration for a given dataset by tuning the hyper-parameters. There are several other considerations when choosing a winning algorithm, such as computational complexity, explainability, and ease of implementation. This is exactly the point where Machine Learning starts drifting away from science towards art, and this is where the magic happens!
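
For instance, a small grid search over a few common XGBoost hyper-parameters might look like the sketch below; the grid, dataset, and scoring choice are illustrative assumptions.

```python
# A deliberately small hyper-parameter grid search (illustrative values only).
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

param_grid = {
    "max_depth": [3, 5],
    "learning_rate": [0.05, 0.1],
    "n_estimators": [100, 300],
    "subsample": [0.8, 1.0],
}

search = GridSearchCV(XGBClassifier(), param_grid, cv=3, scoring="accuracy")
search.fit(X, y)

print("Best params:", search.best_params_)
print("Best CV accuracy:", round(search.best_score_, 3))
```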

Future Algorithms

Machine Learning is a very active research area, and there are already several viable alternatives to XGBoost. Microsoft Research recently released the LightGBM framework for gradient boosting, which shows great potential. It is only a matter of time before a better model framework beats XGBoost in terms of prediction performance, flexibility, and explainability. Until such a strong challenger comes along, however, XGBoost will continue to reign over the Machine Learning world!

Thanks for the read.
