
XGBoost, LightGBM, and Other Kaggle Competition Favorites

An Intuitive Explanation and Exploration

Published in Analytics Vidhya · 8 min read · Sep 27, 2020


Kaggle is the data scientist’s go-to place for datasets, discussions, and, perhaps most famously, competitions offering prizes of tens of thousands of dollars for building the best model.

With all the flurried research and hype around deep learning, one would expect neural network solutions to dominate the leaderboards. It turns out, however, that neural networks, while indeed very powerful algorithms, excel mainly in a few domains such as image recognition, language modelling, and occasionally sequence prediction; on the tabular data that makes up most competitions, they rarely come out on top.

Instead, top winners of Kaggle competitions routinely use gradient boosting. It’s worth looking at the intuition of this fascinating algorithm and why it has become so popular among Kaggle winners.
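As a quick illustration of what this looks like in practice, here is a minimal sketch assuming the xgboost and scikit-learn packages are installed; the dataset is a stand-in and the hyperparameter values are placeholders, not anything tuned for a real competition.

```python
# A minimal sketch of the kind of gradient-boosted model that tops Kaggle
# leaderboards (stand-in data and untuned hyperparameters).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

# Stand-in tabular dataset; a real competition would use its own features.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Typical starting hyperparameters; winners tune these extensively.
model = XGBClassifier(n_estimators=300, learning_rate=0.05, max_depth=4)
model.fit(X_train, y_train)

print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```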

Decision trees are relatively weak on their own: predictions are formed based solely on a sequence of yes/no questions. Each split partitions the feature space into axis-aligned rectangles, which can be a poor fit for datasets whose classes cannot be separated by axis-aligned boundaries. However, ensembles of trees that combine learned insights from many models can be very powerful.
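A small sketch can make that gap concrete. Assuming scikit-learn, the snippet below cross-validates a single decision tree against a boosted ensemble of trees on the same stand-in dataset; exact scores will vary, but the ensemble typically wins.

```python
# Contrast a single decision tree with a boosted ensemble of trees
# on the same tabular data.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_breast_cancer(return_X_y=True)

# One tree: axis-aligned splits only, prone to overfitting.
tree = DecisionTreeClassifier(random_state=0)
# Many shallow trees combined, each one correcting the previous ones.
boosted = GradientBoostingClassifier(random_state=0)

print("single tree  :", cross_val_score(tree, X, y, cv=5).mean())
print("boosted trees:", cross_val_score(boosted, X, y, cv=5).mean())
```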

Traditional ensemble models, such as the Random Forest algorithm, form ensembles by training several trees on…
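The paragraph above is cut off, but the standard Random Forest idea of training many trees on bootstrapped samples of the data and averaging their votes can be sketched roughly as follows; this is a simplified illustration using numpy and scikit-learn, not the full algorithm.

```python
# A rough, simplified sketch of bagging: many trees, each trained on a
# bootstrap sample of the rows, combined by majority vote.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
trees = []
for _ in range(50):
    # Bootstrap: sample rows with replacement so each tree sees different data.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    trees.append(
        DecisionTreeClassifier(max_features="sqrt", random_state=0)
        .fit(X_train[idx], y_train[idx])
    )

# Majority vote across the ensemble.
votes = np.mean([t.predict(X_test) for t in trees], axis=0)
print("bagged trees accuracy:", np.mean((votes > 0.5) == y_test))
```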
