XGBoost: Theory and Hyperparameter Tuning
A complete guide with examples in Python
Introduction
In a few months, I will have been working as a Data Scientist for three years. It is not a long career yet, but together with my academic experience it has allowed me to work on several machine learning projects across different sectors (energy, customer experience…). All of them were fed by tabular data, that is, structured data organised in rows and columns. This contrasts with projects fed by unstructured data such as images or text, which belong to machine learning fields like Computer Vision or Natural Language Processing (NLP).
Based on my experience, XGBoost usually performs well on tabular data projects. Although the No Free Lunch Theorem [1] states that any two algorithms are equivalent when their performance is averaged across all possible problems, on Bojan Tunguz’s Twitter [2] you can read frequent discussions with other professionals about why tree-based models (and especially XGBoost) are often the best candidates for tackling tabular data, even with the growing research into Deep Learning techniques for this type of data [3].
Also, it is quite funny to see how a Kaggle Grandmaster [4] jokes about being an XGBoost evangelist.