TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.



Visualizing XGBoost Parameters: A Data Scientist’s Guide To Better Models

8 min read · Jan 15, 2025


Illustration of the random subsampling of features, controlled by the fraction set in colsample_bytree.
Image by the Author.

After leaving neuroscience behind and embarking on a data science path a number of years ago, I’ve had the privilege of working on numerous real-world machine learning projects. One thing that stands out across industries and use cases — whether it’s predicting customer churn, forecasting sales, or optimizing supply chains — is how often XGBoost dominates when working with tabular data.

Its ability to handle missing values, apply regularization, and consistently deliver strong performance has really solidified its place in the data scientist’s toolkit. Even with the rise of newer algorithms, including neural networks, XGBoost still stands out as a go-to choice for production systems dealing with structured datasets.

What I find most impressive, though, is the level of control it offers through its parameters — they’re like the secret levers that unlock performance, balance complexity, and even make models more interpretable. Yet, I’ve often noticed that while XGBoost is widely used, its parameters are sometimes treated like a black box, with their full potential left untapped. Understanding these parameters and how they can contribute to better…



Written by Thomas A Dorfer

Senior Data Scientist @ BCG. I mainly write about data science and technology.