XGBoost colsample_by* hyperparameters explained

MLComrade · Published in Analytics Vidhya · 3 min read · Jan 17, 2020

Hello comrades!

When I first stumbled upon XGBoost’s colsample hyperparameters, I was a little confused about how they work together. The official documentation is good, but it took me some time to fully understand the difference between the parameters. This article assumes that you are familiar with what XGBoost is all about and focuses on the colsample_by* hyperparameters only.

Imagine we have a dataset that contains 16 features.

colsample_bytree

As we know, XGBoost builds multiple trees to make predictions. colsample_bytree defines what fraction of the features (columns) will be used for building each tree. Naturally, the set of features for each tree is likely to be different (it can be the same by chance, but that’s highly unlikely). For simplicity, let’s use 0.5 for all the colsample parameters in the examples below.

So we are about to build our first tree. Let’s start with the root node. But first we need to sample a random subset of features for this tree: with colsample_bytree = 0.5 and 16 features, each tree is built from 8 randomly chosen columns.
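To make this concrete, here is a minimal sketch (not from the original article) that trains an XGBoost classifier on a synthetic 16-feature dataset with colsample_bytree = 0.5, so each tree sees a random half of the columns. The data and target are made up for illustration.

```python
# Minimal sketch: colsample_bytree=0.5 on a synthetic 16-feature dataset,
# so each tree is built from a random 8 of the 16 columns.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 16))          # 16 features, as in the running example
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # toy binary target

model = xgb.XGBClassifier(
    n_estimators=100,
    colsample_bytree=0.5,  # 50% of the columns are sampled once per tree
)
model.fit(X, y)
```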

colsample_bylevel

This comes into play every time we reach a new level of depth in a tree. Before making any further splits, we take the features that are left after applying colsample_bytree and sample from them again using colsample_bylevel. We repeat this step on the next level of depth, so we get a different set of features at each level. In our example, the 8 columns kept for the tree are sampled down to 4 columns per level.

colsample_bynode

The final possible filtering step is the colsample_bynode hyperparameter. Before making the next split, we sample from the features left after applying colsample_bylevel. The features are chosen for each split separately, even for splits on the same level of depth. Continuing the example, the 4 columns available at a level are sampled down to 2 columns to choose from at each split.

Just as the official documentation states:

colsample_by* parameters work cumulatively. For instance, the combination {'colsample_bytree':0.5, 'colsample_bylevel':0.5, 'colsample_bynode':0.5} with 64 features will leave 8 features to choose from at each split.
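Here is a small sketch of that cumulative arithmetic, continuing our 16-feature example (the numbers are mine; the documentation quote above uses 64 features), plus the same settings expressed as an XGBoost parameter dict.

```python
# Cumulative effect of the three colsample_by* parameters (illustrative numbers).
n_features = 16
colsample_bytree, colsample_bylevel, colsample_bynode = 0.5, 0.5, 0.5

per_tree = int(n_features * colsample_bytree)    # 8 columns per tree
per_level = int(per_tree * colsample_bylevel)    # 4 columns per depth level
per_split = int(per_level * colsample_bynode)    # 2 columns to choose from at each split
print(per_tree, per_level, per_split)            # 8 4 2

# The same settings passed to XGBoost directly:
params = {
    "colsample_bytree": 0.5,
    "colsample_bylevel": 0.5,
    "colsample_bynode": 0.5,
}
```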

Why do I need these parameters?

By limiting the number of features used to build each tree, we may end up with trees that gain different insights from the data. They learn to optimise for the target variable using different sets of features. So if you have enough data, you can try tuning the colsample parameters, as sketched below.
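A hedged sketch of one way to tune them, using scikit-learn’s RandomizedSearchCV; the search ranges and scoring metric are illustrative assumptions, not recommendations from the article.

```python
# Sketch: random search over the colsample_by* parameters.
from scipy.stats import uniform
from sklearn.model_selection import RandomizedSearchCV
import xgboost as xgb

search = RandomizedSearchCV(
    estimator=xgb.XGBClassifier(n_estimators=200),
    param_distributions={
        "colsample_bytree": uniform(0.5, 0.5),   # draws from [0.5, 1.0]
        "colsample_bylevel": uniform(0.5, 0.5),
        "colsample_bynode": uniform(0.5, 0.5),
    },
    n_iter=20,
    cv=3,
    scoring="roc_auc",
)
# search.fit(X, y)  # X, y as in the earlier snippet
```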

Hope the article was helpful for you!
