A Beginner’s Guide to CatBoost with Python

Is CatBoost really about cats?

Fernando Delgado
8 min read · Jun 12, 2022
Image by Александар Цветановић via Pexels.

Sadly, no. CatBoost is not about cats, but there’s nothing wrong with imagining a team of cats training your machine learning model, right?

If you’ve browsed Kaggle recently, you’ve probably noticed people using CatBoost in competitions and achieving scores that outperform traditional models. But what exactly is CatBoost? How does it work?

According to its official documentation, CatBoost is an algorithm for gradient boosting on decision trees, developed by Yandex and released as an open-source library in 2017. It handles categorical features directly, so we can feed it non-numeric columns as they are, hence the name CatBoost (Categorical Boosting). It is used for search engines, recommendation systems, personal assistants, self-driving cars, weather prediction, and many other tasks at companies like Yandex, CERN, Cloudflare, and Careem taxi.

With that in mind, in this article we will review:

Decision Trees and Gradient Boosting