Yandex's CatBoost is a godsend.
While working on an intro-to-data-science workshop for web developers, I was worried that the web devs would be scared away by the immense, hair-tearing effort that data munging requires…
And then CatBoost happened.
What is CatBoost?
CatBoost is an open-source gradient boosting on decision trees library with categorical features support out of the box for Python and R. (from the GitHub repo)
Simply put, it's a plug-and-play classifier that follows scikit-learn's conventions and deals with categorical features automatically for you. Say goodbye to the days of getting dummies and scratching your head over what to do with text features.
All you need to change from your scikit-learn routine is this…
Check out this Kaggle kernel to see it in action.