I’ve been trying to understand gradient boosted trees on and off for the last few weeks, and finding the answer to one particular question was surprisingly frustrating, since most introductions skim over it.

The general boosting algorithm is fairly intuitive and is covered in many places, but the part that no one seemed to explain was how to fit a single tree so as to minimize an arbitrary differentiable loss function.

The answer is in The Elements of Statistical Learning, Second Edition, pages 358-359, in the sections Steepest Descent and Gradient Boosting (Note how it’s 22…
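The key move in those sections is that each new tree is not fit to the labels directly, but to the negative gradient of the loss with respect to the current predictions (the “pseudo-residuals”). Here is a minimal sketch of that loop, assuming squared-error loss and depth-1 trees; the helper names (`fit_stump`, `gradient_boost`) are mine for illustration, not from the book:

```python
import numpy as np

def fit_stump(x, r):
    """Fit a depth-1 regression tree (a stump) to residuals r on 1-D inputs x."""
    best = None
    for thr in np.unique(x):
        left, right = r[x <= thr], r[x > thr]
        if len(left) == 0 or len(right) == 0:
            continue
        # Sum of squared errors if we predict each side's mean.
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, thr, left.mean(), right.mean())
    _, thr, left_val, right_val = best
    return lambda q: np.where(q <= thr, left_val, right_val)

def gradient_boost(x, y, n_rounds=100, lr=0.1):
    """Gradient boosting with squared-error loss: each stump fits the
    negative gradient of the loss, which for L = (y - pred)^2 / 2 is
    simply the residual y - pred."""
    pred = np.full_like(y, y.mean(), dtype=float)
    stumps = []
    for _ in range(n_rounds):
        residuals = y - pred          # pseudo-residuals = -dL/dpred
        stump = fit_stump(x, residuals)
        pred += lr * stump(x)         # shrunken additive update
        stumps.append(stump)
    return y.mean(), stumps

def boosted_predict(base, stumps, x, lr=0.1):
    return base + lr * sum(s(x) for s in stumps)
```

For a different differentiable loss, only the `residuals` line changes: you replace `y - pred` with the negative gradient of that loss evaluated at the current predictions.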

