I’ve been trying to understand gradient boosted trees on and off for the last few weeks, and finding the answer to this question was surprisingly frustrating because most introductions skim over it.

The general boosting algorithm is fairly intuitive and is covered in a lot of places, but the part that no one seemed to explain was how to fit a single tree to minimize an arbitrary differentiable loss function.

The answer turned out to be in The Elements of Statistical Learning, Second Edition, on pages 358-359, in the sections Steepest Descent and Gradient Boosting (note how it’s 22 pages into the chapter on Gradient…
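As a rough illustration of the idea those sections describe, each boosting round can fit a regression tree to the negative gradient of the loss at the current predictions. The sketch below is not taken from the book or from any library. It assumes one-dimensional inputs, depth-one trees (stumps), a fixed learning rate in place of a per-leaf line search, and the mean of the targets as the starting prediction. The names `fit_stump`, `stump_value`, `gradient_boost`, and `predict` are made up for this example.

```python
def fit_stump(xs, targets):
    """Fit a depth-one regression tree to 1-D inputs by least squares.

    Returns (threshold, left_value, right_value). Assumes xs contains
    at least two distinct values.
    """
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2.0
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def stump_value(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def gradient_boost(xs, ys, loss_gradient, n_rounds=50, learning_rate=0.1):
    """Boost stumps against any loss, given its gradient dL/dprediction."""
    start = sum(ys) / len(ys)  # simplification: mean as the initial constant
    predictions = [start] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # Pseudo-residuals: negative gradient of the loss at the current fit.
        pseudo_residuals = [-loss_gradient(y, p)
                            for y, p in zip(ys, predictions)]
        # Whatever the loss, the tree is fit to these values by least squares.
        stump = fit_stump(xs, pseudo_residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump_value(stump, x)
                       for p, x in zip(predictions, xs)]
    return start, learning_rate, stumps


def predict(model, x):
    start, learning_rate, stumps = model
    return start + learning_rate * sum(stump_value(s, x) for s in stumps)


def squared_error_gradient(y, prediction):
    # Gradient of 0.5 * (y - prediction) ** 2 with respect to prediction.
    return prediction - y


def absolute_error_gradient(y, prediction):
    # Subgradient of |y - prediction| with respect to prediction.
    return (prediction > y) - (prediction < y)


xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [x * x for x in xs]
model = gradient_boost(xs, ys, squared_error_gradient)
```

Passing `absolute_error_gradient` instead changes the loss being minimized without touching the tree-fitting code, which is the point of the trick: the tree only ever sees a least-squares regression problem on the pseudo-residuals.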

Keras should be getting a transparent data-parallel multi-GPU training capability pretty soon now, but in the meantime I thought I would share some code I wrote a month ago for doing data-parallel training without making any changes to your model definition.

As a preface to this, I would like to note that your model may not run any faster on multiple GPUs if you are not actually GPU-bound; this can happen when you use a generator for your data and producing that data is CPU/IO-bound, or when your model is not particularly complex and you are memory-bound when transferring data to your GPU. …
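For readers unfamiliar with the term, data-parallel training splits each batch across devices, computes gradients on each shard with an identical copy of the model, and combines those gradients before updating the weights. The sketch below is not the code from this post and does not use Keras or any GPU. It simulates the devices with plain Python lists, uses a hypothetical one-weight linear model, and assumes equal-sized shards so that averaging the shard gradients reproduces the full-batch gradient.

```python
def mse_gradient(weight, xs, ys):
    """Gradient of mean((weight * x - y) ** 2) with respect to weight."""
    return sum(2.0 * (weight * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def split_batch(items, n_shards):
    """Split a batch into n_shards equal contiguous shards."""
    if len(items) % n_shards != 0:
        raise ValueError("this sketch assumes the batch divides evenly")
    size = len(items) // n_shards
    return [items[i * size:(i + 1) * size] for i in range(n_shards)]


def data_parallel_gradient(weight, xs, ys, n_devices):
    """Average the per-shard gradients, one shard per simulated device."""
    x_shards = split_batch(xs, n_devices)
    y_shards = split_batch(ys, n_devices)
    shard_gradients = [mse_gradient(weight, sx, sy)
                       for sx, sy in zip(x_shards, y_shards)]
    return sum(shard_gradients) / n_devices


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [3.0 * x for x in xs]
weight = 0.5

full_gradient = mse_gradient(weight, xs, ys)
parallel_gradient = data_parallel_gradient(weight, xs, ys, n_devices=4)
```

Because the combined gradient matches the single-device one, the model definition itself does not need to change. Any speedup comes only from the per-shard work running concurrently, which is why a CPU/IO-bound or transfer-bound pipeline sees little benefit.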

If you’ve ever tried playing with Deep Learning, you’ll have found out that you’re not going to get very far without a GPU, and if you didn’t already happen to have an NVIDIA card, you’re going to be left with a choice between spending many hundreds of dollars up front or renting a GPU from a cloud provider.

Renting from a cloud provider will be more expensive in the long run (I’ve personally spent about a hundred dollars in spot instance costs so far), but if you’re not sure how long you’re going to stick with this and you’re not a gamer, the cloud can be a good way to try this out. …

