Gradient Boosting

gobind agarwal
Sep 25, 2019

Try to name a Kaggle competition winner who has not used Gradient Boosting at least once to push their score higher. Almost everyone uses some form of Gradient Boosting to train a model that predicts well on unseen data.

Anyone can apply the method through a simple library call, but the parameters involved cannot be tuned sensibly without knowing what is going on inside the algorithm.
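As a rough illustration of what such a library call looks like (assuming scikit-learn is installed; the dataset is a toy one and the hyperparameter values are just the ones people most often tune, not recommendations):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# toy data purely for illustration
X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(
    n_estimators=100,   # number of boosting iterations (trees added sequentially)
    learning_rate=0.1,  # shrinkage applied to each tree's contribution
    max_depth=3,        # depth of each weak learner
    subsample=0.8,      # fraction of rows used per tree
)
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```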

Before we dive into Gradient Boosting, let’s understand what Bagging and Boosting are.

Bagging is something like “divide, conquer, and combine”: we first draw many smaller sample training sets from the whole training data set. A model is built on each such sample. Finally we combine all the models to obtain a single model for the data set as a whole, using techniques such as averaging, weighted averaging, or voting.
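Here is a minimal sketch of that idea, assuming bootstrap samples (rows drawn with replacement), scikit-learn decision trees as the individual models, and a majority vote to combine them:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
rng = np.random.default_rng(0)

models = []
for _ in range(25):
    # sample rows with replacement to form one bootstrap training set
    idx = rng.integers(0, len(X), size=len(X))
    models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

# combine: majority vote across the individual models
votes = np.stack([m.predict(X) for m in models])
bagged_pred = (votes.mean(axis=0) >= 0.5).astype(int)
print("training accuracy of the combined model:", (bagged_pred == y).mean())
```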

Boosting is sequential: we take a weak learner (one whose training error is below 50%, i.e. just better than random guessing). In each iteration we train that weak learner, and the weights of the data points that were predicted incorrectly are increased relative to those of the points that were predicted correctly. It is like penalising the model for its mistakes and rewarding it for being correct, so the model learns from its mistakes.
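One concrete form of this reweighting idea is AdaBoost; the following is a simplified sketch of it, assuming +1/-1 labels and depth-1 decision trees (“stumps”) as the weak learners:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
y = np.where(y == 1, 1, -1)

weights = np.full(len(X), 1 / len(X))   # start with uniform weights
for _ in range(10):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=weights)
    pred = stump.predict(X)
    err = weights[pred != y].sum()            # weighted training error
    alpha = 0.5 * np.log((1 - err) / err)     # this stump's vote strength
    # increase the weight of misclassified points, decrease it for correct ones
    weights *= np.exp(-alpha * y * pred)
    weights /= weights.sum()
```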

Gradient Boosting, as the name suggests, belongs to the family of Boosting algorithms.

For example, suppose we have 80 positive data points and 20 negative data points, and our weak learner simply predicts every data point as positive. The error is then only 0.2. In the next iteration we should decrease the weights of the positive data points, which were predicted correctly, and increase the weights of the negative data points, which were predicted incorrectly.
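Working through that 80/20 example numerically with an AdaBoost-style update (an assumption, since the example above does not fix a specific weight formula):

```python
import numpy as np

n_pos, n_neg = 80, 20
w = np.full(n_pos + n_neg, 1 / (n_pos + n_neg))            # uniform initial weights
correct = np.array([True] * n_pos + [False] * n_neg)       # "predict all positive" learner

err = w[~correct].sum()                 # 0.2
alpha = 0.5 * np.log((1 - err) / err)   # about 0.693
w *= np.where(correct, np.exp(-alpha), np.exp(alpha))
w /= w.sum()

# after the update the 20 misclassified negatives carry as much total
# weight as the 80 correctly classified positives (0.5 each)
print(w[~correct].sum(), w[correct].sum())
```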

Every model is trained with the goal of minimising the error, so at each iteration we compute the error between the actual and predicted values, while also consistently checking for overfitting. With every iteration we take a step that decreases the prediction error: the next model is fitted to the gradient of the errors of the current prediction, hence the name Gradient Boosting.
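A bare-bones sketch of this loop for squared-error regression, where the negative gradient of the loss is simply the current residuals (assuming scikit-learn trees as the weak learners; the learning rate and tree depth are illustrative choices):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, noise=10.0, random_state=0)

learning_rate = 0.1
prediction = np.full(len(y), y.mean())   # start from a constant model
trees = []
for _ in range(100):
    residuals = y - prediction           # negative gradient of squared error
    tree = DecisionTreeRegressor(max_depth=3).fit(X, residuals)
    trees.append(tree)
    prediction += learning_rate * tree.predict(X)   # small step that reduces the error

print("training MSE:", np.mean((y - prediction) ** 2))
```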
