Boosting with AdaBoost and Gradient Boosting
--
Have you ever been in or watched a Kaggle competition? Most of the prize winners get there by using boosting algorithms. Why are AdaBoost, GBM, and XGBoost the go-to algorithms of champions?
First of all, if you've never heard of Ensemble Learning or Boosting, check out my post “Ensemble Learning: When everybody takes a guess…I guess!” so you can better understand these algorithms.
More informed? Good, let’s start!
So, the idea of Boosting, just like any other ensemble method, is to combine several weak learners into a stronger one. What sets Boosting apart is that the predictors are trained sequentially, with each subsequent model attempting to fix the errors of its predecessor, as in the sketch below.
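One concrete way to realise that idea is to fit each new weak learner to the residual errors left by the models trained so far. Here is a minimal sketch of that loop; the toy data, the choice of depth-1 decision trees, and the number of rounds are illustrative assumptions, and real libraries add learning rates and regularisation on top of this.

```python
# Minimal sketch of sequential boosting: each new weak learner is fit to
# the residuals left by the ensemble built so far (illustrative only).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

ensemble = []
residual = y.copy()
for _ in range(50):
    stump = DecisionTreeRegressor(max_depth=1)   # weak learner
    stump.fit(X, residual)                       # learn what is still wrong
    ensemble.append(stump)
    residual -= stump.predict(X)                 # next learner fixes what is left

prediction = sum(tree.predict(X) for tree in ensemble)
print("training MSE:", np.mean((y - prediction) ** 2))
```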
Adaptive Boosting
Adaptive Boosting, more commonly known as AdaBoost, is a Boosting algorithm. Shocker! The way this algorithm corrects its predecessor is by paying more attention to the training instances that the previous model underfitted. Hence, each new predictor focuses more and more on the harder cases.
Let’s take the example in the image. To build an AdaBoost classifier, imagine that as a first base classifier we train a Decision Tree to make predictions on our training data. Then, following the methodology of AdaBoost, the weights of the misclassified training instances are increased. A second classifier is trained using these updated weights, and the procedure is repeated over and over again.
At the end of every model’s predictions we boost the weights of the misclassified instances, so that the next model does a better job on them, and so on.
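In practice you rarely implement this weight-updating loop by hand. Here is a sketch of the same procedure using scikit-learn’s AdaBoostClassifier with a shallow Decision Tree as the base classifier; the dataset and hyper-parameter values are illustrative choices, and the `estimator` argument assumes scikit-learn 1.2 or newer (older versions call it `base_estimator`).

```python
# Sketch of the AdaBoost procedure described above, using scikit-learn.
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

ada = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # weak base classifier
    n_estimators=200,      # number of sequential predictors
    learning_rate=0.5,     # shrinks each predictor's contribution
    random_state=42,
)
ada.fit(X_train, y_train)   # internally reweights misclassified instances each round
print("test accuracy:", ada.score(X_test, y_test))
```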
This sequential learning technique sounds a bit like Gradient Descent, except that instead of tweaking a single predictor’s parameters to minimise the cost function…