Boosting and Bagging explained with examples !!!

Sai Nikhilesh Kasturi · Published in The Startup · Jun 22, 2019 · 4 min read

If you want to know more about ensemble models and their two most important techniques, bagging and boosting, please go through my previous story (Part 1) linked below.

Boosting

Boosting is a family of algorithms that combines weak learners into a strong learner.

Working of boosting

The main steps involved in boosting are:

  • Train model A on the whole training set
  • Train model B on data that exaggerates (up-weights) the regions in which A performs poorly

Instead of training the models in parallel, we can train them sequentially. This is the main idea of Boosting!
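Here is a minimal sketch of those two steps in Python, assuming scikit-learn decision stumps and a synthetic toy dataset (both are my own illustrative choices, not part of the article):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Toy dataset (an assumption; any labelled training set would do).
X, y = make_classification(n_samples=200, random_state=0)

# Step 1: train model A on the whole set with uniform sample weights.
weights = np.full(len(y), 1.0 / len(y))
model_a = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=weights)

# Step 2: exaggerate the regions where A performs poorly, then train model B.
mistakes = model_a.predict(X) != y
weights[mistakes] *= 2.0          # misclassified points count double
weights /= weights.sum()
model_b = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=weights)
```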

Idea behind boosting algorithms?

The idea behind boosting algorithms is to learn weak classifiers that are only slightly correlated with the true classification and then combine them into a strong classifier that is well correlated with the true classification.

To give you an intuition, consider the email spam detection problem. It can be broken down into the following checks:

  • Does the email contain a phrase like ‘you have earned the prize’?
  • Does the email contain only an image?
  • Who is the sender?
  • How heavily is caps lock used in the email?
  • What does the subject line of the email look like?

Each of these checks is a weak classifier for detecting spam. Individually, none of them can answer whether an email is spam or not. When they are used together, however, they can detect spam with high probability and accuracy.
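As a toy illustration of this idea, the snippet below encodes each check as a hand-written weak rule and combines them with a simple majority vote; the rules and the sample email are made up purely for demonstration:

```python
# Hand-written weak rules; each one alone is a poor spam detector.
def contains_prize_phrase(email):
    return "earned the prize" in email["body"].lower()

def is_image_only(email):
    return email["body"].strip() == "" and email["has_image"]

def unknown_sender(email):
    return not email["sender"].endswith("@known-domain.com")

def shouting_caps(email):
    letters = [c for c in email["body"] if c.isalpha()]
    if not letters:
        return False
    return sum(c.isupper() for c in letters) / len(letters) > 0.5

def suspicious_subject(email):
    return "free" in email["subject"].lower()

WEAK_RULES = [contains_prize_phrase, is_image_only, unknown_sender,
              shouting_caps, suspicious_subject]

def is_spam(email):
    # Combine the weak rules with a simple majority vote.
    votes = sum(rule(email) for rule in WEAK_RULES)
    return votes > len(WEAK_RULES) / 2

email = {"body": "YOU HAVE EARNED THE PRIZE!!!", "subject": "Free gift",
         "sender": "promo@unknown.biz", "has_image": False}
print(is_spam(email))  # True
```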

Working of boosting algorithm?

A boosting algorithm iteratively learns weak classifiers and adds them to a final strong classifier. The added weak classifiers are usually weighted according to their accuracy. After each iteration, the training data is re-weighted so that misclassified instances gain weight and correctly classified instances lose weight. On the next iteration, the new weak learner therefore concentrates mostly on the previously misclassified instances. Boosting algorithms differ mostly in the re-weighting approach they apply to the training set.
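The loop below sketches this re-weighting scheme in the style of AdaBoost; the decision stumps, number of rounds, and synthetic dataset are assumptions I'm using for illustration, not a definitive implementation:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=1)
y_signed = np.where(y == 1, 1, -1)          # AdaBoost works with {-1, +1} labels

n_rounds = 10
weights = np.full(len(y), 1.0 / len(y))
learners, alphas = [], []

for _ in range(n_rounds):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y_signed, sample_weight=weights)
    pred = stump.predict(X)
    err = weights[pred != y_signed].sum()
    alpha = 0.5 * np.log((1 - err) / (err + 1e-10))   # weight learner by its accuracy
    # Misclassified instances gain weight, correctly classified ones lose weight.
    weights *= np.exp(-alpha * y_signed * pred)
    weights /= weights.sum()
    learners.append(stump)
    alphas.append(alpha)

# The final strong classifier is a weighted vote of the weak learners.
def predict(X_new):
    scores = sum(a * clf.predict(X_new) for a, clf in zip(alphas, learners))
    return np.sign(scores)
```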

Common Boosting algorithms:

  • AdaBoost
  • GBM (Gradient Boosting Machine)
  • XGBoost (XGBM)
  • LightGBM
  • CatBoost
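The first two are available directly in scikit-learn, as sketched below; XGBoost, LightGBM and CatBoost ship as separate packages with broadly similar fit/predict interfaces. The dataset and parameters here are placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

ada = AdaBoostClassifier(n_estimators=100).fit(X_train, y_train)
gbm = GradientBoostingClassifier(n_estimators=100).fit(X_train, y_train)

print("AdaBoost accuracy:", ada.score(X_test, y_test))
print("GBM accuracy:     ", gbm.score(X_test, y_test))
```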

Bagging (Bootstrap Aggregating)

As we discussed before, bagging is an ensemble technique mainly used to reduce the variance of our predictions by combining the results of multiple classifiers modelled on different sub-samples of the same data set.

Working of bagging

The main steps involved in bagging are:

  • Creating multiple datasets: sampling is done with replacement on the original data set, and new datasets are formed from it.
  • Building multiple classifiers: a classifier is built on each of these datasets; usually the same type of classifier is used for all of them.
  • Combining classifiers: the predictions of all the individual classifiers are combined to give a better classifier, usually with much lower variance than any single one.

Bagging is similar to divide and conquer: a group of predictive models is run on multiple subsets of the original dataset, and their results are combined to achieve better accuracy and model stability.
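A from-scratch sketch of those three steps might look like this, assuming a synthetic dataset, 25 bags and decision trees as the base model (all of these are illustrative choices):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)
rng = np.random.default_rng(0)
n_bags = 25

# 1. Create multiple datasets by sampling with replacement.
# 2. Build the same type of classifier on each of them.
models = []
for _ in range(n_bags):
    idx = rng.integers(0, len(y), size=len(y))      # bootstrap indices
    models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

# 3. Combine the classifiers with a majority vote.
def bagged_predict(X_new):
    votes = np.stack([m.predict(X_new) for m in models])   # (n_bags, n_samples)
    return np.round(votes.mean(axis=0)).astype(int)        # majority for 0/1 labels

print(bagged_predict(X[:5]), y[:5])
```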

Bagging Sampling Example

N = {18,20,24,30,34,95,62,21,14,58,26,19} — Original sample with 12 elements

Bootstrap sample A: {20, 34, 58, 24, 95,18}

Bootstrap sample B: {62, 21, 19, 30, 14,26}

Bootstrap sample C: {58, 24, 18, 24, 34,20}

Why do we create bootstrap samples? Bootstrap samples let us estimate and validate models on many resampled versions of the data, which improves accuracy, reduces variance, and improves the stability of the final model.

Once the bootstrap samples are created, a classifier is trained on each of them, and the final prediction is chosen by popularity vote. For classification, the label with the maximum number of votes is assigned to an observation; for regression, the average of the predicted values is used.
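For example, bootstrap samples like A, B and C above can be drawn from the original sample N with NumPy, and the per-model predictions aggregated by a mean for regression (the prediction values below are made up for illustration):

```python
import numpy as np

N = np.array([18, 20, 24, 30, 34, 95, 62, 21, 14, 58, 26, 19])
rng = np.random.default_rng(42)

for name in "ABC":
    sample = rng.choice(N, size=6, replace=True)   # sampling WITH replacement
    print(f"Bootstrap sample {name}:", sample)

# Aggregation for a regression model: average the bootstrap models' predictions.
predictions = np.array([27.0, 31.5, 29.0])         # one (made-up) prediction per model
print("Bagged (regression) prediction:", predictions.mean())
```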

To explain a basic bagging scenario, below is an analysis of the relationship between ozone and temperature (taken from Wikipedia).

The relationship between temperature and ozone in this data set is apparently non-linear, based on the scatter plot. Instead of building a single prediction model from the complete data set, 100 bootstrap samples of the data were drawn. Each sample is different from the original data set, yet resembles it in distribution and variability. Predictions from these 100 samples were then made across the range of the data. The first 10 predicted smooth fits appear as grey lines in the figure below. The lines are clearly very wiggly and they overfit the data, a result of the bandwidth being too small.

By taking the average of the 100 smoothers, each fitted to a bootstrap sample of the original data set, we arrive at one bagged predictor (the red line). Clearly, the mean is more stable and there is less overfitting.
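The sketch below mimics that procedure: fit a smoother to each of 100 bootstrap samples and average the fits. I use synthetic (temperature, ozone) data and a simple polynomial as a stand-in for the kernel smoother in the figure, so treat it as an illustration of the averaging step rather than a reproduction of the plot:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the (temperature, ozone) data.
temp = np.sort(rng.uniform(57, 97, 111))
ozone = 0.05 * (temp - 57) ** 2 + rng.normal(0, 8, temp.size)

# Standardise x so the polynomial fit stays well-conditioned.
x = (temp - temp.mean()) / temp.std()
grid = np.linspace(x.min(), x.max(), 200)

fits = []
for _ in range(100):                         # 100 bootstrap samples
    idx = rng.integers(0, temp.size, temp.size)
    coeffs = np.polyfit(x[idx], ozone[idx], deg=6)
    fits.append(np.polyval(coeffs, grid))    # one "grey line" per sample

bagged_fit = np.mean(fits, axis=0)           # the "red line": average of all fits
```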

Common Bagging algorithms:

  • Bagging meta-estimator
  • Random forest
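Both are available in scikit-learn; a minimal usage sketch (with an assumed synthetic dataset and mostly default parameters) looks like this:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Bagging meta-estimator: bags any base model (here, a decision tree).
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100).fit(X_train, y_train)

# Random forest: bagging of decision trees plus random feature selection at each split.
rf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)

print("Bagging accuracy:      ", bag.score(X_test, y_test))
print("Random forest accuracy:", rf.score(X_test, y_test))
```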

I hope this article has given you a solid understanding of this topic. If you have any suggestions or questions, do share them in the comments section below.

Thanks for reading! ❤
