Bagging the skill of Bagging (Bootstrap aggregating).

Harish Kandan
2 min read · Jan 14, 2018


Perhaps I should start with an apology for the terrible pun, eh?

Anyway, Bagging is one of the two major sub-classes of ensemble machine learning techniques (I shall explain what ensembling means in this context in a while), the other being Boosting. It can be used for both classification and regression. Usually used with decision trees, it improves the stability of a model by improving accuracy and reducing variance, thus reducing the problem of overfitting. Seems interesting, right? Let’s dive into the details then.

Bagging is short for “Bootstrap aggregating”. It’s a sub-class of ensemble machine learning algorithms wherein we train multiple weak models and aggregate the predictions we get from each of them into a final prediction. Ideally, each weak model specializes in a particular part of the feature space, enabling us to leverage the predictions from every model to maximum effect. As the name suggests, the technique consists of two parts: bootstrapping and aggregation.
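Before breaking those two parts down, here is a minimal end-to-end sketch of bagging in practice, assuming scikit-learn is available; the synthetic dataset and parameter values below are purely illustrative, not a recommendation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

# Illustrative synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 50 decision trees (scikit-learn's default base estimator), each
# trained on its own bootstrap sample of the training data
model = BaggingClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```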

Bootstrapping

Bootstrapping is a sampling technique. Out of the n training samples available, k samples are drawn with replacement, and we then run our learning algorithm on each such bootstrap sample. Sampling with replacement is what makes each draw independent of the previous ones: if we sampled without replacement, each draw would depend on the ones before it, and the re-sampling would not be truly random.
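As a sketch of what this looks like in code, assuming NumPy; the tiny dataset and the choice of k = n = 10 here are just for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.arange(10)  # pretend these are our n = 10 training samples

# Draw k = 10 indices with replacement; duplicates are expected
indices = rng.choice(len(data), size=10, replace=True)
bootstrap_sample = data[indices]
print(bootstrap_sample)  # some samples repeat, some never appear
```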

Aggregating

As simple as the name suggests. The predictions from the models trained above are aggregated into a final combined prediction. For classification, this aggregation can be done either on the predicted labels themselves (a majority vote) or on the predicted probabilities (averaged across the bootstrapped individual models).
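Here is a rough sketch of both aggregation styles, assuming we already have a list of fitted classifiers exposing scikit-learn-style predict and predict_proba methods; the helper names are mine, not a standard API.

```python
import numpy as np

def aggregate_by_vote(models, X):
    """Majority vote over the class labels predicted by each model.
    Assumes integer class labels 0..k-1."""
    predictions = np.array([m.predict(X) for m in models])  # (n_models, n_samples)
    # For each sample (column), pick the most frequently predicted label
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, predictions)

def aggregate_by_proba(models, X):
    """Average the predicted class probabilities, then pick the best class."""
    probas = np.mean([m.predict_proba(X) for m in models], axis=0)
    return probas.argmax(axis=1)
```

For regression, the same idea reduces to simply averaging each model’s numeric prediction.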

Advantages

Bagging takes advantage of ensemble learning, wherein multiple weak learners can outperform a single strong learner. It helps reduce variance and thus helps us avoid overfitting.

Disadvantages

There is a loss of interpretability of the model. There can also be a problem of high bias if the ensemble is not modeled properly. Another important disadvantage is that while bagging gives us better accuracy, it is computationally expensive, which may not be desirable depending on the use case.

There are many bagging algorithms, of which perhaps the most prominent is Random Forest. I’ll hopefully be writing a blog post about it in the future.

Hope you enjoyed reading this. Happy learning!
