How to use Bagging Technique for Ensemble Algorithms — A code exercise on Decision Trees

Rohit Madan
Published in Analytics Vidhya
Nov 19, 2019

If you haven’t gathered what this article is all about by looking at the title, pardon me because I probably need a more concise set of sentences and a new vocabulary.

This article is going to cover one of the most important techniques of Ensemble learning known as Bootstrap Aggregating aka Bagging.

This is not a Frodo Baggins tale, mind you.

The Baggins
The Bagging technique (source: Aurélien Géron)

See, there’s a big difference, but I will be your Sam here and explain it to you.

Ensemble learning has two main sampling techniques: one is Bagging and the other is Pasting.

Bagging is more commonly used because it results in better models, and we shall cover only Bagging in this article.

What is the Bagging Technique?

An ensemble technique is either

  1. A way to combine different classifiers or regressors, built with different algorithms, on the same training dataset.
  2. Or a way to use the same training algorithm for several predictors, each trained on a different subset of the training data (random instances of the training data).

When we sample the training data with replacement, we call it Bagging; sampling without replacement is called Pasting.

Meaning, when we sample the training data to create subsets, train a predictor on each subset, and repeat the sampling exercise again and again for the same kind of predictor, that's bagging.
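To make the sampling idea concrete, here is a minimal sketch (my own, using NumPy; the array sizes are arbitrary) of drawing one training subset with replacement (bagging) and one without replacement (pasting):

import numpy as np

rng = np.random.RandomState(42)
n_samples = 10      # pretend the training set has 10 rows
subset_size = 6

# Bagging: sample indices WITH replacement, so the same row can be picked twice
bagging_idx = rng.choice(n_samples, size=subset_size, replace=True)

# Pasting: sample indices WITHOUT replacement, so every row appears at most once
pasting_idx = rng.choice(n_samples, size=subset_size, replace=False)

print("bagging subset:", sorted(bagging_idx))   # duplicates possible
print("pasting subset:", sorted(pasting_idx))   # all indices unique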

Examples

Imagine having the moons dataset. Now we take 3 predictors, say Logistic Regression, SVM and Decision Trees, and run them on 10,000 samples of the moons dataset. When we then use all the training data with each predictor and aggregate their predictions into one result, it's called Ensemble learning (approach 1 above).

When we instead make 500 subsets of the 10,000 samples, randomly pick a subset to train the same kind of predictor again and again, and then aggregate the results, it's called bagging.
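As a rough sketch of that setup (the noise level, the test split and the random seeds below are my assumptions, not values from the article), the moons dataset and the X_train, X_test, y_train, y_test arrays used in the code further down could be created like this:

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

# 10,000 samples of the two-moons dataset (noise level is an assumption)
X, y = make_moons(n_samples=10000, noise=0.30, random_state=42)

# Hold out part of the data to evaluate the ensembles later
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)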

Got it so far? Great, stay with me now.

There are two ways to aggregate the results -

  1. Soft voting
  2. Hard voting

The principal difference between them: if the predictors being aggregated have a predict_proba() method, i.e. they can output a probability for each class and those probabilities are averaged to produce the result, it's soft voting; otherwise (a simple majority vote on the predicted classes) it's hard voting.

In my opinion soft voting is better, i.e. using predict_proba() is better compared to hard voting.
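To make the two voting flavours concrete, here is a minimal sketch (mine, not code from the article) using the three predictors mentioned earlier; note that SVC needs probability=True before it exposes predict_proba(), which soft voting relies on:

from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

estimators = [
    ("lr", LogisticRegression(random_state=42)),
    ("svc", SVC(probability=True, random_state=42)),   # probability=True enables predict_proba()
    ("tree", DecisionTreeClassifier(random_state=42)),
]

# Hard voting: each predictor casts one vote for a class, the majority wins
hard_clf = VotingClassifier(estimators=estimators, voting="hard")

# Soft voting: the predicted class probabilities are averaged, highest average wins
soft_clf = VotingClassifier(estimators=estimators, voting="soft")

hard_clf.fit(X_train, y_train)
soft_clf.fit(X_train, y_train)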

Now that we know Bagging and both voting categories, let's use some code to demonstrate the bagging technique.

from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

bag_clf = BaggingClassifier(
    DecisionTreeClassifier(random_state=42), n_estimators=500,
    max_samples=100, bootstrap=True, random_state=42)
bag_clf.fit(X_train, y_train)
y_pred = bag_clf.predict(X_test)

We’re using 500 decision trees here, each trained on a subset of 100 samples.

bootstrap=True means Bagging, and bootstrap=False means Pasting.

Accuracy score

0.904

This is not great, but had we not used the bagging technique and made predictions with a single decision tree only, our accuracy score would have been

0.856

so bagging definitely helps.
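For reference, the two scores above can be computed with something like the following sketch (the exact numbers depend on the dataset, noise and split, so treat the values in the comments as the article's reported results, not guaranteed outputs of this code):

from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

# Accuracy of the bagging ensemble
print(accuracy_score(y_test, y_pred))                      # ~0.904 in the article

# Accuracy of a single decision tree on the same split, for comparison
tree_clf = DecisionTreeClassifier(random_state=42)
tree_clf.fit(X_train, y_train)
print(accuracy_score(y_test, tree_clf.predict(X_test)))    # ~0.856 in the article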

Note (extra info): an ensemble of decision trees trained with the bagging technique is called a Random Forest, but also note that there max_samples is set to the entire training set, i.e. each tree sees a bootstrap sample as big as the full dataset.
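In scikit-learn that corresponds roughly to the RandomForestClassifier class, which behaves like bagging over decision trees with max_samples at the full training set size, plus one extra twist: each tree also considers only a random subset of features at each split. A quick sketch (mine, with hyperparameters chosen for illustration):

from sklearn.ensemble import RandomForestClassifier

# Roughly: bagging of 500 decision trees, each trained on a bootstrap sample
# as large as the training set, with extra randomness in the feature splits
rnd_clf = RandomForestClassifier(n_estimators=500, random_state=42)
rnd_clf.fit(X_train, y_train)
y_pred_rf = rnd_clf.predict(X_test)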

There is an excellent diagram in Aurélien Géron's book which explains the difference between using bagging and not using it clearly.

The figure on the left shows a single decision tree's predictions, versus the bagging ensemble's predictions on the right. See how smooth the decision boundary on the right is, and how overfit the figure on the left is?
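If you want to reproduce that kind of comparison yourself, here is a rough matplotlib sketch (mine, not Géron's code) that plots the decision boundary of the single tree_clf from the accuracy comparison above next to that of bag_clf:

import numpy as np
import matplotlib.pyplot as plt

def plot_decision_boundary(clf, X, y, title):
    # Evaluate the classifier on a grid covering the data range
    xx, yy = np.meshgrid(
        np.linspace(X[:, 0].min() - 0.5, X[:, 0].max() + 0.5, 200),
        np.linspace(X[:, 1].min() - 0.5, X[:, 1].max() + 0.5, 200))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.3)            # coloured regions = predicted class
    plt.scatter(X[:, 0], X[:, 1], c=y, s=5)       # actual data points
    plt.title(title)

plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plot_decision_boundary(tree_clf, X_test, y_test, "Single decision tree")
plt.subplot(1, 2, 2)
plot_decision_boundary(bag_clf, X_test, y_test, "Bagging ensemble")
plt.show()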

If you want to have a peek at the source code-

If you have feedback or want to collaborate, write to me @ rohitmadan16@gmail.com
