Introducing Ensemble: More is better than One!

Jocelyn D'Souza
GreyAtom
Published in
4 min read · Mar 17, 2018

This post will guide you through the basics of ensemble methods in Machine Learning.

United We Stand, Divided We Fall

Source: Khar Road Railway Station in Mumbai

I’m sure you learned this quote in school through a famous fable about an old farmer and his five sons. I know it has probably been quite a long time since you read it, so here is the short version of the fable.

Once there was an old farmer. He had five sons. They were very selfish. They always quarreled with one another. He was worried about them.

When he was on his deathbed, he wanted to teach them a lesson. He advised them to live in unity. But they did not care.

He asked his servant to bring a bundle of sticks. Then he called his sons one by one and asked them to break the bundle. But no one could do that. Then he ordered the servant to untie the bundle.

Now each one of them could break the sticks easily. He advised his sons to live like a bundle of sticks. If they quarreled, the people would harm them. The sons promised to live united and learned a big lesson.

As you read, you learn that an individual stick is too weak to withstand any force. But if we collect these weak sticks together into a bundle, it becomes strong enough to withstand a great deal of force.

There you go: this is the most basic intuition behind ensemble methods in Machine Learning! :)

What is an Ensemble?

Ensemble methods use multiple learning algorithms to obtain better predictive performance than any single constituent model. We train several different models and aggregate their predictions to improve stability and predictive power.

To do this, we need a number of models (learners) whose predictive power is just slightly better than random chance. Such learners are called weak learners. A model whose predictions are highly accurate is called a strong learner. Ensembles combine many weak learners into one strong learner.

Why do multiple weak learners make a difference?

Who Wants to Be a Millionaire? is a game show in which a contestant is asked a question with four options and has to pick the correct one. If the contestant is unsure, they can use a lifeline such as Phone a Friend or Audience Poll.

If you have followed the show, you will know that Phone a Friend is sometimes wrong, but the Audience Poll is almost never wrong. It turns out that a collection of hundreds of non-experts often predicts far better than the single expert on the other end of Phone a Friend.

This is exactly what ensembles do: they take a collection of weak learners and combine them so that the resulting model is far more powerful. The little simulation below shows why.
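Here is a minimal sketch (my own illustration, not from the show) that simulates a majority vote, assuming each voter answers correctly with probability 0.6, independently of the others:

```python
import numpy as np

rng = np.random.default_rng(42)

def majority_accuracy(n_voters, p_correct=0.6, n_trials=10_000):
    """Fraction of trials in which the majority vote of independent voters,
    each correct with probability p_correct, gets the answer right."""
    correct_votes = rng.random((n_trials, n_voters)) < p_correct
    return (correct_votes.sum(axis=1) > n_voters / 2).mean()

print(majority_accuracy(1))    # ~0.60 -> one friend on the phone
print(majority_accuracy(101))  # ~0.98 -> an audience of 101 non-experts
```

The independence assumption is the important part: ensembles help most when the base learners make different mistakes rather than the same ones.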

What are the different Ensemble Methods?

Hard Voting & Soft Voting

Let’s assume there are five models in an ensemble and, for a given data point, they predict the following probabilities for the positive class:

0.45 | 0.40 | 0.65 | 0.58 | 0.45

  • In hard voting, the voting classifier takes the majority of its base learners’ class predictions as the final prediction. Here, three of the five learners predict a probability below 0.5, so the majority vote is the negative class.
  • In soft voting, the voting classifier averages the probabilities predicted by its base learners: the mean is (0.45 + 0.40 + 0.65 + 0.58 + 0.45) / 5 ≈ 0.51, which is above 0.5 and gives the positive class. The sketch below works through both.
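As a quick sanity check, here is a hand-rolled sketch that reproduces both decisions for the five probabilities above (scikit-learn’s VotingClassifier offers the same behaviour via voting='hard' and voting='soft'):

```python
import numpy as np

# Predicted probabilities of the positive class from five base learners
probs = np.array([0.45, 0.40, 0.65, 0.58, 0.45])

# Hard voting: each learner casts a class vote, the majority class wins
votes = (probs >= 0.5).astype(int)        # class votes: [0, 0, 1, 1, 0]
hard_vote = np.bincount(votes).argmax()   # -> 0 (negative class)

# Soft voting: average the probabilities, then threshold the mean
soft_vote = int(probs.mean() >= 0.5)      # mean = 0.506 -> 1 (positive class)

print(hard_vote, soft_vote)
```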

Bagging

Bagging stands for Bootstrap Aggregating. In bagging, we train each base learner on a different sample of data. Here, the sampling of data points happens with replacement. The process of sampling with replacement is called Bootstrapping.
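A minimal scikit-learn sketch of bagging (the toy dataset and hyperparameters here are just placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy dataset, for illustration only
X, y = make_classification(n_samples=1000, random_state=42)

# 100 decision trees, each trained on a bootstrap sample
# (drawn with replacement) of the training data
bagging = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=100,
    bootstrap=True,       # sample with replacement -> bagging
    random_state=42,
)
bagging.fit(X, y)
```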

Pasting

Just as bagging creates samples through repeated resampling with replacement, we can instead draw a sample without replacement for each base learner. An ensemble built on such samples is known as pasting.
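The bagging sketch above turns into pasting by switching off bootstrapping (this continues the previous example, so the imports and X, y are reused):

```python
# Each learner sees a random 80% subset drawn WITHOUT replacement -> pasting
pasting = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=100,
    max_samples=0.8,      # fraction of the data given to each learner
    bootstrap=False,      # sample without replacement -> pasting
    random_state=42,
)
pasting.fit(X, y)
```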

Out of Bag Evaluation

In bootstrapping, sampling happens with replacement, so roughly one-third (about 37%) of the original data points are, on average, never selected for a given learner. These unselected points are “out of the bag” and can be used to evaluate that learner, so there is no need for a separate validation set or cross-validation. This is called out-of-bag evaluation.
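In scikit-learn this is a single flag on the bagging ensemble; a minimal sketch (again on a toy dataset):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=42)

bagging = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=100,
    bootstrap=True,
    oob_score=True,       # score each learner on the samples it never saw
    random_state=42,
)
bagging.fit(X, y)

# Accuracy estimated from out-of-bag samples, no separate validation set needed
print(bagging.oob_score_)
```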

Stacking

Source: mlxtend

In stacking, we combine multiple models via a meta-classifier. The individual models are trained on the training data set, and their outputs are then fed as inputs to the meta-classifier.
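A minimal sketch using scikit-learn’s StackingClassifier (mlxtend, the source of the figure above, provides a very similar StackingClassifier API); the base models and meta-classifier below are arbitrary choices for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, random_state=42)

# Level-0 models: their predictions become the features of the meta-classifier
base_models = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=42)),
    ("svc", SVC(probability=True, random_state=42)),
]

# Level-1 meta-classifier that learns how to combine the base predictions
stacking = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(),
)
stacking.fit(X, y)
```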

In this post, I’ve taken you through the basics of ensembles in machine learning and the main types of ensemble methods.

Thanks for reading! ❤

Follow for more updates!
