Understanding Ensemble Methods

Queen
Published in ml-concepts.com · Feb 25, 2022

If “explain it to me like I am five” were an article titled “Understanding Ensemble Methods”, this would be it! That is a fancy way of saying a beginner will find this quite easy to comprehend.

Introduction

Sometime around October 2020, my favorite cousin, Ola, was going to be a year older, and I planned to get her a surprise gift. I had a couple of ideas, but none of them seemed right. So, without hesitation, I decided to gather ideas from a few of our mutual friends, and after combining all of their different suggestions, I eventually decided to surprise her with a cake, based on the data I had collected from multiple sources, which in this case were our mutual friends. I couldn’t have predicted better: Ola loved it.

You are probably wondering if you are reading the right article, or how surprising my cousin relates to ensemble learning.


Yes! You are reading the right article, and no, ensemble learning has nothing to do with Ola. But the method I applied in coming up with the perfect surprise gift is what ensemble learning is all about.

Sometimes, when you need to get a bag or a shoe from your favorite online or offline store, the inquiries you make to your friends or to the public about the price, the worth, and the user experience are, in effect, an ensemble method. If two heads can be better than one, imagine multiple heads!

From a technical point of view, ensemble learning can be defined as the process of combining multiple models or learning algorithms to produce a model with higher predictive performance.

Technically, the multiple models used in ensemble learning can be classifiers or experts. The first example I gave in this article is a perfect application of a classification algorithm, an algorithm that categorizes input data. For every classification problem, there is an appropriate classifier depending on the type of model chosen, such as decision trees, logistic regression, a multilayer perceptron (MLP), the Naive Bayes classifier, etc.

In ensemble modeling, the combination of multiple models might not always perform better than the best individual model. However, it lowers the risk of making a poor choice, as it generally improves accuracy.


Occasionally, a problem might be too complex for any single model to solve, because some classifiers model one aspect of the data well, while others are better at modeling other parts. In such cases, a combination of different individual models is used.

Ensemble techniques can also be quite useful when there is either too little data or a very large volume of data. When a dataset is so large that training a single model on it becomes difficult, the dataset can be divided into smaller subsets; each subset is used to train a separate individual model, and the models can later be combined appropriately.

On the other hand, in the case of inadequate data, bootstrapping is the perfect option. You don’t know what bootstrapping is?

Wait! We’ll get there.

Let’s talk about Ensemble Methods

  • Voting-Based Ensemble Method:

This is one of the easiest methods to implement. Voting, in this sense, does not have a special technical meaning. It is used for classification, whether binary or multiclass. This method is further divided into majority voting and weighted voting.

  1. Majority voting: This works just like an electoral process, whether for the position of a class governor, a state governor, or even the office of the president of a country. The rules don’t change: the winning party or candidate is the one with the highest share of the votes, the majority. In ensemble modeling, every individual model casts a vote, and the prediction that receives the majority of the votes wins.
  2. Weighted voting: Unlike majority voting, where every voter has equal power, this is a different case. Imagine that, when voting for a class president, the votes of the star students are worth more than everyone else’s because of their academic prowess. That may not sound fair, because you are human; models don’t feel the same way. In weighted voting, the better models are the star students: their predictions get counted multiple times, as in the sketch below.
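Here is a minimal sketch of both voting styles. It assumes scikit-learn and its built-in breast-cancer dataset purely for illustration; neither choice comes from the article itself.

```python
# Minimal sketch: majority (hard) voting vs. weighted voting.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

base_models = [
    ("tree", DecisionTreeClassifier(random_state=42)),
    ("logreg", LogisticRegression(max_iter=5000)),
    ("nb", GaussianNB()),
]

# Majority voting: every model gets one equal vote.
hard_vote = VotingClassifier(estimators=base_models, voting="hard")
hard_vote.fit(X_train, y_train)
print("majority voting accuracy:", hard_vote.score(X_test, y_test))

# Weighted voting: the "star student" (here, logistic regression, a weight
# chosen by hand for illustration) has its vote counted three times.
weighted_vote = VotingClassifier(
    estimators=base_models, voting="hard", weights=[1, 3, 1]
)
weighted_vote.fit(X_train, y_train)
print("weighted voting accuracy:", weighted_vote.score(X_test, y_test))
```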
  • Averaging-Based Ensemble Methods:

This is also an easy ensemble technique. Before we can calculate an average, there have to be multiple models making predictions on some training dataset. Each model can be built using the same or a different algorithm. This method can be further divided into simple averaging and weighted averaging. Simple averaging reduces overfitting (good performance on the training data, bad performance on the test data and other datasets) because the predictions of the different models are averaged. Weighted averaging can be likened to the weighted voting method and is otherwise similar to simple averaging; in this case, before the average is calculated, the prediction of each individual model is multiplied by its weight, as in the sketch below.
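Below is a minimal sketch of simple and weighted averaging of predicted probabilities. The use of NumPy, scikit-learn, and the hand-picked weights are my own illustrative assumptions.

```python
# Minimal sketch: simple vs. weighted averaging of class probabilities.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = [
    DecisionTreeClassifier(random_state=0),
    LogisticRegression(max_iter=5000),
    MLPClassifier(max_iter=2000, random_state=0),
]
# Each model predicts a probability for the positive class.
probas = np.array(
    [m.fit(X_train, y_train).predict_proba(X_test)[:, 1] for m in models]
)

# Simple averaging: every model contributes equally.
simple_avg = probas.mean(axis=0)

# Weighted averaging: each prediction is multiplied by its weight first.
weights = np.array([0.2, 0.5, 0.3])  # weights chosen by hand for illustration
weighted_avg = np.average(probas, axis=0, weights=weights)

print("simple-average accuracy:  ", ((simple_avg > 0.5) == y_test).mean())
print("weighted-average accuracy:", ((weighted_avg > 0.5) == y_test).mean())
```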

  • Bagging:

Remember when bootstrapping was mentioned as the perfect option for the problem of inadequate data, and I promised we were going to get there? Well, here we are! Bagging is short for bootstrap aggregating.

Bootstrapping is the technique bagging uses to create its samples: it draws random samples from the original dataset with replacement, so each model trains on a slightly different view of the same data. Bagging then builds these different models in parallel and, depending on whether classification or regression models are used, applies a voting or an averaging technique to their outputs to produce more accurate results. Here is a tiny sketch of the sampling step.
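This small sketch shows bootstrapping on its own, using NumPy; the dataset values are made up purely for illustration.

```python
# Minimal sketch: drawing bootstrap samples (same size, with replacement).
import numpy as np

rng = np.random.default_rng(seed=7)
data = np.array([10, 20, 30, 40, 50])  # a tiny "dataset" for illustration

for i in range(3):
    # Sampling with replacement: some rows repeat, some are left out.
    idx = rng.integers(0, len(data), size=len(data))
    print(f"bootstrap sample {i + 1}:", data[idx])
```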


An example of a model that uses the bagging technique is the random forest. A random forest uses decision trees as its base model and adds random feature selection on top.

The base model of a random forest is easy to remember: many trees make a forest, you know. An example application of the random forest model in the healthcare sector is the diagnosis of diabetes, based on the risk factors a patient shows in relation to known risk factors.

A major disadvantage of the decision tree algorithm is its tendency to overfit, due to high variance. However, with the advantage of bagging, which reduces variance, random forests are far less prone to overfitting. The short sketch below compares the two.
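A minimal sketch of that comparison, assuming scikit-learn's RandomForestClassifier as the bagged ensemble and cross-validation to compare it against a single tree:

```python
# Minimal sketch: a single decision tree vs. a bagged ensemble of trees.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

single_tree = DecisionTreeClassifier(random_state=1)
forest = RandomForestClassifier(n_estimators=200, random_state=1)

# Cross-validated accuracy: the forest typically generalizes better.
print("single tree  :", cross_val_score(single_tree, X, y, cv=5).mean())
print("random forest:", cross_val_score(forest, X, y, cv=5).mean())
```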

  • Boosting:

Unlike bagging, which trains its models in parallel, boosting trains them sequentially.


Yet, similar to bagging, boosting also works in two phases: first random sampling, then voting. Boosting, however, uses weighted voting, because each model is weighted by its performance.

Consider a situation where one model’s prediction is incorrect, the next model’s is incorrect as well, then the third, and so on. It would be inappropriate and inaccurate to simply combine such models or average their outputs. This is where boosting comes in: each succeeding model depends on the preceding one, and it specifically aims to correct the errors of the previous model. If I were to liken this to humans, I would use the analogy of coming back to life. If you had the opportunity to live again, you would get to make better choices for yourself, because you have lived before; you would try to avoid some mistakes and try to be perfect. And if you made any more mistakes, you would start again, learning from your former life, until you got it right. That is exactly how the boosting ensemble method works. (Talking about humans, some people will still make mistakes nevertheless. By the way, I wish there were a boosting technique for humans.)

While bagging reduces variance, boosting reduces bias, which increases the accuracy of the model’s predictions. As a result, the boosting technique is often described as “an algorithm that converts weak learners into a strong learner.” AdaBoost (adaptive boosting) and gradient boosting are examples of algorithms that use the boosting technique; a small sketch of both follows.
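A minimal sketch of the two boosting algorithms named above, again assuming scikit-learn purely for illustration:

```python
# Minimal sketch: boosting with AdaBoost and gradient boosting.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Each new weak learner focuses on the examples the previous ones got wrong.
ada = AdaBoostClassifier(n_estimators=100, random_state=1)
gbm = GradientBoostingClassifier(n_estimators=100, random_state=1)

print("AdaBoost         :", cross_val_score(ada, X, y, cv=5).mean())
print("Gradient boosting:", cross_val_score(gbm, X, y, cv=5).mean())
```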

  • Stacking:

While bagging and boosting usually combine homogeneous weak learners, stacking combines heterogeneous learners. The stacking technique uses an approach known as meta-learning. It combines the predictions of different individual base models, trained on the same dataset, to build a meta-model. This meta-model, the new model, is then used to make predictions on the test dataset. For example, a training dataset is run through decision trees as one base model; we run the same dataset through other base models, such as an MLP, an SVM, and maybe more. Stacking has nothing to do with calculating the average or the majority vote of the resulting predictions. Instead, the different predictions produced by the base models are used as inputs to build a new model, which we refer to as the meta-model.

The meta-model then takes these predictions as input and is applied to the test dataset to produce the final prediction, depending on the problem. I hope you got that? The sketch below puts the pieces together.
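A minimal sketch of stacking, assuming scikit-learn's StackingClassifier and an illustrative choice of base models (decision tree, SVM, MLP) with logistic regression as the meta-model:

```python
# Minimal sketch: heterogeneous base models feeding a meta-model.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

base_models = [
    ("tree", DecisionTreeClassifier(random_state=0)),
    ("svm", SVC(probability=True, random_state=0)),
    ("mlp", MLPClassifier(max_iter=2000, random_state=0)),
]

# The meta-model learns from the base models' predictions.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=5000),
)
stack.fit(X_train, y_train)
print("stacking accuracy:", stack.score(X_test, y_test))
```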

Now, let’s conclude this!

Generally, ensemble methods are used for supervised learning. Nevertheless, they can also be applied to unsupervised learning in a similar way.

Creating ensembles of diverse models is a very important factor in achieving accuracy.

So, the next time you are gathering online reviews about a movie or seeking your friends’ opinions about a product you are willing to invest your money in, it’s okay to picture those opinions as a dataset going through an ensemble technique, with you as the model.

Thanks for reading.

