Ensemble Technique

Ujjawal Verma
Published in Analytics Vidhya · 4 min read · Dec 7, 2020

Ensemble learning basically means creating multiple models and then combining them. We train multiple models, each on a sample of a particular dataset, and combine their outputs into a final prediction. The principle behind an ensemble model is to combine weak learners into a strong learner, which increases the accuracy of the model.

There are two types of ensemble techniques:

  1. Bagging (or Bootstrap aggregation)
  2. Boosting

In this article I will try to explain the Bagging technique in a very easy way. The Boosting technique will be covered in the next articles.

Let's begin learning about the Bagging technique.

Bagging Technique

Let's take an example to understand the bagging technique. For a particular problem statement, we have a dataset (D) with many rows.

[Figure: Bagging Technique]

Since this is an ensemble technique, we will combine multiple models. As shown in the figure above, there are multiple base models M1, M2, M3, …, Mk, and each model is given a sample of the dataset. In the figure, D1m, D2m, D3m, …, Dkm are different sample datasets of the same length (m), and this sampling of the data is called Row Sampling with Replacement.

As we know, D1m, D2m, D3m, …, Dkm are the different sample datasets provided to the base learner models M1, M2, M3, …, Mk respectively. Suppose the base learner models are binary classifiers; then every model will be trained on its own sample dataset.

Now let us run our test data Dtest through the ensemble. Every model M1, M2, M3, …, Mk will give some result for each record (0 or 1 for a binary classifier). Here we use a voting classifier: the class that receives the majority of votes from the models is taken as the final prediction. In our example the majority vote is 1, hence the output for that particular test data Dtest will be 1.

The step where we use row sampling with replacement is called Bootstrap,

and the step where we use a voting classifier to pick the result on the basis of the majority is called Aggregation.
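To make the Bootstrap and Aggregation steps concrete, here is a minimal Python sketch. It assumes X and y are NumPy arrays, and it uses decision trees as the base learners purely for illustration; the function names are my own, not part of any library.

```python
# Minimal sketch of bagging: bootstrap sampling + majority-vote aggregation.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, k=10, random_state=0):
    """Train k base learners, each on a bootstrap sample (row sampling with replacement)."""
    rng = np.random.default_rng(random_state)
    n = len(X)
    models = []
    for _ in range(k):
        idx = rng.integers(0, n, size=n)  # bootstrap: draw n row indices with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X_test):
    """Aggregation: every model votes, the majority class (0 or 1) wins."""
    votes = np.array([m.predict(X_test) for m in models])  # shape (k, n_test)
    return (votes.mean(axis=0) >= 0.5).astype(int)          # ties go to class 1
```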

Random Forest

Random forest is one of the bagging techniques. We will not go deep into random forest here, but let's try to understand the structure of this algorithm by looking at the figure below.

[Figure: Random Forest]

RS → Row Sampling with Replacement

FS → Feature Sampling With Replacement

d → number of rows in dataset D

m → number of features in dataset D

Here D1, D2, D3, …, Dk are the sample datasets. The length of dataset D is greater than the length of each of D1, D2, D3, …, Dk:

length of D > length of D1, D2, D3, …, Dk

d' → number of rows in each row sample (RS)

m' → number of features in each feature sample (FS)

In a random forest all base learner models are decision tree (DT) models. Each model is trained on its own sample dataset, every DT model gives some result (0 or 1 for a binary classifier), and we choose the final result using the voting classifier.
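As a rough sketch of how this maps onto code, here is how a random forest might be fit with scikit-learn's RandomForestClassifier. The toy dataset and the parameter values are my own assumptions for illustration; the bootstrap and max_features arguments play the roles of RS and FS.

```python
# Hedged sketch: a random forest on a synthetic binary-classification dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

rf = RandomForestClassifier(
    n_estimators=100,      # k decision-tree base learners
    max_features="sqrt",   # feature sampling (FS): each split looks at a subset of features
    bootstrap=True,        # row sampling with replacement (RS)
    random_state=42,
)
rf.fit(X_train, y_train)
print("Test accuracy:", rf.score(X_test, y_test))
```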

Now we need to understand one more thing here: what happens when we use so many decision trees in a particular random forest?

  • Suppose we grow a decision tree to its complete depth. Then it has:

Low bias: it fits the training dataset very closely, so the training error will be very low.

High variance: whenever we get new test data, such a decision tree is prone to give a large amount of error.

In short, whenever we grow a decision tree to its complete depth, it leads to something called overfitting.

So what happens in a random forest? We use multiple decision trees, and we know that each DT has high variance. But when we combine all the decision trees through the majority vote, the high variance gets converted into low variance, which means the error will be less.

Basically, when we use RS & FS and give different datasets to the base models, each DT is trained on, and tends to become an expert on, its own specific sample (RS, FS), and combining them converts high variance into low variance.

We basically take the majority vote; we do not depend on a single decision tree's output. This is the reason why, when we combine all the DTs, high variance is converted into low variance.
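Here is a small sketch, on an assumed synthetic dataset, that illustrates this variance-reduction argument by comparing one fully grown decision tree against a random forest using cross-validation:

```python
# Compare a single deep decision tree with a random forest (illustrative settings).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, n_informative=5, random_state=0)

tree = DecisionTreeClassifier(max_depth=None, random_state=0)     # grown to full depth
forest = RandomForestClassifier(n_estimators=200, random_state=0)

tree_scores = cross_val_score(tree, X, y, cv=5)
forest_scores = cross_val_score(forest, X, y, cv=5)

# The single deep tree typically scores lower and with more spread (higher variance);
# the forest's aggregated vote is usually higher and more stable (lower variance).
print("Single tree  :", tree_scores.mean(), "+/-", tree_scores.std())
print("Random forest:", forest_scores.mean(), "+/-", forest_scores.std())
```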

  • There is one more advantage of combining the DTs that we need to understand.

If we change the test dataset (e.g., from 100,000 records to 500 records), it will not have much impact on the ensemble with respect to accuracy or output. In other words, whenever we change our test data, we still get low variance, i.e., low error.

These are the most important properties of Random Forest. Random forest actually works very well for most of the machine learning use cases we try to solve.

This was about the binary classifier. Now suppose we have a regression problem; then the outputs of DT1, DT2, DT3, …, DTk are continuous values, so we take the mean (or median) of those outputs.
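For the regression case, a minimal sketch (again on an assumed toy dataset) could look like this; scikit-learn's RandomForestRegressor averages the individual trees' continuous predictions:

```python
# Sketch of bagging for regression: each tree predicts a continuous value,
# and the ensemble output is the mean of those predictions.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

reg = RandomForestRegressor(n_estimators=100, random_state=1)
reg.fit(X_train, y_train)

# reg.predict returns the mean of the individual trees' predictions for each record.
print("R^2 on test data:", reg.score(X_test, y_test))
```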

That's all about the Bagging technique. I hope you got something from here. In the upcoming articles we will explore boosting techniques like AdaBoost, Gradient Boosting and XGBoost. Stay tuned, I'll be back soon with the Boosting technique.

Thank you!!!
