Understanding Ensemble Techniques!!!

Abhigyan · Published in Analytics Vidhya · 5 min read · Sep 27, 2020

Content:

  1. What are Ensemble Methods?
  2. Intuition Behind Ensemble Methods!
  3. Different Ensemble Methods
    * Bagging
      → Intuition behind Bagging
    * Boosting
      → Intuition behind Boosting
    * Stacking
      → Intuition behind Stacking
    * Bucket of Models

What are Ensemble Methods?

  • Ensemble methods are techniques that create multiple models and then combine them to produce improved results.
  • This approach allows the production of better predictive performance compared to a single model.
  • Ensemble methods usually produce more accurate solutions than a single model would. This has been the case in many machine learning competitions, where the winning solutions used ensemble methods.

The intuition behind ensemble methods:


→ Suppose you are a construction giant and you have been given a project to build a skyscraper. You bring in an architect to draw up a blueprint of the structure; you like the design, but you find that it has a lot of flaws.

→ So you bring in a group of architects. They discuss among themselves and come up with the best architecture, with few or no flaws.

Ensemble techniques are similar: multiple models are built to make the most accurate prediction possible.

Different Types of Ensemble Techniques

Ensemble techniques are mainly of three types:

→ Bagging:

  • Bootstrap Aggregation (or Bagging) is a machine learning ensemble meta-algorithm designed to improve the stability and accuracy of algorithms used in statistical classification and regression.
  • It also reduces variance and helps to avoid overfitting. Although it is usually applied to decision tree methods, it can be used with any type of method.
  • Bagging is a special case of the model averaging approach.
  • Bootstrapping is a statistical procedure that resamples a single dataset to create many simulated samples.
    Samples are made by drawing observations from the data population one at a time and returning them to the population after they have been chosen.
    This sampling with replacement allows a given observation to appear in a given sample more than once.
  • Aggregation is the process of combining things, i.e., putting the individual results together so that we can refer to them collectively.
    In other words, each bootstrapped sample is modeled separately and the results are combined to produce the final prediction (see the short sketch after this list).
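
To make the bootstrap-and-aggregate idea concrete, here is a minimal NumPy sketch (the data array and the averaged statistic are assumptions for illustration): it draws several samples with replacement and then combines the per-sample results.

import numpy as np

# Hypothetical data: 10 observations
data = np.array([2.3, 1.8, 3.1, 2.9, 2.2, 3.5, 1.9, 2.7, 3.0, 2.4])

rng = np.random.default_rng(42)
n_samples = 5

# Bootstrapping: each sample is drawn with replacement, so the same
# observation can appear more than once in a single sample
boot_means = []
for _ in range(n_samples):
    sample = rng.choice(data, size=len(data), replace=True)
    boot_means.append(sample.mean())

# Aggregation: combine the per-sample results into one estimate
print(np.mean(boot_means))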

Intuition Behind Bagging:
Let’s continue with the intuition we built earlier for ensemble techniques.

→ The group of architects is given the task of looking at the design and coming up with something that removes its flaws.
→ Each architect individually works out a solution to the flaws, and then they hold a meeting where the solutions are presented and the most appropriate ideas from each architect are combined.

Code for Bagging:

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier

# x, y: training features and labels
dt = DecisionTreeClassifier()
bagging = BaggingClassifier(base_estimator=dt).fit(x, y)
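
As a quick usage sketch (x_new below stands in for unseen samples, an assumption for illustration), the fitted ensemble predicts like any other scikit-learn estimator, and n_estimators controls how many bootstrapped base models are combined:

# n_estimators: number of bootstrapped trees to combine (the sklearn default is 10)
bagging = BaggingClassifier(base_estimator=dt, n_estimators=50).fit(x, y)
predictions = bagging.predict(x_new)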

→ Boosting:

Boosting refers to a family of algorithms that convert weak learners into strong learners.

  • Boosting is an ensemble method for improving the model predictions of any given learning algorithm.
  • The idea of boosting is to train weak learners sequentially, each trying to correct its predecessor.
  • This is done by building a model from the training data, then creating a second model that attempts to correct the errors of the first. Models are added until the training set is predicted perfectly or a maximum number of models has been reached.
  • It reduces the bias of the model.

Intuition Behind Boosting:
Continuing with the earlier intuitive approach to ensembles:

→ Another way the architects could have arrived at a flawless design is the following.

→ One of the architects starts correcting the design and, when finished, passes the result to the next architect.

→ The next architect looks at the corrected design, makes any further corrections needed, and passes it on again.

→ This process continues until the design has passed through all the architects and reaches the last one.

→ The result is a new copy of the design with the flaws corrected.

Code for Boosting:

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier

# Weak learner used as the base estimator
dt = DecisionTreeClassifier()
boosting = AdaBoostClassifier(base_estimator=dt)
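
A minimal usage sketch, assuming x and y are your training features and labels and x_new is unseen data:

# Fit the sequential ensemble and make predictions; each successive tree
# focuses on the examples the previous trees got wrong
boosting = AdaBoostClassifier(base_estimator=dt, n_estimators=50).fit(x, y)
predictions = boosting.predict(x_new)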

Bagging builds its models in parallel, whereas boosting builds them sequentially.

→ Stacking

  • Stacking (or Stacked Generalization) is an ensemble learning technique that combines multiple classification or regression models via a meta-classifier or a meta-regressor.
  • The base-level models are trained on the complete training set; the meta-model is then trained using the outputs of the base-level models as features.
  • The benefit of stacking is that it can harness the capabilities of a range of well-performing models on a classification or regression task and make predictions with better performance than any single model in the ensemble.

The intuition behind Stacking:
Continuing with the example of the construction giant who wants a flawless design and has brought in multiple architects:

  • Each architect produces their own best design.
  • Once the designs are ready, they are sent to the head/lead architect.
  • The head architect examines every design and combines them into one flawless design for the construction.

Code for Stacking:

from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from mlxtend.classifier import StackingClassifier

# Base-level models
KNN_model = KNeighborsClassifier()
Rnd_model = RandomForestClassifier()
# Meta-classifier trained on the base models' outputs
Log_model = LogisticRegression()

stack = StackingClassifier(classifiers=[KNN_model, Rnd_model],
                           meta_classifier=Log_model)
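
As with the other ensembles, a short usage sketch (x, y, and x_new are assumed to be training features, labels, and unseen samples):

# Fitting trains the base models first, then the logistic regression
# meta-classifier on their predictions
stack.fit(x, y)
predictions = stack.predict(x_new)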

→ Bucket of Models

  • A “bucket of models” is an ensemble technique in which a model selection algorithm is used to choose the best model for each problem.
  • When tested with only one problem, a bucket of models can produce no better results than the best model in the set, but when evaluated across many problems, it will typically produce much better results, on average, than any model in the set.
  • Input:
    — Models you want to use.
    — A dataset.
  • Learning algorithm:
    — Use 10-fold cross-validation (10-CV) to estimate the error of every model.
    — Pick the best model (the one with the lowest 10-CV error).
    — Train that model on the full dataset and return it (a short sketch of this procedure follows).
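
scikit-learn has no ready-made bucket-of-models estimator, but the selection procedure above is easy to sketch with cross_val_score; the candidate models and the X, y dataset below are assumptions for illustration.

from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

# Bucket of candidate models (assumed choices for illustration)
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(),
    "knn": KNeighborsClassifier(),
}

# Estimate each model's accuracy with 10-fold cross-validation (10-CV);
# X, y are assumed to be the full dataset of features and labels
scores = {name: cross_val_score(model, X, y, cv=10).mean()
          for name, model in models.items()}

# Pick the best model (highest mean CV accuracy, i.e. lowest error)
# and train it on the full dataset
best_name = max(scores, key=scores.get)
best_model = models[best_name].fit(X, y)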

Happy Learning!!!

Like my article? Do give me a clap and share it, as that will boost my confidence. Also, I post new articles every Sunday, so stay connected for future articles in this basics of data science and machine learning series.

Also, do connect with me on LinkedIn.
