Ensemble Methods: Bagging & Boosting

Rishitbelani
7 min read · Mar 29, 2022


INTRODUCTION:

The bias-variance trade-off is a challenge we all face while training machine learning algorithms. Bagging is a powerful ensemble method which helps to reduce variance and, by extension, prevent overfitting. Ensemble methods improve predictive performance by using a group (or “ensemble”) of models which, when combined, outperform each individual model used separately.

In particular, we’ll cover:

  • An Overview of Ensemble Learning
  • Why Use Ensemble Learning?
  • Advantages of Ensemble Learning
  • What Is Bagging?
  • Bootstrapping
  • Applications of Bagging
  • The Bagging Algorithm
  • What Is Boosting?
  • Types of Boosting
  • Applications of Boosting
  • Boosting Algorithm
  • BAGGING vs BOOSTING

ENSEMBLE LEARNING:

Ensemble methods are techniques that aim to improve the accuracy of a model’s results by combining multiple models instead of using a single one. The combined models increase the accuracy of the results significantly, which has boosted the popularity of ensemble methods in machine learning.

ENSEMBLE LEARNING INTUITION

WHY ENSEMBLE LEARNING:

There are two main, related reasons to use an ensemble over a single model:

  1. Performance: An ensemble can make better predictions and achieve better performance than any single contributing model.
  2. Robustness: An ensemble reduces the spread or dispersion of the predictions and model performance.

Ensembles are used to achieve better predictive performance on a modeling problem than any single model. The way this is achieved can be understood as the ensemble reducing the variance component of the prediction error by adding a little bias (i.e. in the context of the bias-variance trade-off).

Advantages of Ensemble Learning:

Let’s understand the advantages of ensemble learning using a real-life scenario. Consider a case where you want to predict if an incoming email is genuine or spam. You could predict the class (genuine/spam) by looking at several attributes individually: whether or not the sender is in your contact list, whether the content of the message is linked to money extortion, whether the language used is neat and understandable, and so on. Alternatively, you could predict the class of the email using all of these attributes collectively. Unsurprisingly, the latter option works better, because the attributes are considered together rather than each one in isolation. This collectiveness ensures the robustness and reliability of the result, since it is generated after a more thorough assessment.

This puts forth the advantages of ensemble learning:

  • Ensures reliability of the predictions.
  • Ensures the stability/robustness of the model.

Because of these benefits, ensemble learning is often the preferred option in a majority of use cases. In the next section, let’s understand a popular ensemble learning technique, Bagging.

BAGGING:

What Is Bagging?

Bagging, also known as bootstrap aggregating, is the aggregation of multiple versions of a predictive model. Each model is trained individually and then combined by averaging (for regression) or voting (for classification). The primary focus of bagging is to achieve lower variance than any individual model has on its own. To understand bagging, let’s first understand the term bootstrapping.

Bootstrapping: Bootstrapping is the process of generating bootstrapped samples from the given dataset. The samples are formed by randomly drawing data points with replacement.
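As a minimal illustration (the toy array and random seed below are made up purely for demonstration), drawing a bootstrapped sample with NumPy looks like this:

import numpy as np
# toy dataset of 10 observations (illustrative values only)
data = np.arange(10)
rng = np.random.default_rng(42)
# a bootstrapped sample: same size as the original, drawn with replacement,
# so some points appear multiple times and others not at all
bootstrap_sample = rng.choice(data, size=len(data), replace=True)
print(bootstrap_sample)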

BAGGING

Applications of Bagging:

The bagging technique is used in a variety of applications. Its main advantage is that it reduces prediction variance by generating additional training sets through different combinations and repetitions (sampling with replacement) of the original training data. Below are some applications where the bagging algorithm is widely used:

  • Banking Sector
  • Medical Data Predictions
  • High-dimensional data
  • Land cover mapping
  • Fraud detection
  • Network Intrusion Detection Systems
  • Medical fields like neuroscience, prosthetics, etc.

BAGGING ALGORITHM:

Classifier generation:

Let N be the size of the training set.
for each of t iterations:
    sample N instances with replacement from the original training set.
    apply the learning algorithm to the sample.
    store the resulting classifier.

Classification:
for each of the t classifiers:
    predict the class of the instance using the classifier.
return the class that was predicted most often.
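Before the scikit-learn version below, here is a rough from-scratch sketch of this pseudocode; the function names, the decision-tree base learner, and the assumption of NumPy arrays with integer-coded class labels are my own illustrative choices:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, t=10, seed=0):
    # generate t classifiers, each trained on a bootstrapped sample of size N
    rng = np.random.default_rng(seed)
    n = len(X)
    classifiers = []
    for _ in range(t):
        idx = rng.integers(0, n, size=n)  # sample N instances with replacement
        classifiers.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return classifiers

def bagging_predict(classifiers, X):
    # each classifier votes; return the class predicted most often
    # (class labels assumed to be 0, 1, 2, ...)
    votes = np.array([clf.predict(X) for clf in classifiers]).astype(int)
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)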

from sklearn import model_selection
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
import pandas as pd

# load the data
url = "/home/debomit/Downloads/wine_data.xlsx"
dataframe = pd.read_excel(url)
arr = dataframe.values
X = arr[:, 1:14]  # features
Y = arr[:, 0]     # class label

seed = 8
# shuffle must be True when a random_state is passed to KFold
kfold = model_selection.KFold(n_splits = 3, shuffle = True, random_state = seed)

# initialize the base classifier
base_cls = DecisionTreeClassifier()

# no. of base classifiers
num_trees = 500

# bagging classifier (note: base_estimator was renamed to estimator in scikit-learn 1.2)
model = BaggingClassifier(base_estimator = base_cls,
                          n_estimators = num_trees,
                          random_state = seed)

results = model_selection.cross_val_score(model, X, Y, cv = kfold)
print("accuracy :")
print(results.mean())

accuracy :
0.8372093023255814

BOOSTING:

What is boosting?

Boosting is an ensemble learning method that combines a set of weak learners into a strong learner to minimize training errors. In boosting, a random sample of data is selected, fitted with a model and then trained sequentially — that is, each model tries to compensate for the weaknesses of its predecessor. With each iteration, the weak rules from each individual classifier are combined to form one, strong prediction rule.

Having already covered bagging above, let’s now take a closer look at boosting and its most well-known variants.

Types of boosting:

Boosting algorithms can differ in how they create and aggregate weak learners during the sequential process. Three popular types of boosting methods include:

  • Adaptive boosting or AdaBoost: Yoav Freund and Robert Schapire are credited with the creation of the AdaBoost algorithm. This method operates iteratively, identifying misclassified data points and adjusting their weights to minimize the training error. The model continues to optimize in a sequential fashion until it yields the strongest predictor.
ADABOOST
  • Gradient boosting: Building on the work of Leo Breiman, Jerome H. Friedman developed gradient boosting, which works by sequentially adding predictors to an ensemble, each one correcting the errors of its predecessor. However, instead of changing the weights of data points like AdaBoost, gradient boosting trains each new predictor on the residual errors of the previous one (a short residual-fitting sketch follows this list). The name gradient boosting is used because it combines the gradient descent algorithm with the boosting method.
GRADIENT BOOST
  • Extreme gradient boosting or XGBoost: XGBoost is an implementation of gradient boosting that’s designed for computational speed and scale. XGBoost leverages multiple cores on the CPU, allowing for learning to occur in parallel during training.
XG BOOST
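To make the residual-fitting idea behind gradient boosting concrete, here is a minimal sketch for regression with squared error; the function names, tree depth, learning rate, and number of rounds are arbitrary illustrative choices, not the library defaults:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_fit(X, y, n_rounds=100, learning_rate=0.1):
    # start from a constant prediction (the mean of y)
    init = y.mean()
    pred = np.full(len(y), init)
    trees = []
    for _ in range(n_rounds):
        residuals = y - pred  # negative gradient of the squared-error loss
        tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
        pred = pred + learning_rate * tree.predict(X)  # each tree corrects its predecessors
        trees.append(tree)
    return init, trees

def gradient_boost_predict(init, trees, X, learning_rate=0.1):
    pred = np.full(len(X), init)
    for tree in trees:
        pred = pred + learning_rate * tree.predict(X)
    return pred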

APPLICATIONS OF BOOSTING:

  • Healthcare: Boosting is used to lower errors in medical data predictions, such as predicting cardiovascular risk factors and cancer patient survival rates. For example, research shows that ensemble methods significantly improve the accuracy in identifying patients who could benefit from preventive treatment of cardiovascular disease, while avoiding unnecessary treatment of others. Likewise, another study found that applying boosting to multiple genomics platforms can improve the prediction of cancer survival time.
  • IT: Gradient boosted regression trees are used in search engines for page rankings, while the Viola-Jones boosting algorithm is used for image retrieval. As noted by Cornell, boosted classifiers allow the computations to be stopped sooner when it is clear which way a prediction is headed. This means that a search engine can stop evaluating lower-ranked pages, while image scanners will only consider images that actually contain the desired object.
  • Finance: Boosting is used with deep learning models to automate critical tasks, including fraud detection, pricing analysis, and more. For example, boosting methods in credit card fraud detection and financial products pricing analysis improve the accuracy of analyzing massive data sets to minimize financial losses.

BOOSTING ALGORITHMS:

ADABOOST:

from sklearn.ensemble import AdaBoostClassifier  # for classification
from sklearn.ensemble import AdaBoostRegressor   # for regression
from sklearn.tree import DecisionTreeClassifier

dtree = DecisionTreeClassifier()
# note: base_estimator was renamed to estimator in scikit-learn 1.2
cl = AdaBoostClassifier(n_estimators=100, base_estimator=dtree, learning_rate=1)
cl.fit(xtrain, ytrain)  # xtrain, ytrain are assumed to be a prepared training split

where:

the n_estimators and learning_rate parameters serve the same purpose as in the gradient boosting algorithm,

the base_estimator parameter lets you specify a different base machine learning algorithm.
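For example, a depth-1 decision stump (the classic weak learner for AdaBoost) can be passed as the base estimator; this is just an illustrative sketch, and note again that newer scikit-learn releases rename base_estimator to estimator:

from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# a decision stump (tree of depth 1) as the weak base learner
stump = DecisionTreeClassifier(max_depth=1)
cl = AdaBoostClassifier(n_estimators=100, base_estimator=stump, learning_rate=1)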

GRADIENT BOOST:

from sklearn.ensemble import GradientBoostingClassifier #For Classification
from sklearn.ensemble import GradientBoostingRegressor #For Regression
cl = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0, max_depth=1)
cl.fit(Xtrain, ytrain)

where:

the n_estimators parameter controls the number of weak learners,

the learning_rate parameter controls the contribution of each weak learner to the final output,

the max_depth parameter sets the maximum depth of the individual regression estimators, limiting the number of nodes in each tree.

XGBOOST:

import xgboost as xgb

xgb_model = xgb.XGBClassifier(learning_rate=0.001, max_depth=1, n_estimators=100)
xgb_model.fit(x_train, y_train)  # x_train, y_train are assumed to be a prepared training split

BAGGING vs BOOSTING:

  • Bagging is a method of merging the same type of predictions, while boosting merges different types of predictions.
  • Bagging decreases variance, not bias, and helps address over-fitting; boosting decreases bias, not variance.
  • In bagging, each model receives an equal weight; in boosting, models are weighted based on their performance.
  • In bagging, models are built independently; in boosting, each new model is influenced by the performance of previously built models.

BAGGING VS BOOSTING

SUMMING UP:

Bagging trains many models independently on bootstrapped samples and combines their predictions by voting or averaging, which mainly reduces variance and curbs overfitting. Boosting trains models sequentially, with each new model correcting the errors of its predecessors, which mainly reduces bias. Which technique to reach for therefore depends on whether your base model suffers more from high variance or from high bias.

