Ensemble Methods: A Combination of Machine Learning Models to Improve Accuracy

Zarish Anwar
3 min read · Sep 7, 2022


Why combine machine learning models? Ensemble methods are popular machine learning techniques that combine several base learners (models) to produce a single model with improved accuracy. They are also known as meta-algorithms.

Let us understand the concept through an example: if you want to purchase a car, it is highly unlikely that you will reach a buying decision in an instant. Most likely you will evaluate your options through several sources. You will browse different web portals to compare car models, features, prices, and reviews. You will probably ask your friends or colleagues for their opinions, or walk into a showroom and take advice from a car dealer. In other words, your decision combines the opinions (and knowledge) of many other people. The same is true of ensemble methods: they use multiple models and combine their outputs for improved performance.

Why are they popular? These meta-algorithms generally produce more accurate solutions than a single model would, depending on the quality and consistency of the dataset. Many winning entries in machine learning competitions have relied on ensembles, such as the collaborative filtering solutions in the Netflix Prize and the KDD Cup 2009.

When should we consider an ensemble?

The main hypothesis behind ensembles is that combining weak models or learners yields a more robust, precise, and reliable model. Done well, this helps prevent both overfitting and underfitting, introduces useful randomization, and improves model accuracy.

Figure 1. Need for Ensemble Methods
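To make the "combining several opinions" idea concrete, here is a minimal sketch using scikit-learn's VotingClassifier. The dataset and the particular base learners are illustrative choices, not a prescription; any reasonably diverse set of models would do.

```python
# Combine three different base learners by majority vote and compare
# each one's cross-validated accuracy with the ensemble's.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Three different base learners, each with its own "opinion".
base_models = [
    ("logreg", LogisticRegression(max_iter=5000)),
    ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
]

# The ensemble takes a hard majority vote over the base learners' predictions.
ensemble = VotingClassifier(estimators=base_models, voting="hard")

for name, model in base_models + [("voting ensemble", ensemble)]:
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name:16s} cross-val accuracy: {score:.3f}")
```

On most runs the voting ensemble matches or beats the individual learners, which is exactly the behavior the hypothesis above describes.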

Bias-Variance Trade-off

One of the most important factors in achieving good results on machine learning problems such as classification or regression is model selection. This choice depends on many factors: the assumed data distribution, the quality of the dataset, the dimensionality of the feature space, and so on.

The two characteristics that are frequently used to explain the errors made by ML algorithms are Bias and Variance.

Bias gauges how accurately the model can represent the true function that maps inputs to outputs; it reflects the rigidity of the learner, i.e. the strength of the model's assumptions about the functional structure of that input-output mapping. Variance, on the other hand, is the degree to which a model's predictions change when it is fitted to different training datasets; it reflects how sensitive the model is to the particular data it sees. Every model's performance is shaped by both bias and variance.

Ideally, we want a model with both low bias and low variance. To "solve" a problem, a model needs enough freedom to capture the inherent complexity of the data, yet it must also be constrained enough that its predictions do not swing wildly from one training set to another. Balancing these two demands is known as the bias-variance trade-off.

Figure 2. Illustration of Bias-Variance Trade-off (source)
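A rough way to see the trade-off in practice, assuming scikit-learn, is to vary the depth of a decision tree: a depth-1 stump is rigid (high bias, underfits), while an unrestricted tree is flexible but unstable (high variance, overfits). The synthetic dataset and depths below are illustrative.

```python
# Compare training vs cross-validated accuracy for trees of different depths.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

for depth in (1, 3, None):  # None lets the tree grow until its leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    scores = cross_validate(tree, X, y, cv=5, return_train_score=True)
    train = np.mean(scores["train_score"])
    test = np.mean(scores["test_score"])
    # High bias: both train and test accuracy stay low (underfitting).
    # High variance: train accuracy is high but test accuracy drops (overfitting).
    print(f"max_depth={depth}: train={train:.3f}, cv test={test:.3f}")
```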

Ensemble learning most commonly starts from weak learners (or base models), which are combined to build a more capable model. These weak models typically do not perform well on their own, either because they have high bias (models with limited freedom) or because they have too much variance to be robust (models with a lot of freedom). The ensemble approach aims to reduce the bias and/or variance of such weak learners by merging a number of them into one strong learner that performs well.

One important factor is that the choice of weak models should be consistent with the way they are aggregated. If we choose base models with low bias but high variance, the aggregation strategy should tend to reduce variance; if we choose base models with high bias but low variance, the aggregation strategy should tend to reduce bias, as illustrated in the sketch below. There are many popular ensemble methods, but the next blog will discuss the two most important ones, bagging and boosting, in detail.
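As a hedged preview of that pairing, the sketch below uses scikit-learn's BaggingClassifier with deep trees (low bias, high variance, so averaging reduces variance) and AdaBoostClassifier with decision stumps (high bias, low variance, so sequential reweighting reduces bias). The dataset and hyperparameters are illustrative only.

```python
# Pair each aggregation strategy with the kind of weak learner it suits.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Bagging: deep trees (low bias, high variance); averaging many of them reduces variance.
bagging = BaggingClassifier(DecisionTreeClassifier(max_depth=None),
                            n_estimators=100, random_state=0)

# Boosting: stumps (high bias, low variance); fitting them sequentially reduces bias.
boosting = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                              n_estimators=200, random_state=0)

for name, model in [("bagging (deep trees)", bagging),
                    ("boosting (stumps)", boosting)]:
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {score:.3f}")
```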
