Ensemble methods in Machine Learning

Santhosh Loganathan — Sat, 02 Mar 2019 20:38:51 GMT

Ensemble methods?

Ensemble methods is a machine learning technique that combines several base models in order to produce one optimal predictive model. To better understand how ensemble methods work, let’s consider moving with Decision Trees as an example.

Decision trees:

A Decision Tree determines the predictive value based on a series of questions and conditions. Let us consider the below Decision Tree for determining whether an individual should play outside or not.

The Tree on the left considers several weather factors into account, given each factor either makes a decision or asks another question. In this example we can clearly see that every time it is overcast, we will play outside. However, if it is raining, we must ask again if it is windy or not? If windy, we will not play. But given no wind, we are going to play outside for sure.

Whenever making decisions with Decision Trees we must take into consideration several factors before even making the decision. The most important among all of them is on what features we are going to split the tree on. In the above-mentioned Decision Tree instead of asking all those questions related to weather, if we had to ask whether we have friends to play with today in the first place. We would have gone to play if we have friends with us else we have asked questions related to weather and arrived with the entirely different Decision tree.

This is where Ensemble Methods come in handy! Rather than just relying on one Decision Tree and hoping we made the right decision at each split, Ensemble Methods allow us to take a sample of Decision Trees into account, calculate which features to use or questions to ask at each split, and make a final predictor based on the aggregated results of the sampled Decision Trees. The below image will give a clear picture of what the ensemble method does.

we know what an Ensemble method is. Now let us explore the different types of ensemble methods.

Types of Ensemble methods:

Bagging
Boosting

Bagging:

Bagging gets its name because it combines Bootstrapping and Aggregation to form one ensemble model. Given a sample of data, multiple bootstrapped subsamples are pulled. A Decision Tree is formed on each of the bootstrapped subsamples. After each subsample Decision Tree has been formed, an algorithm is used to aggregate over the Decision Trees to form the most efficient predictor.

A Picture is worth thousand words

Going by the above quote. Let us look into the below picture to understand bagging in a much better way.

Random forest:

Random Forest is similar to bagging but with a minor tweak to it. In a Bagged Decision tree, even though we have different bootstrapped samples from the larger data set as shown in above picture, the decision tree that is built from those bootstrapped samples will split using the same features which in turn result in the similar prediction result from all the decision trees. This makes the Ensemble model not performing effectively as desired. On the other hand, Random Forest models decide where to split based on a random selection of features. Rather than splitting at similar features at each node throughout, Random Forest models implement a level of differentiation because each tree will split based on different features. Thus over aggregation, the class that has more occurrence will be considered as a final result, thus resulting in a more accurate predictor than a single decision tree. Let’s have a look at the below picture to understand better.

Boosting:

Boosting refers to a family of algorithms that are able to convert weak learners to strong learners. The main principle of boosting is to fit a sequence of weak learners− models that are only slightly better than random guessings, such as small decision trees− to weighted versions of the data. More weight is given to examples that were misclassified by earlier rounds. The predictions are then combined through a weighted majority vote (classification) or a weighted sum (regression) to produce the final prediction. The principal difference between boosting and bagging, is that base learners are trained in sequence on a weighted version of the data.

Types of boosting:

The most common types of boosting are mentioned below,

Adaptive Boosting
Gradient Tree Boosting

Summary

The end result of any machine model is to arrive at a single model that provides the best solution to our problem statement. Rather than making one model and hoping this model is the most accurate predictor we can make, ensemble methods take an array of models into account and average the results of those models to produce one final model which will be the most accurate model.

Note:

Decision trees are not the only way to implement Ensemble methods. Decision trees have been chosen for simplicity and better understanding.
Pictures that are used in the article have been taken from Google to explain the concepts in a better way.

Stories by Santhosh Loganathan on Medium

Ensemble methods in Machine Learning