Ensemble Methods for Decision Trees

Valentina Alto
Analytics Vidhya
5 min read · Dec 31, 2019

Decision Trees are popular Machine Learning algorithms used for both regression and classification tasks. Their popularity mainly arises from their interpretability and ease of representation, as they mimic the way the human brain makes decisions.

However, this interpretability comes at a price in terms of prediction accuracy: individual trees tend to overfit their training data and suffer from high variance. To overcome this limitation, several techniques have been developed with the goal of building strong and robust models starting from ‘weak’ ones. These techniques are known as ‘ensemble’ methods and, in this article, I’m going to talk about three of them: Bagging, Random Forest and Boosting.

Bagging

The idea of bagging is that, if we could train multiple trees on different datasets and then average their outputs (or, in the case of classification, take the majority vote) to predict the label of a new observation, we would get more accurate and more stable results. For example, imagine we train, on four different datasets drawn from the same population, four decision trees meant to classify an e-mail as spam or not spam. A new e-mail arrives and three of the trees classify it as spam, while one classifies it as not spam: by majority vote, the ensemble labels it as spam. In practice we rarely have several independent datasets, so bagging (short for bootstrap aggregating) creates them by bootstrapping, i.e. sampling with replacement from the original training set.
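
To make this concrete, here is a minimal sketch of bagging decision trees with scikit-learn’s BaggingClassifier. The synthetic dataset and hyperparameters are purely illustrative (they are not from this article), and the estimator keyword assumes scikit-learn >= 1.2 (older versions call it base_estimator).

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy binary data standing in for the spam / not-spam example
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# A single, fully grown decision tree: interpretable but high-variance
tree = DecisionTreeClassifier(random_state=42)
tree.fit(X_train, y_train)

# Bagging: fit many trees on bootstrap resamples of the training set
# and classify new observations by majority vote
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # 'base_estimator' in scikit-learn < 1.2
    n_estimators=100,
    bootstrap=True,  # sample the training set with replacement
    random_state=42,
)
bagging.fit(X_train, y_train)

print(f"Single tree accuracy: {tree.score(X_test, y_test):.3f}")
print(f"Bagged trees accuracy: {bagging.score(X_test, y_test):.3f}")

On data like this, the bagged ensemble will typically score higher than the single tree, since averaging over bootstrap resamples reduces the variance of the individual trees.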
