Random Forests

Dr. Roi Yehoshua
9 min read · Mar 25, 2023

A random forest is a powerful machine learning model based on an ensemble of decision trees, where each tree is grown on a random subset of the data set. The final prediction of the model is obtained by majority voting (for classification) or by averaging (for regression) the predictions of the trees in the forest.

Averaging the predictions of multiple decision trees reduces the variance of the model at the expense of a small increase in bias, which usually improves the performance of the final model considerably.
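
To make this concrete, here is a minimal sketch (assuming scikit-learn and a synthetic dataset from make_classification, neither of which appears in the article) that compares a single fully grown decision tree to a random forest on the same data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Generate a toy classification problem
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# A single, fully grown decision tree (high variance)
tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# A forest of 100 trees, each grown on a bootstrap sample of the data (lower variance)
forest = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

print("Single tree accuracy:", tree.score(X_test, y_test))
print("Random forest accuracy:", forest.score(X_test, y_test))
```

On most runs the forest scores noticeably higher than the single tree, reflecting the variance reduction described above.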

The random forest model

Before you read this article, I recommend that you read my previous articles on decision trees (part 1 and part 2), and on ensemble methods.

Now let’s dive in :)

Background: Bagging Methods

The training of random forests is based on the general technique of bootstrap aggregating (or bagging) applied to decision trees.

Given a training set of n samples {(xᵢ, yᵢ)}, i = 1,…,n, bagging repeatedly draws a random sample with replacement from the training set and fits the same base model (a decision tree in our case) to each of these samples.
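
The sketch below illustrates this procedure for classification, assuming NumPy and scikit-learn; the function and variable names (bagging_predict, n_trees) are illustrative and not from the article:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_predict(X_train, y_train, X_test, n_trees=100, random_state=0):
    """Fit n_trees decision trees on bootstrap samples and predict by majority vote."""
    rng = np.random.default_rng(random_state)
    n = len(X_train)
    all_preds = []
    for _ in range(n_trees):
        # Draw a bootstrap sample: n indices sampled with replacement
        idx = rng.integers(0, n, size=n)
        tree = DecisionTreeClassifier().fit(X_train[idx], y_train[idx])
        all_preds.append(tree.predict(X_test))
    # Majority vote across the trees (assumes non-negative integer class labels)
    all_preds = np.array(all_preds)  # shape: (n_trees, n_test)
    return np.array([np.bincount(col).argmax() for col in all_preds.T])
```

For regression, the majority vote in the last step would simply be replaced by the mean of the trees' predictions.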

The number of trees in the ensemble and the number of samples used to train each one are hyperparameters of the model. Typically, a few hundred to several thousand trees are used, depending on the size and nature of…

