Random Forests

Dr. Roi Yehoshua
9 min read · Mar 25, 2023

A random forest is a powerful machine learning model based on an ensemble of decision trees, where each tree is grown on a random subset of the data set. The final prediction of the model is obtained by majority voting (for classification) or by averaging (for regression) the predictions of the trees in the forest.

Averaging the predictions of multiple decision trees reduces the variance of the model at the expense of a small increase in bias, which usually improves the performance of the final model considerably.
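
To make this concrete, here is a minimal sketch (assuming scikit-learn and a synthetic dataset from make_classification, neither of which appears in the article) that compares a single fully grown decision tree to a random forest on the same data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Generate a toy classification problem
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# A single, fully grown decision tree (high variance)
tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# A forest of 100 trees, each grown on a bootstrap sample of the data (lower variance)
forest = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

print("Single tree accuracy:", tree.score(X_test, y_test))
print("Random forest accuracy:", forest.score(X_test, y_test))
```

On most runs the forest scores noticeably higher than the single tree, reflecting the variance reduction described above.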

The random forest model

Before you read this article, I recommend that you read my previous articles on decision trees (part 1 and part 2), and on ensemble methods.

Now let’s dive in :)

Background: Bagging Methods

The training of random forests is based on the general technique of bootstrap aggregating (or bagging) applied to decision trees.

Given a training set of n samples {(xᵢ, yᵢ)}, i = 1,…,n, bagging repeatedly draws a random sample with replacement from the training set and fits the same base model (a decision tree in our case) to each of these samples.
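
The sketch below illustrates this procedure for classification, assuming NumPy and scikit-learn; the function and variable names (bagging_predict, n_trees) are illustrative and not from the article:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_predict(X_train, y_train, X_test, n_trees=100, random_state=0):
    """Fit n_trees decision trees on bootstrap samples and predict by majority vote."""
    rng = np.random.default_rng(random_state)
    n = len(X_train)
    all_preds = []
    for _ in range(n_trees):
        # Draw a bootstrap sample: n indices sampled with replacement
        idx = rng.integers(0, n, size=n)
        tree = DecisionTreeClassifier().fit(X_train[idx], y_train[idx])
        all_preds.append(tree.predict(X_test))
    # Majority vote across the trees (assumes non-negative integer class labels)
    all_preds = np.array(all_preds)  # shape: (n_trees, n_test)
    return np.array([np.bincount(col).argmax() for col in all_preds.T])
```

For regression, the majority vote in the last step would simply be replaced by the mean of the trees' predictions.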

The number of trees in the ensemble and the number of samples used to train each one are hyperparameters of the model. Typically, a few hundred to several thousand trees are used, depending on the size and nature of…

