Forest Trails: Random Forest Algorithm from Scratch


Hey guys! Today we will be talking about an extension of the decision tree algorithm: the Random Forest algorithm, one of the favorite algorithms in the ML community. This implementation is built on top of the Decision Tree classifier we built in the last blog.


Random forest is a popular machine learning algorithm that merges the outputs of numerous decision trees to produce a single outcome. It is versatile and can handle both classification and regression tasks! The algorithm's strength lies in its ability to handle complex datasets and mitigate overfitting.

Before we dive into the depths of how this algorithm works, I'd suggest going through this article, which explains what ensemble methods are: https://neptune.ai/blog/ensemble-learning-guide.

Now that we're up to speed, let's look at the algorithm. In the Random Forest model, a subset of data points and a subset of features is selected to construct each decision tree. Simply put, n random records and m features are taken from a data set with k records, and an individual decision tree is built on each sample. Each decision tree generates an output, and the final output is obtained by majority voting for classification or by averaging for regression.
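To make that aggregation step concrete, here is a tiny sketch of both strategies; the five per-tree predictions below are made-up values, purely for illustration:

import numpy as np

# Hypothetical outputs of five trees for a single test instance
class_votes = ["cat", "dog", "cat", "cat", "dog"]   # classification trees
reg_outputs = [3.1, 2.8, 3.4, 3.0, 2.9]             # regression trees

# Classification: majority voting -> "cat" (3 votes out of 5)
counts = {label: class_votes.count(label) for label in set(class_votes)}
print(max(counts, key=counts.get))

# Regression: averaging -> 3.04
print(np.mean(reg_outputs))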

Some of its advantages are diversity, immunity to the curse of dimensionality, easy parallelization, stability, and more!

So let's jump right into it!

  • Initialization: We set some basic groundwork for the algorithm to be built on.
from Classification.Decision_tree import DecisionTreeClassifier
import numpy as np

class RandomForestClassifier():
    def __init__(self, n_estimators=10, feature_proportion=0.66):
        self.n_estimators = n_estimators
        self.predictors = []
        self.feature_proportion = feature_proportion

n_estimators: the number of decision trees in the forest (default is 10).

feature_proportion: the proportion of features to consider when looking for the best split.

predictors: an empty list that will later hold all the fitted decision tree classifiers.

  • Fit method: Preparing the battlefield!
    def fit(self, x_train, y_train):
        for i in range(self.n_estimators):
            dt = DecisionTreeClassifier(self.feature_proportion)

            # Bootstrap sample: draw len(x_train) row indices with replacement
            sample = np.random.choice(len(x_train), len(x_train), replace=True)
            x = x_train[sample]
            y = y_train[sample]

            dt.fit(x, y)
            self.predictors.append(dt)

This works on the basis of a simple loop that iterates once per tree. On each iteration it creates a decision tree and draws a bootstrap sample from the training data (random sampling with replacement). We then fit the tree to this sampled data and store the classifier in our predictors list for later use.
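If bootstrap sampling is new to you, the small standalone snippet below shows what np.random.choice with replace=True actually does: some rows get picked multiple times while others are left out entirely (the ten-row "dataset" here is just a toy example):

import numpy as np

x_train = np.arange(10)   # toy "dataset" of 10 rows
sample = np.random.choice(len(x_train), len(x_train), replace=True)

print(sample)                  # e.g. [3 3 7 0 9 1 3 5 5 2] -- duplicates allowed
print(np.unique(sample).size)  # on average only ~63% of the rows appear

This per-tree randomness is exactly what makes the trees in the forest differ from one another.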

  • Predict method: Let the fight begin!
    def predict(self, x_test):
        # Collect every tree's predictions for the test set
        labels = []
        for predictor in self.predictors:
            labels.append(predictor.predict(x_test))

        # Majority vote across trees for each test instance
        final_predictions = []
        for index in range(len(labels[0])):
            prediction_dict = {}
            for results in labels:
                try:
                    prediction_dict[results[index]] += 1
                except KeyError:
                    prediction_dict[results[index]] = 1

            # Pick the label with the highest vote count, not the largest key
            final_predictions.append(max(prediction_dict, key=prediction_dict.get))
        return final_predictions

This method first initializes an empty list, labels, to store the predictions from each decision tree. For every tree in predictors, it predicts the labels for x_test and appends those predictions to labels.

Then it initializes another list, final_predictions, to store the aggregated predictions. For each test instance, it counts the frequency of each predicted label in the dictionary prediction_dict and appends the label with the highest count (the majority vote) to final_predictions; note the key=prediction_dict.get argument to max, which selects the most frequent label rather than simply the largest key. Finally, the predictions are returned.
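As a side note, the same vote counting can be written more compactly with Python's collections.Counter; this is just an equivalent alternative to the try/except bookkeeping above, not a change to the class:

from collections import Counter

def majority_vote(labels, index):
    # labels: the list of per-tree prediction sequences built in predict()
    votes = [results[index] for results in labels]
    return Counter(votes).most_common(1)[0][0]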

This approach leverages the wisdom of the crowd, where multiple models work together to improve overall performance and robustness.
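Putting it all together, a minimal usage sketch might look like this. It assumes the DecisionTreeClassifier from the previous post is importable from Classification.Decision_tree with a fit/predict interface, and the tiny dataset below is made up purely for illustration:

import numpy as np

# Made-up toy data: two features, binary labels
x_train = np.array([[1.0, 2.1], [1.5, 1.8], [5.0, 8.0],
                    [6.0, 9.0], [1.2, 0.9], [7.0, 8.5]])
y_train = np.array([0, 0, 1, 1, 0, 1])
x_test = np.array([[1.3, 1.5], [6.5, 9.2]])

forest = RandomForestClassifier(n_estimators=25, feature_proportion=0.66)
forest.fit(x_train, y_train)
print(forest.predict(x_test))   # expected to print something like [0, 1]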

That was it, guys, for this implementation. I hope you understood everything and will implement this algorithm on your own too!

Happy Coding! See ya next time!
