Random Forest in Machine Learning

Kaviya Arunagiri

Hey everyone, today we will focus on Random Forest. Let’s get started:

Before getting into it, note that Random Forest is based on ensemble learning, which we will discuss in an upcoming article. You can also learn about decision trees here, if you are not already familiar with them:

Decision Tree in Machine Learning | by Kaviya Arunagiri | Jun, 2024 | Medium

What is Random Forest?

Random Forest is an ensemble learning method that combines multiple decision trees during training. Each tree is constructed using a random subset of the dataset and a random subset of features. This randomness helps reduce overfitting and improves overall prediction performance. In prediction, the algorithm aggregates the results of all trees (either by voting for classification or averaging for regression) to provide stable and precise results.
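The aggregation step can be illustrated with a small sketch (the tree predictions below are made-up numbers, not from a real model): classification takes a majority vote across trees, while regression averages their outputs.

```python
import numpy as np

# Hypothetical class predictions from 3 trees for 4 samples
tree_preds = np.array([
    [0, 1, 1, 2],
    [0, 1, 2, 2],
    [1, 1, 2, 2],
])

# Classification: majority vote per sample (most frequent label down each column)
votes = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, tree_preds)
print(votes)

# Hypothetical regression outputs from 3 trees for 2 samples
tree_outputs = np.array([
    [2.0, 3.1],
    [2.4, 2.9],
    [2.2, 3.0],
])

# Regression: average the trees' outputs per sample
print(tree_outputs.mean(axis=0))
```

This is exactly what `RandomForestClassifier` and `RandomForestRegressor` do internally once their individual trees have made predictions.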

Random Forest

Random Forest models can be considered bagging with a slight tweak. Bagged decision trees have the full set of features at their disposal when deciding where to split and how to make decisions. Therefore, although the bootstrapped samples may be slightly different, the data will largely be split on the same features throughout each model.

On the contrary, Random Forest models decide where to split based on a random selection of features. Rather than splitting on similar features at every node, the trees add a level of differentiation because each tree splits on different features. This diversity gives the ensemble more to aggregate over, producing a more accurate predictor.
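In scikit-learn, this difference comes down to the `max_features` parameter, which controls how many features each split is allowed to consider. A small sketch (the dataset here is synthetic, just for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic toy dataset for illustration
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# max_features=None: every split sees all features -> behaves like bagged trees
bagged_like = RandomForestClassifier(
    n_estimators=100, max_features=None, random_state=0).fit(X, y)

# max_features='sqrt': each split sees sqrt(n_features) features
# (the default for RandomForestClassifier)
random_forest = RandomForestClassifier(
    n_estimators=100, max_features='sqrt', random_state=0).fit(X, y)

print(bagged_like.score(X, y), random_forest.score(X, y))
```

So the "tweak" that turns bagged trees into a Random Forest is literally one parameter.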

Random Forest involves building many decision trees on a bootstrapped training set.
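Bootstrapping itself is simple: each tree's training set is drawn from the original rows with replacement, so some rows repeat and others are left out entirely (the "out-of-bag" rows). A minimal sketch with NumPy:

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples = 10

# A bootstrap sample draws n_samples row indices WITH replacement
boot_idx = rng.integers(0, n_samples, size=n_samples)

# Duplicated indices are expected; rows never drawn are "out-of-bag" for this tree
out_of_bag = sorted(set(range(n_samples)) - set(boot_idx.tolist()))
print(boot_idx, out_of_bag)
```

Each tree in the forest gets its own independent bootstrap sample like this.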

Implementation:

import pandas as pd
from sklearn.datasets import load_digits

digits = load_digits()
dir(digits)

%matplotlib inline
import matplotlib.pyplot as plt
plt.gray()
for i in range(5):
    plt.matshow(digits.images[i])


digits.data[:5]
df = pd.DataFrame(digits.data)
df.head()
df['target'] = digits.target  # add the label column to the DataFrame
df.head()

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(['target'], axis='columns'), df.target, test_size=0.2)
len(X_test)

from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier()
model.fit(X_train, y_train)
model.score(X_test, y_test)  # accuracy on the held-out test set
y_predicted = model.predict(X_test)

from sklearn.metrics import confusion_matrix
import seaborn as sn

cm = confusion_matrix(y_test, y_predicted)
cm

plt.figure(figsize=(10, 7))
sn.heatmap(cm, annot=True)
plt.xlabel('Predicted')
plt.ylabel('Truth')
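Beyond the confusion matrix, `classification_report` gives per-class precision, recall, and F1. A small sketch with toy labels standing in for `y_test` and `y_predicted` (the real values come from the model above):

```python
from sklearn.metrics import classification_report

# Toy labels standing in for y_test and y_predicted
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 1, 2, 1, 1, 0]

report = classification_report(y_true, y_pred)
print(report)
```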

Here you can access the full code:

Random_Forest/Random_Forest.ipynb at main · kaviya2478/Random_Forest (github.com)

I know it is a bit difficult to understand, but the upcoming articles will make this concept clearer.

Thank you. Go have some water :)
