What is Random Forest Classification?

Honey Saini
May 5, 2023 · 2 min read


Random Forest is an ensemble learning technique that combines multiple decision trees to improve a model’s accuracy. It’s called “Random” because each tree in the forest is trained on a random subset of the training data (a bootstrap sample) and considers only a random subset of the features when splitting. This randomness helps reduce overfitting and improves the generalization of the model.

In Random Forest classification, the predictions of the individual trees are combined to make the final prediction: each tree casts a vote, and the class with the majority of votes wins. Training each tree on a bootstrap sample and aggregating their predictions in this way is called bagging, or bootstrap aggregating.
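
To make the voting step concrete, here is a minimal sketch on a synthetic dataset (the dataset and the parameter values are purely illustrative) that inspects the individual trees inside a fitted forest and reproduces the majority vote by hand.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic two-class dataset (values chosen only for illustration)
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)

# A small forest so the individual votes are easy to read
forest = RandomForestClassifier(n_estimators=5, random_state=0)
forest.fit(X_demo, y_demo)

sample = X_demo[:1]  # a single sample to classify

# Each fitted tree is available in forest.estimators_
tree_votes = np.array([tree.predict(sample)[0] for tree in forest.estimators_])
print("Individual tree votes:", tree_votes)

# Majority vote across the trees
majority = np.bincount(tree_votes.astype(int)).argmax()
print("Majority vote:", majority)
print("Forest prediction:", forest.predict(sample)[0])

Note that scikit-learn actually averages the trees’ predicted class probabilities rather than counting hard votes, but on most samples the two agree.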

Real-world Example

Let’s consider a real-world example to understand how Random Forest Classification works. We will use the famous Iris dataset, which consists of 150 samples of iris flowers. Each sample has four features: sepal length, sepal width, petal length, and petal width. The task is to classify the flowers into one of the three species: setosa, versicolor, or virginica.
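
As a quick sanity check, the short sketch below uses scikit-learn’s bundled copy of the dataset to print the feature names, class names, and shape.

from sklearn.datasets import load_iris

iris = load_iris()
print(iris.feature_names)   # sepal/petal length and width, in cm
print(iris.target_names)    # ['setosa' 'versicolor' 'virginica']
print(iris.data.shape)      # (150, 4)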

Implementing Random Forest Classification in Python

We will use scikit-learn, a popular machine learning library, to implement Random Forest Classification in Python. Let’s start by importing the required libraries and loading the dataset.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the iris dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Next, we will create an instance of the Random Forest Classifier class and train the model on the training data.

# Create a Random Forest Classifier with 100 trees
rfc = RandomForestClassifier(n_estimators=100, random_state=42)
# Train the model on the training data
rfc.fit(X_train, y_train)
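
n_estimators is only one of several knobs. The sketch below shows a few other commonly tuned hyperparameters; the values here are illustrative, not recommendations.

# Illustrative values only -- tune these for your own data
rfc_tuned = RandomForestClassifier(
    n_estimators=200,      # number of trees in the forest
    max_depth=5,           # limit tree depth to reduce overfitting
    max_features="sqrt",   # number of features considered at each split
    min_samples_leaf=2,    # minimum samples required at a leaf node
    random_state=42,
)
rfc_tuned.fit(X_train, y_train)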

Once the model is trained, we can use it to make predictions on the testing data.

# Make predictions on the testing data
y_pred = rfc.predict(X_test)
# Calculate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

The output will show the accuracy of the model on the testing data.

Accuracy: 1.0
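
Beyond a single accuracy number, a fitted forest also exposes feature_importances_, which can hint at which inputs drive the predictions. A small sketch, assuming the rfc and iris objects defined above:

# Impurity-based feature importances from the trained forest
for name, importance in zip(iris.feature_names, rfc.feature_importances_):
    print(f"{name}: {importance:.3f}")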

Conclusion

In this blog, we learned about Random Forest classification, an ensemble learning technique that combines multiple decision trees to improve the accuracy of the model. We also implemented Random Forest classification in Python using scikit-learn and applied it to a real-world example. We saw that Random Forest classification achieved a high accuracy of 1.0 on the Iris dataset.

In real-world problems, an accuracy of 1.0 is rare; accuracies in the range of 0.7–0.9 are often perfectly acceptable. When you work with a business, your target for model accuracy and precision should stay flexible and depends heavily on the business problem at hand.
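
A single 80/20 split can also be optimistic on a dataset as small and clean as Iris. One way to get a more robust estimate, sketched here with scikit-learn’s cross_val_score, is k-fold cross-validation:

from sklearn.model_selection import cross_val_score

# 5-fold cross-validation reports a mean accuracy and its spread
scores = cross_val_score(RandomForestClassifier(n_estimators=100, random_state=42), X, y, cv=5)
print("Cross-validation accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))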

Hope you enjoyed this read. Have questions? Connect with me on Teams at https://teams.live.com/l/invite/FEAhzQ4i1TegWdkkAI or email me at LearnMLDataScience@gmail.com

