Random Forest

Navjot Singh
Analytics Vidhya · Jun 16, 2020

Random Forest is a supervised learning algorithm. As the name suggests, it creates a forest and introduces randomness into it. The forest it builds is an ensemble of Decision Trees, most of the time trained with the bagging method.

Ensemble Learning

In ensemble learning, we take multiple algorithms, or the same algorithm multiple times, and combine them to build something more powerful than any of the individual models.

Bagging Technique

Bootstrapping the data and using the aggregate of the individual predictions to make a decision is called bagging (bootstrap aggregating). It improves the stability and accuracy of machine learning algorithms. The general idea of the bagging method is that a combination of learning models improves the overall result.
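As a rough illustration (mine, not from the original post), here is a minimal bagging sketch using scikit-learn's BaggingClassifier, whose default base estimator is a decision tree; the toy dataset and the number of estimators are arbitrary choices. Each tree is fit on a bootstrap sample of the training set, and the predictions are aggregated by majority vote.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

# Toy dataset, used purely for illustration
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Each base learner (a decision tree by default) is trained on a bootstrap
# sample of the training set; the final prediction is the aggregated vote.
bagging = BaggingClassifier(
    n_estimators=10,   # number of bootstrapped trees (arbitrary choice)
    bootstrap=True,    # sample the training set with replacement
    random_state=42,
)
bagging.fit(X_train, y_train)
print("Bagging accuracy:", bagging.score(X_test, y_test))
```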


Random Forest is an Ensemble Learning technique.

Random forest builds multiple decision trees and merges them together to get a more accurate and stable prediction.

Let’s see how Random Forest works:

Step 1: Pick K random data points (samples) from the training set and create a ‘bootstrapped’ dataset. To create a bootstrapped dataset that is the same size as the original, we randomly select samples from the original dataset with replacement, so the same sample can be picked more than once.

Step 2: Build the decision tree associated with these K data points. We grow a decision tree on the bootstrapped dataset, but at each split we only consider a random subset of the variables (columns).

Step 3: Choose the number of trees N you want to build and repeat Steps 1 and 2. This results in a wide variety of trees, and that variety is what makes a random forest more effective than an individual decision tree.

Step 4: For a new data point, have each of the N trees predict the value of Y. After running the data point through all of the trees in the random forest, we see which option received the most votes, and that option is the final output, i.e. the predicted value of Y. An end-to-end sketch of these steps is shown below.
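Putting the four steps together, a minimal end-to-end sketch with scikit-learn's RandomForestClassifier could look like the following; the dataset and hyperparameter values are illustrative assumptions rather than recommendations. Here n_estimators plays the role of N, and max_features controls the random subset of variables considered at each split.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy dataset, used purely for illustration
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

forest = RandomForestClassifier(
    n_estimators=100,      # N: number of trees to build (Step 3)
    max_features="sqrt",   # random subset of variables tried at each split (Step 2)
    bootstrap=True,        # each tree sees a bootstrap sample (Step 1)
    random_state=42,
)
forest.fit(X_train, y_train)

# Step 4: each tree votes, and the class with the most votes wins.
print("Predicted classes:", forest.predict(X_test[:5]))
print("Test accuracy:", forest.score(X_test, y_test))
```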

Why Use a Random Forest Algorithm?

Random Forest gives a high level of accuracy and more stable predictions. It also reduces the risk of overfitting and can handle large datasets efficiently while producing highly accurate predictions.

That’s all for the Random Forest algorithm. Stay tuned for further blogs.

Thank you!

Navjot Singh

Machine learning enthusiast interested in making data actionable.