Random Forest Starter Notebook!

Udbhav Pangotra
Geek Culture
Published in
2 min readJan 30, 2022

A starter code you can modify for your Random Forest use case

Photo by Bench Accounting on Unsplash

Random forest is a supervised learning algorithm which is used for both classification as well as regression. There are many different models available to make predictions on classification data. Logistic regression is one of the most common for binomial data. Other methodologies include support vector machines (“SVMs”), naive Bayes, and k-nearest neighbors. Random forests tend to shine in scenarios where a model has a large number of features that individually have weak predicative power but much stronger power collectively.

In this article, I’ll give you a quick guide on how to implement a random forest model in Python for classification problems.

Import libraries:

# Imports

# pandas
import pandas as pd
from pandas import Series,DataFrame

# numpy, matplotlib, seaborn
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')
%matplotlib inline

# machine learning
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC, LinearSVC
from sklearn.ensemble import RandomForestClassifier,GradientBoostingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

Import the data:

# get titanic & test csv files as a DataFrame
titanic_df = pd.read_csv("../input/train.csv")
test_df = pd.read_csv("../input/test.csv")

Drop Unnecessary data:

# drop unnecessary columns, these columns won't be useful in analysis and prediction
titanic_df = titanic_df.drop(['PassengerId','Name','Ticket'], axis=1)
test_df = test_df.drop(['Name','Ticket'], axis=1)

Define training and test data splits:

# define training and testing sets

X_train = titanic_df.drop("Survived",axis=1)
Y_train = titanic_df["Survived"]
X_test = test_df.drop("PassengerId",axis=1).copy()

Train the model:

# Random Forests

random_forest = RandomForestClassifier(n_estimators=100,oob_score=True,max_features=5)
random_forest.fit(X_train, Y_train)
Y_pred = random_forest.predict(X_test)
random_forest.score(X_train, Y_train)
OP : 0.9640852974186308

This was a simple tutorial for getting started on Random forests!

Do reach out and comment if you get stuck!

Other articles that might be interested in:

Cheers and do follow for more such content! :)

--

--