Random Forest Starter Notebook!
Starter code you can modify for your own random forest use case
Random forest is a supervised learning algorithm used for both classification and regression. It builds many decision trees, each on a bootstrap sample of the data, and combines their predictions. Many other models are available for classification: logistic regression is one of the most common for binary outcomes, and other approaches include support vector machines (SVMs), naive Bayes, and k-nearest neighbors. Random forests tend to shine when a model has a large number of features that individually have weak predictive power but much stronger power collectively.
In this article, I’ll give you a quick guide on how to implement a random forest model in Python for classification problems.
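To make the point about many weak features concrete, here is a minimal sketch that compares a single decision tree with a random forest. The data is synthetic (generated with scikit-learn's make_classification), and the sizes and random seeds are arbitrary assumptions, so treat the printed numbers as illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (an assumption for illustration): 40 features, of which
# 20 are informative and the remaining 20 are pure noise.
X, y = make_classification(
    n_samples=1000,
    n_features=40,
    n_informative=20,
    n_redundant=0,
    random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# One tree versus an ensemble of 100 trees, scored on the same held-out rows.
tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

tree_acc = tree.score(X_te, y_te)
forest_acc = forest.score(X_te, y_te)
print(f"single tree:   {tree_acc:.3f}")
print(f"random forest: {forest_acc:.3f}")
```

On data like this the forest usually comes out ahead, because averaging many trees smooths out the mistakes any single tree makes.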
Import libraries:
# Imports
# pandas
import pandas as pd
from pandas import Series, DataFrame
# numpy, matplotlib, seaborn
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')
# Jupyter-only magic to render plots inline (remove it when running as a plain script)
%matplotlib inline
# machine learning
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC, LinearSVC
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
Import the data:
# load the Titanic training and test CSV files as DataFrames
titanic_df = pd.read_csv("../input/train.csv")
test_df = pd.read_csv("../input/test.csv")
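Drop unnecessary data:
# drop columns that won't be useful for analysis and prediction
# (PassengerId stays in test_df for now; it is removed when X_test is built)
# if text columns or missing values remain, encode or fill them before fitting
titanic_df = titanic_df.drop(['PassengerId', 'Name', 'Ticket'], axis=1)
test_df = test_df.drop(['Name', 'Ticket'], axis=1)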
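After dropping those columns, the frame may still hold text values and gaps, which RandomForestClassifier cannot fit on directly. Below is a minimal preprocessing sketch. It assumes the Kaggle Titanic column layout (Pclass, Sex, Age, SibSp, Parch, Fare, Cabin, Embarked), which the file paths above suggest but do not confirm. The helper name prepare_features and the toy rows are hypothetical, made up for illustration.

```python
import pandas as pd

# Tiny made-up frame with the assumed column layout (not real passenger data).
toy_df = pd.DataFrame({
    "Pclass": [3, 1, 3, 2],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, None, 35.0],
    "SibSp": [1, 1, 0, 0],
    "Parch": [0, 0, 0, 0],
    "Fare": [7.25, 71.28, 7.92, None],
    "Cabin": [None, "C85", None, None],
    "Embarked": ["S", "C", None, "Q"],
})


def prepare_features(df):
    """Hypothetical helper: return an all-numeric copy with no missing values."""
    out = df.copy()
    # Cabin is mostly empty in the toy frame, so drop it rather than impute it.
    out = out.drop(columns=["Cabin"], errors="ignore")
    # Fill numeric gaps with the column median.
    for col in ["Age", "Fare"]:
        out[col] = out[col].fillna(out[col].median())
    # Fill the missing port with the most frequent one, then one-hot encode
    # against a fixed category list so train and test get identical columns.
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode()[0])
    out["Embarked"] = pd.Categorical(out["Embarked"], categories=["C", "Q", "S"])
    out = pd.get_dummies(out, columns=["Embarked"], dtype=int)
    # Encode Sex as 0/1.
    out["Sex"] = out["Sex"].map({"male": 0, "female": 1})
    return out


features = prepare_features(toy_df)
print(features)
```

In the notebook you would call the same helper on both titanic_df and test_df, so the two frames end up with matching columns. Median and most-frequent filling are simple defaults, not the only reasonable choices.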
Define training and test data splits:
# define training and testing sets
X_train = titanic_df.drop("Survived",axis=1)
Y_train = titanic_df["Survived"]
X_test = test_df.drop("PassengerId",axis=1).copy()
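Note that X_test comes from a file without a Survived column, so predictions on it cannot be scored locally. One common workaround is to hold back part of the labelled data for validation. The sketch below uses scikit-learn's train_test_split on random stand-in arrays (an assumption, to keep it self-contained); in the notebook you would pass X_train and Y_train instead.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in arrays (an assumption); in the notebook these would be X_train and Y_train.
rng = np.random.default_rng(0)
X_all = rng.normal(size=(200, 6))
y_all = (X_all[:, 0] + X_all[:, 1] > 0).astype(int)

# Keep 20% of the labelled rows aside; stratify preserves the class balance.
X_fit, X_val, y_fit, y_val = train_test_split(
    X_all, y_all, test_size=0.2, stratify=y_all, random_state=0
)
print(X_fit.shape, X_val.shape)
```

You would then fit on X_fit and y_fit and call score on X_val and y_val to get an accuracy figure from rows the model has not seen.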
Train the model:
# Random forest
random_forest = RandomForestClassifier(n_estimators=100, oob_score=True, max_features=5)
random_forest.fit(X_train, Y_train)
Y_pred = random_forest.predict(X_test)
# accuracy on the training data (optimistic, since the model has already seen these rows)
random_forest.score(X_train, Y_train)
Output: 0.9640852974186308
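The score above is measured on the same rows the model was trained on, so it tends to overstate how well the model generalizes. Because the classifier is created with oob_score=True, scikit-learn also computes an out-of-bag estimate, available as the oob_score_ attribute after fitting. The sketch below compares the two and adds a 5-fold cross-validation score. It runs on synthetic data (an assumption), so the numbers will differ from the Titanic ones.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (an assumption), with 8 features like a small tabular set.
X, y = make_classification(n_samples=800, n_features=8, n_informative=5, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100, oob_score=True, max_features=5, random_state=0
)
forest.fit(X, y)

train_acc = forest.score(X, y)  # accuracy on rows the trees were fitted on
oob_acc = forest.oob_score_     # each row scored only by trees that did not see it
cv_acc = cross_val_score(forest, X, y, cv=5).mean()  # mean 5-fold accuracy

print(f"training accuracy:   {train_acc:.3f}")
print(f"out-of-bag accuracy: {oob_acc:.3f}")
print(f"5-fold CV accuracy:  {cv_acc:.3f}")
```

The out-of-bag and cross-validated figures are usually lower than the training accuracy and are the better guide to performance on new data.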
This was a simple tutorial for getting started with random forests!
Do reach out and comment if you get stuck!
Other articles you might be interested in:
- Pandas 10 minute guide. This will serve as a basic guide to get… | by Sam | Geek Culture | Jan, 2022 | Medium
- Beautiful plots with Seaborn. Create Plots that get your… | by Sam | Geek Culture | Jan, 2022 | Medium
- Your go to Numpy checklist. A quick glance at all the important… | by Sam | Geek Culture | Jan, 2022 | Medium
- Getting started with Apache Spark — I | by Sam | Geek Culture | Jan, 2022 | Medium
Cheers and do follow for more such content! :)