Random Forest Regression in 5 Steps with Python

Samet Girgin
3 min read · Apr 7, 2019


Ensemble Learning: Ensemble learning means taking multiple algorithms, or the same algorithm multiple times, and putting them together to make something much more powerful than the original. Boosting is one such technique; bagging, which random forests use, is another.

Random forest is just a team of decision trees. The final prediction of the random forest is simply the average of the different predictions of all the different decision trees.
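To make that averaging concrete, here is a minimal sketch. The toy data below is hypothetical (a stand-in for the Position_Salaries.csv used later, not taken from it); it shows that a fitted RandomForestRegressor's prediction is exactly the mean of its individual trees' predictions:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical toy data: position level (1-10) vs. salary
X = np.arange(1, 11).reshape(-1, 1)
y = np.array([45000, 50000, 60000, 80000, 110000,
              150000, 200000, 300000, 500000, 1000000])

forest = RandomForestRegressor(n_estimators=10, random_state=0)
forest.fit(X, y)

# Average the predictions of the individual trees by hand;
# the fitted trees live in forest.estimators_
tree_preds = np.array([tree.predict([[6.5]])[0]
                       for tree in forest.estimators_])

# The forest's own prediction equals the mean over its trees
print(forest.predict([[6.5]])[0], tree_preds.mean())
```

The two printed values match: the forest adds no extra machinery at prediction time beyond averaging its trees.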

Here is the 5-step way of Random Forest Regression:

#1 Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

#2 Importing the dataset
dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:2].values
y = dataset.iloc[:, 2].values

# Splitting the dataset into a training set and a test set is
# skipped for this example.

#3 Fitting the Random Forest Regression Model to the dataset
# Create RF regressor here. n_estimators is the number of trees
# in the forest; put 10 for the n_estimators argument.
from sklearn.ensemble import RandomForestRegressor
regressor = RandomForestRegressor(n_estimators=10, random_state=0)
regressor.fit(X, y)
#4 Visualising the Regression results
# (for higher resolution and smoother curve)
X_grid = np.arange(X.min(), X.max(), 0.01)
X_grid = X_grid.reshape((len(X_grid), 1))
plt.scatter(X, y, color='red')
plt.plot(X_grid, regressor.predict(X_grid), color='blue')
plt.title('Check It (Random Forest Regression Model)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()

The plot has more steps than the one from Decision Tree Regression: averaging the predictions of ten trees produces more distinct intervals than a single tree does.

#5 Predicting a new result
y_pred = regressor.predict([[5.5]])  # predict() expects a 2-D array

Output: y_pred: 108000
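A note on the call above: scikit-learn's predict() requires a 2-D array of shape (n_samples, n_features), which is why the single value 5.5 is wrapped as [[5.5]]. A small sketch with hypothetical stand-in data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical stand-in data: 10 samples, 1 feature
X = np.arange(1, 11).reshape(-1, 1)
y = np.arange(1, 11) * 10000.0

regressor = RandomForestRegressor(n_estimators=10, random_state=0)
regressor.fit(X, y)

# One row per query point, one column per feature
y_pred = regressor.predict(np.array([[5.5]]))
print(y_pred.shape)  # one prediction per input row
```

Passing a bare scalar such as predict(5.5) raises an error in current scikit-learn versions.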

  • Let’s increase the trees in the forest to 100 and check it out.
# Fitting the Random Forest Regression Model to the dataset
# Create RF regressor here; put 100 for the n_estimators argument.
from sklearn.ensemble import RandomForestRegressor
regressor = RandomForestRegressor(n_estimators=100, random_state=0)
regressor.fit(X, y)
  • Look at the plot for the model with 100 trees.
# Visualising the Regression results
# (for higher resolution and smoother curve)
X_grid = np.arange(X.min(), X.max(), 0.01)
X_grid = X_grid.reshape((len(X_grid), 1))
plt.scatter(X, y, color='red')
plt.plot(X_grid, regressor.predict(X_grid), color='blue')
plt.title('Check It (Random Forest Regression Model) w/ 100 trees')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()

The number of steps in the graph does not grow tenfold with the number of trees, but the prediction does get better. Let's predict the result for the same value.

# Predicting a new result
y_pred = regressor.predict([[5.5]])

Output: y_pred(5.5) = 121800 (It is a better prediction)
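The effect of the forest size can be sketched directly by sweeping n_estimators and predicting the same input each time. The toy data here is hypothetical (not the article's Position_Salaries.csv), so the numbers differ from the outputs above, but the pattern is the same: the prediction settles as trees are added.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical toy data: position level (1-10) vs. salary
X = np.arange(1, 11).reshape(-1, 1)
y = np.array([45000, 50000, 60000, 80000, 110000,
              150000, 200000, 300000, 500000, 1000000])

# Refit with 10, 100, and 300 trees and predict the same point
preds = {}
for n in (10, 100, 300):
    rf = RandomForestRegressor(n_estimators=n, random_state=0).fit(X, y)
    preds[n] = rf.predict([[5.5]])[0]
    print(n, preds[n])
```

Each extra tree changes the average less and less, which is why the article's jump from 100 to 300 trees moves the prediction far less than the jump from 10 to 100 did.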

Let’s increase the trees in the forest to 300 and check it out.

#3 Fitting the Random Forest Regression Model to the dataset
# Create RF regressor here; put 300 for the n_estimators argument.
from sklearn.ensemble import RandomForestRegressor
regressor = RandomForestRegressor(n_estimators=300, random_state=0)
regressor.fit(X, y)
#4 Predicting a new result
y_pred = regressor.predict([[5.5]])

Output: y_pred(5.5) = 120233.33 (That's a great prediction, close to the real value)
