Linear Regression — Residual Plot Comparison for Ads data
Problem statement: In our last article, we learned how to evaluate whether Linear Regression is the right choice for our dataset. In this and upcoming articles, we will dive deeper into each feature and see whether Linear Regression is the right choice for each.
Recall from our last article that if the residual plot looks random, with no pattern such as a straight line or a parabola, Linear Regression is likely a good choice. Let’s calculate mean_absolute_error and root_mean_squared_error, and analyze the residual plot and the residual distribution graph, to evaluate model performance for TV and Newspaper separately. Let’s get going.
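Before fitting anything, it may help to see what these two error metrics measure. The snippet below is a minimal sketch that computes them by hand with NumPy rather than through a library helper. The arrays `y_true` and `y_pred` are made-up numbers for illustration only and are not taken from the advertising data.

```python
import numpy as np

# Hypothetical observed and predicted sales values (illustration only,
# not from the advertising dataset)
y_true = np.array([10.0, 12.0, 15.0, 20.0])
y_pred = np.array([11.0, 11.0, 17.0, 18.0])

# A residual is the observed value minus the predicted value
residuals = y_true - y_pred

# Mean absolute error: the average size of the residuals, ignoring sign
mae = np.mean(np.abs(residuals))

# Root mean squared error: squares the residuals before averaging,
# so larger misses are penalized more heavily than in MAE
rmse = np.sqrt(np.mean(residuals ** 2))

print(f"MAE:  {mae:.3f}")
print(f"RMSE: {rmse:.3f}")
```

Both metrics are in the same units as the target (here, Sales), which makes them easy to compare against the typical scale of the values being predicted.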
TV
Import our libraries and load our data
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
file_path = r'/Users/Downloads/advertising.csv'
df_ad_data = pd.read_csv(file_path)
df_ad_data.head()
Create the X matrix for TV data
X_tv = df_ad_data[["TV"]]
X_tv.head()
Get the y vector (the Sales column)
y_tv = df_ad_data['Sales']
y_tv.head()
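The double brackets used for X and the single brackets used for y are worth a closer look: scikit-learn estimators expect the features as a 2D structure and the target as a 1D one. The sketch below shows the difference on a tiny hand-made DataFrame. The name `df_demo` and its values are hypothetical stand-ins, not the actual advertising data.

```python
import pandas as pd

# Tiny stand-in for the advertising data (values are illustrative only)
df_demo = pd.DataFrame({
    "TV": [230.1, 44.5, 17.2],
    "Sales": [22.1, 10.4, 9.3],
})

# Double brackets select a list of columns and return a 2D DataFrame
X_demo = df_demo[["TV"]]

# Single brackets select one column and return a 1D Series
y_demo = df_demo["Sales"]

print(type(X_demo).__name__, X_demo.shape)
print(type(y_demo).__name__, y_demo.shape)
```

Keeping X two-dimensional even with a single feature means the same code still works if more columns are added to the feature list later.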
Separate out our training data and test data
from sklearn.model_selection import train_test_split…