Linear Regression on Ecommerce Customer Dataset
Project Overview:
You have some contract work with an e-commerce company in New York City that sells clothing online.
The company is trying to decide whether to focus its efforts on its mobile app or its website.
Let's begin.
Import packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
Get the Data:
Read the e-commerce data from the CSV file
customers=pd.read_csv("Ecommerce Customers.csv")
customers.head()
Data Analysis:
customers.describe()
customers.info()
Exploratory Data Analysis:
Let's explore the data to find the relationships between the features.
# comparing time on Website and Yearly Amount Spent
sns.jointplot(data=customers,x='Time on Website',y='Yearly Amount Spent')
# comparing Time on App and Yearly Amount Spent
sns.jointplot(data=customers,x='Time on App',y='Yearly Amount Spent')
# pairwise relationships between all numeric features in the e-commerce data
sns.pairplot(customers)
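The pair plot shows the relationships visually; a numeric summary can help confirm which features move most closely with the target. The sketch below is a minimal illustration on a small synthetic DataFrame (the name `df`, the column scales, and the made-up dependence are assumptions, not the real customer data) using pandas' `DataFrame.corr()`.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the customers DataFrame (NOT the real data):
# the target depends strongly on two columns and not at all on a third.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "Time on App": rng.normal(12, 1, n),
    "Time on Website": rng.normal(37, 1, n),
    "Length of Membership": rng.normal(3.5, 1, n),
})
df["Yearly Amount Spent"] = (
    40 * df["Time on App"]
    + 60 * df["Length of Membership"]
    + rng.normal(0, 10, n)
)

# Pearson correlation of every feature with the target, strongest first
corr_with_target = (
    df.corr()["Yearly Amount Spent"]
    .drop("Yearly Amount Spent")
    .sort_values(ascending=False)
)
print(corr_with_target)
```

If the real DataFrame also contains text columns, `customers.corr(numeric_only=True)` restricts the calculation to numeric columns, and passing that matrix to `sns.heatmap(..., annot=True)` gives a labelled visual version.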
Training and Testing the Data
Now that we have explored the data a bit, let's go ahead and split it into training and testing sets.
y = 'Yearly Amount Spent' is the dependent variable.
x = 'Avg. Session Length', 'Time on App', 'Time on Website', 'Length of Membership' are the independent variables.
=====================================
y=customers['Yearly Amount Spent']
x=customers[['Avg. Session Length', 'Time on App', 'Time on Website', 'Length of Membership']]
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.33, random_state=42)
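To make `test_size` and `random_state` concrete, here is a small sketch on a hypothetical 500-row table of random numbers (the row count and values are assumptions chosen only for illustration). It shows how many rows land in each set, that a fixed `random_state` reproduces the same split, and that features and labels stay aligned.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical 500-row feature table and target (random numbers, not real data)
rng = np.random.default_rng(1)
cols = ["Avg. Session Length", "Time on App", "Time on Website", "Length of Membership"]
x = pd.DataFrame(rng.normal(size=(500, 4)), columns=cols)
y = pd.Series(rng.normal(size=500), name="Yearly Amount Spent")

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.33, random_state=42)

# test_size=0.33 reserves about a third of the rows for testing
print(len(x_train), len(x_test))  # 335 165

# The same random_state reproduces exactly the same split
x_train2, x_test2, _, _ = train_test_split(x, y, test_size=0.33, random_state=42)
print(x_test.index.equals(x_test2.index))  # True

# Each label keeps the index of its feature row, so x and y stay aligned
print(x_train.index.equals(y_train.index))  # True
```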
==================================
Training the model:
Now it's time to train our model on the training data.
from sklearn.linear_model import LinearRegression
lm=LinearRegression()
lm.fit(x_train,y_train)
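`LinearRegression.fit()` performs ordinary least squares: it finds the intercept and coefficients that minimise the sum of squared errors on the training data. A minimal check of that idea, on synthetic data with known "true" coefficients (an assumption for illustration), is to solve the same least-squares problem directly with NumPy and compare.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data with known true coefficients (illustrative assumption)
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4))
true_coef = np.array([25.0, 38.0, 0.5, 61.0])
y = 500 + X @ true_coef + rng.normal(0, 5, 200)

lm = LinearRegression().fit(X, y)

# Ordinary least squares by hand: prepend a column of ones for the intercept
# and solve min ||A @ beta - y||^2 with NumPy
A = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

print(lm.intercept_, lm.coef_)  # scikit-learn's estimates
print(beta[0], beta[1:])        # the same values from lstsq
```

Both approaches should agree to numerical precision, and with enough data the estimates land close to `true_coef`.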
======================================
Print out the coefficients of the model
lm.coef_
array([25.70676165, 38.57260842, 0.62520092, 61.71767604])
======================================
Predicting the test Data
Now it's time to make predictions with the model and evaluate its performance.
Use lm.predict() to make predictions for the x_test dataset
predictions=lm.predict(x_test)
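Under the hood, `lm.predict()` simply evaluates the fitted linear equation: prediction = intercept + sum of (coefficient × feature). The sketch below, on synthetic data (an assumption, not the customer dataset), recomputes the predictions from `lm.intercept_` and `lm.coef_` to show they match.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic training and test data (illustrative only)
rng = np.random.default_rng(3)
X_train = rng.normal(size=(150, 4))
y_train = 10 + X_train @ np.array([2.0, 3.0, 0.1, 5.0]) + rng.normal(0, 1, 150)
X_test = rng.normal(size=(50, 4))

lm = LinearRegression().fit(X_train, y_train)
predictions = lm.predict(X_test)

# The same numbers from the fitted equation: intercept + weighted sum of features
manual = lm.intercept_ + X_test @ lm.coef_
print(np.allclose(predictions, manual))  # True
```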
Evaluating the model:
Let's evaluate the model's performance by calculating the mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), and the explained variance score (closely related to R², and equal to it when the residuals average to zero).
from sklearn import metrics
print('MAE',metrics.mean_absolute_error(y_test,predictions))
print('MSE',metrics.mean_squared_error(y_test,predictions))
print('RMSE',np.sqrt(metrics.mean_squared_error(y_test,predictions)))
MAE 8.35357352501757
MSE 102.4042865993193
RMSE 10.119500313717042
metrics.explained_variance_score(y_test,predictions)
0.9814710935431786
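Each of these metrics is a short formula on the residuals (true value minus prediction). The sketch below computes them by hand on five made-up values (not from the dataset) and checks them against `sklearn.metrics`. It also computes R² with `r2_score`, to show how it differs from the explained variance score when the residuals do not average to zero.

```python
import numpy as np
from sklearn import metrics

# Small made-up example: true values and predictions (illustrative only)
y_true = np.array([500.0, 450.0, 620.0, 380.0, 710.0])
y_pred = np.array([510.0, 440.0, 600.0, 395.0, 700.0])

residuals = y_true - y_pred        # [-10, 10, 20, -15, 10], mean 3
mae = np.mean(np.abs(residuals))   # mean absolute error: 13.0
mse = np.mean(residuals ** 2)      # mean squared error: 185.0
rmse = np.sqrt(mse)                # back in the target's units (dollars)

# Explained variance: 1 - Var(residuals) / Var(y_true)
evs = 1 - np.var(residuals) / np.var(y_true)
# R^2: 1 - SS_res / SS_tot; it is lower than EVS here because the residual mean is not 0
r2 = 1 - np.sum(residuals ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(mae, metrics.mean_absolute_error(y_true, y_pred))
print(mse, metrics.mean_squared_error(y_true, y_pred))
print(rmse, evs, metrics.explained_variance_score(y_true, y_pred))
print(r2, metrics.r2_score(y_true, y_pred))
```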
Residuals:
Let's explore the residuals to make sure everything is okay.
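A common residual check is a histogram (roughly bell-shaped and centred on zero is a good sign) plus a residuals-vs-predictions scatter (no visible pattern is a good sign). This is a minimal sketch with matplotlib on synthetic data that mimics the document's variable names. The data and noise level are assumptions, not the real customers.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): linear signal plus Gaussian noise
rng = np.random.default_rng(4)
x = rng.normal(size=(300, 4))
y = 100 + x @ np.array([25.0, 38.0, 0.6, 61.0]) + rng.normal(0, 10, 300)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.33, random_state=42)
predictions = LinearRegression().fit(x_train, y_train).predict(x_test)

residuals = y_test - predictions
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.hist(residuals, bins=30)
ax1.set(title="Residual distribution", xlabel="y_test - predictions")
ax2.scatter(predictions, residuals, s=10)
ax2.axhline(0, color="red")  # residuals should scatter evenly around this line
ax2.set(title="Residuals vs. predictions", xlabel="Predicted", ylabel="Residual")
fig.tight_layout()

print(residuals.mean(), residuals.std())
```

With the tutorial's own objects, `sns.histplot(y_test - predictions, bins=50)` produces a similar histogram using seaborn.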
Create a DataFrame of the coefficients:
cdf=pd.DataFrame(lm.coef_,x.columns,columns=['Coeff'])
cdf
Conclusion:
Holding the other features fixed, a 1-unit increase in Avg. Session Length is associated with about $25.71 more spent per year.
A 1-unit increase in Time on App is associated with about $38.57 more spent per year.
A 1-unit increase in Time on Website is associated with about $0.63 more spent per year.
A 1-unit increase in Length of Membership is associated with about $61.72 more spent per year.
Hence, the company should focus on developing its website so that it catches up with the mobile app, which is already performing well.
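The "holding the other features fixed" reading of a coefficient can be checked directly: take one customer, add 1 to a single feature, and compare the two predictions. In a linear model the difference equals that feature's coefficient. This sketch uses synthetic data and a hypothetical customer (both assumptions), and it describes an association in the fitted model, not proof that changing the feature causes more spending.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

cols = ["Avg. Session Length", "Time on App", "Time on Website", "Length of Membership"]

# Synthetic data (illustrative only, not the real customers)
rng = np.random.default_rng(5)
x = pd.DataFrame(rng.normal(size=(200, 4)), columns=cols)
y = 400 + x.to_numpy() @ np.array([25.0, 38.0, 0.6, 61.0]) + rng.normal(0, 10, 200)
lm = LinearRegression().fit(x, y)

# A hypothetical customer, then the same customer with 1 more unit of app time
customer = x.iloc[[0]].copy()
more_app = customer.copy()
more_app["Time on App"] += 1

delta = lm.predict(more_app)[0] - lm.predict(customer)[0]
print(delta, lm.coef_[cols.index("Time on App")])  # the two numbers match
```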