Linear Regression on Ecommerce Customer Dataset

Project Overview:

You've got some contract work with an e-commerce company in New York City that sells clothing online.

The company is trying to decide whether to focus its efforts on its mobile app or its website.

Let's begin.

Import packages

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

Get the Data:

Read the e-commerce data from the CSV file:

customers = pd.read_csv("Ecommerce Customers.csv")
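Before plotting, it helps to confirm that the file parsed as expected. Below is a minimal sketch. It uses a tiny in-memory CSV with made-up rows that reuse this article's column names, so it is not the real `Ecommerce Customers.csv`:

```python
import io

import pandas as pd

# Hypothetical stand-in for "Ecommerce Customers.csv": made-up rows that use the
# column names from this article, NOT the real dataset.
csv_text = """Avg. Session Length,Time on App,Time on Website,Length of Membership,Yearly Amount Spent
34.0,12.5,39.0,4.0,580.0
32.0,11.0,37.5,2.5,400.0
33.0,11.5,37.0,4.0,490.0
"""
customers = pd.read_csv(io.StringIO(csv_text))

print(customers.head())      # first rows, to check the columns parsed correctly
customers.info()             # column dtypes and non-null counts
print(customers.describe())  # count, mean, std, min, quartiles and max per column
```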


Data Analysis:


Exploratory Data Analysis:

Let's explore the data to find the relationships between the features.

# Compare Time on Website with Yearly Amount Spent
sns.jointplot(data=customers, x='Time on Website', y='Yearly Amount Spent')

# Compare Time on App with Yearly Amount Spent
sns.jointplot(data=customers, x='Time on App', y='Yearly Amount Spent')

# Compare the correlation between all features in the e-commerce data
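The jointplots show each relationship visually. A correlation matrix puts a number on every pair of features at once. Below is a minimal sketch using pandas `DataFrame.corr`, which computes Pearson correlation by default. The data is made up, and spending is deliberately tied to app time:

```python
import numpy as np
import pandas as pd

# Made-up data (NOT the real dataset): spending is built to depend strongly on
# app time and barely on website time, purely for illustration.
rng = np.random.default_rng(0)
n = 200
customers = pd.DataFrame({
    "Time on App": rng.normal(12, 1, n),
    "Time on Website": rng.normal(37, 1, n),
})
customers["Yearly Amount Spent"] = (
    40 * customers["Time on App"]
    + 0.5 * customers["Time on Website"]
    + rng.normal(0, 10, n)
)

# Pairwise Pearson correlation between every pair of columns
corr = customers.corr()
print(corr.round(2))

# Correlation of each feature with the target, strongest first
print(corr["Yearly Amount Spent"].drop("Yearly Amount Spent").sort_values(ascending=False))
```

If the DataFrame also contains text columns, recent pandas versions (2.0 and later) raise an error unless you pass `numeric_only=True` or select the numeric columns first.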

Training and Testing the Data

Now that we have explored the data a bit, let's go ahead and split the data into training and testing sets.

y = 'Yearly Amount Spent' is the dependent variable.

x = 'Avg. Session Length', 'Time on App', 'Time on Website', 'Length of Membership' are the independent variables.


y = customers['Yearly Amount Spent']

x = customers[['Avg. Session Length', 'Time on App', 'Time on Website', 'Length of Membership']]

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.33, random_state=42)
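`train_test_split` shuffles the rows and then cuts them into two sets. With a float `test_size`, that fraction of the rows goes to the test set. `random_state` fixes the shuffle so the split is reproducible. This small sketch uses 100 made-up rows to show the resulting sizes and to confirm that features and target stay aligned:

```python
import math

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up stand-in data: 100 rows, two features and a target
rng = np.random.default_rng(42)
x = pd.DataFrame(rng.normal(size=(100, 2)), columns=["Time on App", "Time on Website"])
y = pd.Series(rng.normal(size=100), name="Yearly Amount Spent")

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.33, random_state=42)

# A float test_size is a fraction of the rows; sklearn rounds the test count up
expected_test = math.ceil(0.33 * len(x))
print(len(x_train), len(x_test), expected_test)

# Rows stay aligned: x_test and y_test share the same shuffled index labels
print((x_test.index == y_test.index).all())
```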


Training the model:

Now it's time to train our model on the training data.

from sklearn.linear_model import LinearRegression

lm = LinearRegression()
lm.fit(x_train, y_train)
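`LinearRegression.fit` performs ordinary least squares. It chooses an intercept and one coefficient per feature so that the sum of squared residuals on the training data is as small as possible. This minimal sketch checks that on made-up data by solving the same least-squares problem directly with NumPy. It calls `np.linalg.lstsq` on the feature matrix with an extra column of ones for the intercept. The "true" coefficients are hypothetical:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data: 4 features, hypothetical "true" coefficients plus noise
rng = np.random.default_rng(0)
x_train = rng.normal(size=(150, 4))
true_coef = np.array([25.0, 38.0, 0.5, 61.0])
y_train = 10.0 + x_train @ true_coef + rng.normal(0, 1, 150)

lm = LinearRegression()
lm.fit(x_train, y_train)

# Same problem solved directly: prepend a column of ones so the first
# least-squares weight plays the role of the intercept
design = np.column_stack([np.ones(len(x_train)), x_train])
weights, *_ = np.linalg.lstsq(design, y_train, rcond=None)

print("sklearn:", lm.intercept_, lm.coef_)
print("lstsq:  ", weights[0], weights[1:])
```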



Print out the coefficients of the model:

lm.coef_

array([25.70676165, 38.57260842,  0.62520092, 61.71767604])
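Taken together, the fitted model predicts yearly spending as an intercept plus a weighted sum of the four features. Here it is written out with the coefficients above, rounded to two decimals. The intercept $\beta_0$ is not printed in this article, so it stays symbolic. The variables $x_{\text{session}}$, $x_{\text{app}}$, $x_{\text{website}}$ and $x_{\text{membership}}$ stand for Avg. Session Length, Time on App, Time on Website and Length of Membership:

$$
\widehat{\text{Yearly Amount Spent}} = \beta_0 + 25.71\,x_{\text{session}} + 38.57\,x_{\text{app}} + 0.63\,x_{\text{website}} + 61.72\,x_{\text{membership}}
$$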


Predicting the test Data

Now it's time to use the model to make predictions on the test data and evaluate its performance.

Use lm.predict() to make predictions on the x_test dataset:

predictions = lm.predict(x_test)
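For each row, a linear model's prediction is the intercept plus the coefficient-weighted sum of that row's features. This minimal sketch uses made-up data with hypothetical coefficient values. It confirms that `lm.predict` matches that formula computed by hand, then prints a few actual and predicted values side by side:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data standing in for this article's train/test split
rng = np.random.default_rng(1)
x_train, x_test = rng.normal(size=(120, 4)), rng.normal(size=(30, 4))
coef = np.array([25.0, 38.0, 0.5, 61.0])  # hypothetical coefficients
y_train = 500.0 + x_train @ coef + rng.normal(0, 5, 120)
y_test = 500.0 + x_test @ coef + rng.normal(0, 5, 30)

lm = LinearRegression().fit(x_train, y_train)
predictions = lm.predict(x_test)

# predict() is the intercept plus the weighted sum of each row's features
manual = lm.intercept_ + x_test @ lm.coef_
print(np.allclose(predictions, manual))

# Compare actual and predicted spending for the first few test rows
for actual, predicted in zip(y_test[:5], predictions[:5]):
    print(f"actual {actual:8.2f}   predicted {predicted:8.2f}")
```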


Evaluating the model:

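Let's evaluate the model's performance by calculating error metrics on the residuals: the mean absolute error (MAE), mean squared error (MSE) and root mean squared error (RMSE). The R² score and the explained variance score are other common measures.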

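from sklearn import metrics

print('MAE', metrics.mean_absolute_error(y_test, predictions))
print('MSE', metrics.mean_squared_error(y_test, predictions))
print('RMSE', np.sqrt(metrics.mean_squared_error(y_test, predictions)))

MAE 8.35357352501757
MSE 102.4042865993193
RMSE 10.119500313717042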
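Each of these metrics summarizes the residuals (actual minus predicted). As a sanity check on the numbers above, RMSE is the square root of MSE: √102.40 ≈ 10.12. The sketch below uses five made-up values rather than this article's test set. It computes the metrics by hand next to their `sklearn.metrics` counterparts, including `r2_score` and `explained_variance_score`:

```python
import numpy as np
from sklearn import metrics

# Made-up actual and predicted values (NOT this article's results)
y_test = np.array([500.0, 420.0, 610.0, 380.0, 550.0])
predictions = np.array([510.0, 415.0, 600.0, 390.0, 545.0])

residuals = y_test - predictions    # [-10, 5, 10, -10, 5]
mae = np.mean(np.abs(residuals))    # average size of an error, in dollars
mse = np.mean(residuals ** 2)       # squaring penalizes large errors more
rmse = np.sqrt(mse)                 # back in dollars, the same units as the target
r2 = 1 - np.sum(residuals ** 2) / np.sum((y_test - y_test.mean()) ** 2)

print("MAE ", mae, metrics.mean_absolute_error(y_test, predictions))
print("MSE ", mse, metrics.mean_squared_error(y_test, predictions))
print("RMSE", rmse)
print("R^2 ", r2, metrics.r2_score(y_test, predictions))
print("Explained variance", metrics.explained_variance_score(y_test, predictions))
```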




Let's explore the residuals to make sure everything is okay with the model.
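A common check is that the residuals look roughly normally distributed and centred on zero. A histogram is the usual way to eyeball this, for example with matplotlib's `plt.hist` or seaborn's `histplot`. Below is a minimal text-only sketch on made-up data where the noise is normal by construction:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data with normally distributed noise (NOT this article's dataset)
rng = np.random.default_rng(7)
x_train = rng.normal(size=(300, 4))
y_train = 100.0 + x_train @ np.array([25.0, 38.0, 0.5, 61.0]) + rng.normal(0, 10, 300)

lm = LinearRegression().fit(x_train, y_train)
residuals = y_train - lm.predict(x_train)

# With an intercept, OLS residuals on the training data average to (numerically) zero
print("mean residual:", residuals.mean())

# A rough text histogram: for a well-specified linear model the counts should
# look roughly bell-shaped and centred on zero
counts, edges = np.histogram(residuals, bins=10)
for count, left, right in zip(counts, edges[:-1], edges[1:]):
    print(f"{left:7.1f} to {right:7.1f} | {'#' * count}")
```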

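Recreate the coefficients as a DataFrame, labelled by feature:

coefficients = pd.DataFrame(lm.coef_, x.columns, columns=['Coefficient'])
coefficients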



Holding the other features fixed:

A 1-unit increase in Avg. Session Length is associated with about $25.71 more spent per year.

A 1-unit increase in Time on App is associated with about $38.57 more spent per year.

A 1-unit increase in Time on Website is associated with about $0.63 more spent per year.

A 1-unit increase in Length of Membership is associated with about $61.72 more spent per year.
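Each coefficient is the change in predicted spending for a one-unit increase in that feature while the other three stay fixed. This minimal sketch uses made-up data and a hypothetical customer row. It checks the claim by predicting for two customers who differ only in Time on App:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

features = ["Avg. Session Length", "Time on App", "Time on Website", "Length of Membership"]

# Made-up training data (NOT this article's dataset)
rng = np.random.default_rng(3)
x = pd.DataFrame(rng.normal(size=(200, 4)), columns=features)
y = 50.0 + x.to_numpy() @ np.array([25.0, 38.0, 0.5, 61.0]) + rng.normal(0, 5, 200)
lm = LinearRegression().fit(x, y)

# Coefficients labelled by feature
coefficients = pd.DataFrame(lm.coef_, index=features, columns=["Coefficient"])
print(coefficients)

# Two hypothetical customers, identical except for one extra unit of Time on App
customer = pd.DataFrame([[0.0, 0.0, 0.0, 0.0]], columns=features)
more_app = customer.copy()
more_app["Time on App"] += 1

diff = lm.predict(more_app)[0] - lm.predict(customer)[0]
print(diff, coefficients.loc["Time on App", "Coefficient"])  # equal up to rounding
```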

Hence, the company should focus on developing its website to catch up with the mobile app, since the app is already doing well.


