# Linear Regression on Ecommerce Customer Dataset

# Project Overview:

## You have some contract work with an ecommerce company based in New York City that sells clothing online

## The company is trying to decide whether to focus its efforts on its mobile app or its website

**Let's begin.**

## Import packages

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

import seaborn as sns

**Get the Data:**

Read the ecommerce data from the CSV file into a DataFrame

customers = pd.read_csv("Ecommerce Customers.csv")

customers.head()

**Data Analysis:**

customers.describe()

customers.info()

**Exploratory Data Analysis:**

Let's explore the data to find relationships between the features

# Comparing Time on Website and Yearly Amount Spent

sns.jointplot(data=customers, x='Time on Website', y='Yearly Amount Spent')

# Comparing Time on App and Yearly Amount Spent

sns.jointplot(data=customers, x='Time on App', y='Yearly Amount Spent')

# Comparing the correlations between all features in the ecommerce data

sns.pairplot(customers)
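The pair plot shows the relationships visually; a correlation table puts numbers on them. The sketch below is only an illustration: the DataFrame `df` is synthetic stand-in data with made-up values, not the real customer file, and the coefficients used to generate it are assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 300

# Synthetic stand-in for the customers table (made-up numbers, not the real data).
df = pd.DataFrame({
    "Time on App": rng.normal(12, 1, n),
    "Time on Website": rng.normal(37, 1, n),
    "Length of Membership": rng.normal(3.5, 1, n),
    "Email": ["user@example.com"] * n,  # non-numeric column, dropped below
})
# Assumed relationship for the demo: spend depends on app time and membership only.
df["Yearly Amount Spent"] = (
    40 * df["Time on App"]
    + 60 * df["Length of Membership"]
    + rng.normal(0, 10, n)
)

# Keep the numeric columns only, then compute pairwise Pearson correlations.
corr = df.select_dtypes(include="number").corr()

# Correlation of each feature with the target, strongest first.
target_corr = corr["Yearly Amount Spent"].drop("Yearly Amount Spent")
print(target_corr.sort_values(ascending=False))
```

On the real data, the same two lines (`select_dtypes(...).corr()` and the column lookup) could be applied to `customers` instead of `df`.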

**Training and Testing the Data**

Now that we have explored the data a bit, let's go ahead and split it into training and testing sets.

y = 'Yearly Amount Spent' is the dependent variable

x = 'Avg. Session Length', 'Time on App', 'Time on Website', 'Length of Membership' are the independent variables

=====================================

y = customers['Yearly Amount Spent']

x = customers[['Avg. Session Length', 'Time on App', 'Time on Website', 'Length of Membership']]

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.33, random_state=42)
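To see what the split does, here is a small sketch on synthetic data. The names `x_demo` and `y_demo` and all the numbers are assumptions for illustration, not the customer data. It checks that about one third of the rows land in the test set, that features and targets stay aligned, and that a fixed `random_state` reproduces the same split.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 200

# Synthetic features and target (illustrative values only).
x_demo = pd.DataFrame({
    "Time on App": rng.normal(12, 1, n),
    "Length of Membership": rng.normal(3.5, 1, n),
})
y_demo = (
    40 * x_demo["Time on App"]
    + 60 * x_demo["Length of Membership"]
    + rng.normal(0, 10, n)
)

# Same arguments as in the post: 33% of rows held out, fixed seed.
x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.33, random_state=42
)
print(x_tr.shape, x_te.shape)

# Rows are shuffled, but each feature row keeps its matching target value.
aligned = x_tr.index.equals(y_tr.index) and x_te.index.equals(y_te.index)

# Repeating the call with the same random_state gives the same split.
x_tr2, x_te2, _, _ = train_test_split(
    x_demo, y_demo, test_size=0.33, random_state=42
)
reproducible = x_tr.index.equals(x_tr2.index)
print(aligned, reproducible)
```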

==================================

**Training the model:**

Now it's time to train our model on the training data.

from sklearn.linear_model import LinearRegression

lm=LinearRegression()

lm.fit(x_train,y_train)

======================================

**Print out the Coefficients in the model**

lm.coef_

`array([25.70676165, 38.57260842, 0.62520092, 61.71767604])`
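Each coefficient is the change in predicted yearly spend for a one-unit increase in that feature while the others stay fixed. The sketch below uses the coefficients printed above, in the column order of `x`. The intercept and the example customer are hypothetical values chosen for illustration, since the post does not print `lm.intercept_`.

```python
import numpy as np

features = [
    "Avg. Session Length",
    "Time on App",
    "Time on Website",
    "Length of Membership",
]
# Coefficients as printed by lm.coef_ above.
coef = np.array([25.70676165, 38.57260842, 0.62520092, 61.71767604])
# Hypothetical intercept: an assumption, not a value from the fitted model.
intercept = -1000.0


def predict_spend(row):
    """Linear model prediction: intercept + sum(coef_i * x_i)."""
    return intercept + float(np.dot(coef, row))


# Hypothetical customer, then the same customer with one more unit of app time.
customer = np.array([33.0, 12.0, 37.0, 3.5])
more_app = customer.copy()
more_app[features.index("Time on App")] += 1.0

delta = predict_spend(more_app) - predict_spend(customer)
print(round(delta, 2))  # equals the 'Time on App' coefficient, rounded
```

The difference does not depend on the intercept or on the starting customer, which is why the coefficients can be read directly as per-unit effects.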

======================================

**Predicting on the Test Data**

Now it's time to predict on the test data so that we can evaluate the model's performance

Use lm.predict() to predict on the x_test dataset

predictions=lm.predict(x_test)

**Evaluating the model:**

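Let's evaluate the model's performance by calculating the mean absolute error (MAE), the mean squared error (MSE), the root mean squared error (RMSE), and the explained variance score (which equals R² when the residuals average to zero)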

from sklearn import metrics

print('MAE', metrics.mean_absolute_error(y_test, predictions))

print('MSE', metrics.mean_squared_error(y_test, predictions))

print('RMSE', np.sqrt(metrics.mean_squared_error(y_test, predictions)))

`MAE 8.35357352501757`

`MSE 102.4042865993193`

`RMSE 10.119500313717042`

metrics.explained_variance_score(y_test,predictions)

`0.9814710935431786`
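To make the metrics concrete, here is a NumPy-only sketch that computes them by hand on a tiny made-up example. The arrays `y_true` and `y_pred` are illustrative assumptions, not model output. It also shows how the explained variance score can differ from R² when the residuals do not average to zero.

```python
import numpy as np

# Tiny made-up example (not data from the post).
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 10.0])

residuals = y_true - y_pred

mae = np.mean(np.abs(residuals))   # mean absolute error
mse = np.mean(residuals ** 2)      # mean squared error
rmse = np.sqrt(mse)                # root mean squared error, in the target's units

# R^2: 1 - (residual sum of squares / total sum of squares).
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

# Explained variance: 1 - Var(residuals) / Var(y_true).
# It ignores a constant bias in the predictions, so it matches R^2
# only when the residuals have zero mean.
explained_variance = 1 - np.var(residuals) / np.var(y_true)

print(mae, mse, rmse, r2, explained_variance)
```

Here the predictions are slightly biased (the residuals average -0.25), so the explained variance comes out a little higher than R².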

**Residuals:**

Let's explore the residuals (actual minus predicted values) to check that the model fits well; they should look roughly normally distributed around zero
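One way to do this check is a histogram of the residuals. The sketch below is self-contained and runs on synthetic data, so every name and number in it is an assumption for illustration. With the variables from this post, the residuals would be `y_test - predictions`.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 300

# Synthetic linear data with Gaussian noise (illustrative, not the customer data).
X = rng.normal(size=(n, 2))
y = 500 + 40 * X[:, 0] + 60 * X[:, 1] + rng.normal(0, 10, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33, random_state=42)
model = LinearRegression().fit(X_tr, y_tr)

# Residual = actual value minus predicted value, on the held-out test set.
residuals = y_te - model.predict(X_te)

# A roughly bell-shaped histogram centred near zero suggests a reasonable fit.
fig, ax = plt.subplots()
ax.hist(residuals, bins=30)
ax.set_xlabel("Residual (actual - predicted)")
ax.set_ylabel("Count")
ax.set_title("Distribution of test-set residuals")
```

In a script, call `plt.show()` afterwards to display the figure; in a notebook it renders automatically.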

**Create a DataFrame for the coefficients:**

cdf = pd.DataFrame(lm.coef_, x.columns, columns=['Coeff'])

cdf

**Conclusion:**

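Holding the other features fixed:

A 1-unit increase in Avg. Session Length is associated with about $25.71 more spent per year

A 1-unit increase in Time on App is associated with about $38.57 more spent per year

A 1-unit increase in Time on Website is associated with about $0.63 more spent per year

A 1-unit increase in Length of Membership is associated with about $61.72 more spent per year

Time on App has a far stronger association with spending than Time on Website. Hence the company should focus on website development so that the website catches up with the mobile app, which is already performing well.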