The Impact of Machine Learning on E-commerce

This article discusses the importance of applying predictive analytics and machine learning algorithms on e-commerce platforms.

mohamed hassan
12 min read · Feb 25, 2023

There is no doubt that machine learning plays an important role in our lives; applying its algorithms helps make life easier across all industries and grows businesses. As Prof. Andrew Ng, a pioneer of machine learning, said: "AI is the new electricity."

Product sense is a part of the ML and data cycle: making recommendations with business impact by understanding the business and what it is actually trying to do, which we will talk about throughout this article.

On the other hand, global e-commerce sales continue to rise and take a larger piece of the retail pie. They are expected to reach $6.3 trillion worldwide this year, and the forecast estimates sales will hit $8.1 trillion, making up 24% of total retail transactions, as shown below.

Source: https://www.shopify.com/blog/global-ecommerce-sales

1- Introduction to machine learning

Let's have a quick introduction to the types of machine learning before we go deep into the topics. We will keep our discussion to supervised and unsupervised learning.

Supervised learning has two types, regression and classification, and both share one concept: we have features (X) and a target (y), and we want to predict the target for new data based on the features. Let's start with regression and its simple equation Y = aX + b, where Y is the target (dependent variable), X is the feature (independent variable), a is the slope of the line, and b is the intercept (the value of Y when X = 0). We measure the difference between actual and predicted values with a score called the loss function; the lower the score, the better the model's accuracy.
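A minimal sketch of fitting such a line with scikit-learn, on made-up toy data:

import numpy as np
from sklearn.linear_model import LinearRegression

# toy data that follows y = 2x + 1
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([3, 5, 7, 9, 11])

model = LinearRegression()
model.fit(X, y)
print(model.coef_[0], model.intercept_)  # slope ≈ 2, intercept ≈ 1
print(model.predict([[6]]))              # predicted y for a new x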

The second part is classification, where the target is to predict a discrete output rather than a continuous one. For logistic regression we use the sigmoid function to squash the continuous output into a probability between 0 and 1 when our prediction is a binary choice; if there are more than two classes we use the one-versus-all technique, and the equation becomes the softmax function. The loss function is also different from regression; for binary classification it is the cross-entropy loss:

Cross-Entropy Loss = −(1/N) Σ [y log(ŷ) + (1 − y) log(1 − ŷ)]
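A minimal sketch of binary classification with logistic regression on toy data (for illustration only):

import numpy as np
from sklearn.linear_model import LogisticRegression

# toy data: one feature, binary target
X = np.array([[1], [2], [3], [8], [9], [10]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression()
clf.fit(X, y)
print(clf.predict([[6]]))        # hard 0/1 prediction
print(clf.predict_proba([[6]]))  # sigmoid output: class probabilities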

For unsupervised learning there are different types, but we will talk about clustering. The difference here is that we don't have a target (y); instead, we find similar instances inside a dataset and group them together. We are going to use k-means and DBSCAN; both have hyperparameters, and the goal is to find the optimal hyperparameter values.
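For k-means, a common way to find the optimal hyperparameter k is the elbow method; a quick sketch, assuming the customer data is already in a feature matrix called features:

from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

# inertia (within-cluster sum of squared distances) for each candidate k;
# the "elbow" where the curve flattens suggests a good number of clusters
inertias = []
for k in range(1, 11):
    inertias.append(KMeans(n_clusters=k, random_state=0).fit(features).inertia_)
plt.plot(range(1, 11), inertias, marker='o')
plt.xlabel('k')
plt.ylabel('inertia')
plt.show()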

That was a very quick introduction; most of the algorithms will be explained with code as they come up in the cases we are going to discuss.

2- Marketing and Target Audience

Let's agree that understanding customer behavior on an e-commerce platform is the first step to winning the customer, right?

And you can't understand customers without finding usage patterns by grouping them. For example:

  • Customers who order a lot but pay less per order, say an average of $70.
  • Customers who also order a lot but pay more per order, say no less than $150.
  • Customers who don't order frequently but whose basket value is always high.
  • Customers who order frequently with a high basket value. This last group is the most important for any organization: they are loyal to the platform and guarantee high revenue through their high basket values.

Let's see the technique we will apply in the code below:

from sklearn.cluster import KMeans
# set the number of clusters, then fit and predict a cluster label for each customer
kmeans = KMeans(n_clusters=4, random_state=0)
kmeans_predictions = kmeans.fit_predict(features)

The technique above is called behavioral segmentation: we group customers based on their purchasing behavior. Other techniques can be followed too, like grouping customers by age, geography, or even gender. Gender is an interesting topic in itself: why should the company know it, and should it be mandatory at registration? We will talk about that, and a tricky solution, later; for more info you can check Amazon's guide to market segmentation here.

What about campaigns? Should we follow the same strategy?

Let's consider the example of a campaign that targeted a specific set of customers based on certain criteria and got a 40% response. How could we build the next campaign on top of this, so we can generate more revenue and maximize profit?

Let's see the code below:

# split into train and test sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(features, goal, test_size=0.2, random_state=0)
# train a support vector classifier with an RBF kernel
from sklearn.svm import SVC
classifier = SVC(kernel='rbf', random_state=0)
classifier.fit(x_train, y_train)
# accuracy score and confusion matrix
from sklearn.metrics import confusion_matrix
from mlxtend.plotting import plot_confusion_matrix
cm = confusion_matrix(y_test, classifier.predict(x_test))
print(classifier.score(x_test, y_test))
plot_confusion_matrix(cm)

0.95

We built a model that predicts the response to the next campaign based on the results of the last one, by learning customer behavior, so we can get a view of the expected results before running it. Most important, don't focus only on accuracy: look at the confusion matrix and the mistakes we made (type I and type II errors). The customers we would skip because the model predicted no response, but who would actually purchase, are the customers we don't want to lose.
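To make those mistakes explicit, we can unpack the confusion matrix computed above (for binary 0/1 labels, scikit-learn lays it out as rows = actual, columns = predicted):

# for binary labels, ravel() yields tn, fp, fn, tp in that order
tn, fp, fn, tp = cm.ravel()
print("Type I errors (false positives):", fp)   # predicted responders who didn't buy
print("Type II errors (false negatives):", fn)  # real buyers the campaign would have skipped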

3- Acquisition vs customer attrition

What is more important: acquiring a new customer or keeping an existing one? The success rate of selling to an existing customer is 60–70%, while the success rate of selling to a new customer is 5–20%; the cost of acquisition is also around five times the cost of retention.

Customer Churn Rate = (Lost Customers ÷ Total Customers at the Start of Time Period) x 100
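For example, if you start the quarter with 2,000 customers and lose 100 of them, the churn rate is (100 ÷ 2,000) x 100 = 5%.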

So it's very important to focus on the churn rate and study why customers churn. The most important part here is feature engineering: choosing features such as age, gender, average order size, average delivery time, customer satisfaction, and more, based on domain knowledge of the business. The code below shows the steps taken after feature selection and preprocessing.

# splitting the data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# training an XGBoost classifier
from xgboost import XGBClassifier
classifier = XGBClassifier()
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
# make the confusion matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)
[[1526   69]
 [ 198  207]]
# apply k-fold cross-validation
from sklearn.model_selection import cross_val_score
accuracies = cross_val_score(estimator=classifier, X=X_train, y=y_train, cv=10)
print("Accuracy: {:.2f} %".format(accuracies.mean()*100))
print("Standard Deviation: {:.2f} %".format(accuracies.std()*100))
Accuracy: 86.19 %
Standard Deviation: 1.01 %
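Beyond accuracy, per-class precision and recall make the miss rate explicit; a quick sketch with scikit-learn's classification report (assuming labels 0 = stayed, 1 = churned):

from sklearn.metrics import classification_report
# recall on the churn class = 207 / (207 + 198) ≈ 0.51 from the matrix above
print(classification_report(y_test, y_pred, target_names=['stayed', 'churned']))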

4- Conversion and Impression

The conversion rate, expressed as a percentage, is calculated by dividing the total number of conversions on a selected page by the total number of visitors:

Conversion Rate = (Total Number of Conversions / Total Number of Visitors on the Website) x 100

Avg. Session Duration = Total Session Duration / Total Number of Sessions
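For example, 50 conversions from 2,000 visitors give a conversion rate of (50 / 2,000) x 100 = 2.5%, and 8,000 minutes of total session time across 2,000 sessions give an average session duration of 4 minutes.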

Let's imagine we had a sudden drop in the conversion rate for a specific page of a merchant, a product, or even the whole site. There are many scenarios to investigate, right? But what if traffic and impressions increased at the same time? That can be a complex situation to deal with. Let's divide users into two groups, those who made a purchase and those who didn't, and investigate them. Are the non-buyers similar to the customers who stopped just short of making a purchase? If so, what features underlie this similarity? Where are the customers located, and when do they make a purchase? Can the non-buyers be nudged to buy if we find that some of them spent a while on pages where a discount was running? How many of the people who did not make a purchase this week are existing customers, and how many are new visitors?
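As a sketch of how that investigation might start (the file name and column names here are hypothetical, chosen for illustration):

import pandas as pd

# hypothetical session-level data: one row per visit, with a converted flag
sessions = pd.read_csv("sessions.csv")

# compare buyers vs non-buyers on a few behavioral features
profile = sessions.groupby("converted")[
    ["session_duration", "pages_viewed", "saw_discount_page", "is_existing_customer"]
].mean()
print(profile)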

5- Demand forecasting (Growth)

Before we talk about this part, check the video of Dr. Andrew Ng's TED talk from last year, "How AI Could Empower Any Business".

Dr. Andrew Ng divided AI problems into high-value and long-tail problems. High-value problems mostly occur at large tech companies focused on serving massive numbers of customers: which ads to show people on the internet, web search engines, or online shopping product recommendation systems. For the long tail he focused on problems like demand forecasting for a pizza shop, as in the situation he mentioned, or demand forecasting for T-shirts; these unique projects need to be custom built. The purpose of applying this type of forecast is to predict the volume of upcoming sales for a certain store or product in its particular market, which can generate significant revenue.

We will run an example on data from Walmart, one of the leading retailers in the USA, to model demand for its products and stores. The data contains the weekly sales of 45 stores from 2010-02-05 to 2012-11-01, plus features like a holiday flag, temperature, fuel price, consumer price index, and unemployment rate, as shown below; other features could be added depending on the problem. Here the goal is to find the impact of holidays on store sales.


# snip from the code
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

dataset.Date = pd.to_datetime(dataset.Date)   # convert date to datetime
dataset['weekday'] = dataset.Date.dt.weekday  # add weekday column
dataset['month'] = dataset.Date.dt.month      # add month column
x = dataset.drop(['Weekly_Sales', 'Date'], axis=1).values  # use the remaining columns as features
y = dataset['Weekly_Sales'].values

# function to print evaluation metrics comparing actual and predicted values
def print_evaluate(true, predicted):
    mae = mean_absolute_error(true, predicted)
    mse = mean_squared_error(true, predicted)
    rmse = np.sqrt(mse)
    r2_square = r2_score(true, predicted)
    print('MAE:', mae)
    print('MSE:', mse)
    print('RMSE:', rmse)
    print('R2 Square', r2_square)
    print('__________________________________')

# split, then scale the features
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
std = StandardScaler()
x_train = std.fit_transform(x_train)
x_test = std.transform(x_test)

# train linear regression
regressor = LinearRegression()
regressor.fit(x_train, y_train)
test_pred = regressor.predict(x_test)
train_pred = regressor.predict(x_train)

# comparing test scores with train scores
print('Test set evaluation:\n_____________________________________')
print_evaluate(y_test, test_pred)
print('====================================')
print('Train set evaluation:\n_____________________________________')
print_evaluate(y_train, train_pred)

We then compare the scores of different regression algorithms to find the one with the least error. Other models could be applied too, such as an ARIMA model for time series forecasting or an LSTM model (deep learning).
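As a hedged sketch of the ARIMA option with statsmodels (assuming the store column is named 'Store'; the (p, d, q) order here is only for illustration):

from statsmodels.tsa.arima.model import ARIMA

# weekly sales of a single store as a time series
store_sales = dataset[dataset['Store'] == 1].set_index('Date')['Weekly_Sales']
model = ARIMA(store_sales, order=(1, 1, 1))  # (p, d, q) picked for illustration
fitted = model.fit()
print(fitted.forecast(steps=8))  # forecast the next 8 weeks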

6- Fraud

Fraud is a very important topic in e-commerce: it causes huge losses for the company and damages the financial relationship between customers, the company, and its partners or sellers. There are several types: payment fraud, partner fraud, and customer review fraud.

Amazon has developed a robust algorithm to detect fake reviews, which can be either positive or negative.

Fake reviews are positive when bought by the seller and negative when bought by competitors to drop the rating. So it's a complex problem to detect fake reviews, which Amazon is trying to overcome with the help of technology and a team of professionals who monitor manually.

Amazon's algorithm detects fake reviews by taking into account several factors, such as:

  • The language and wording of the reviews can be very similar to each other.
  • Fake reviews normally won't have the Verified Purchase label.
  • They all fall within a very short span of time, giving 5-star ratings (or 1-star ratings, in case they were bought by a competitor).
  • They are left by users who consistently leave only critical reviews with the lowest rating and/or only positive ones with the highest rating.
  • Excessive use of keywords breaks the natural reading flow.
  • The reviewer's ranking is very low.

There can be more factors, but these are considered the important ones for the algorithm; a toy illustration of a few of them follows below. We will discuss reviews further in part two.
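As a toy illustration of a few of these heuristics (this is not Amazon's actual system; the file and column names are assumed):

import pandas as pd

# hypothetical reviews data with columns: verified_purchase, rating, reviewer_rank
reviews = pd.read_csv("reviews.csv")

reviews['suspicious'] = (
    ~reviews['verified_purchase']           # no Verified Purchase label
    & reviews['rating'].isin([1, 5])        # extreme 1-star or 5-star ratings
    & (reviews['reviewer_rank'] > 100000)   # a large rank number means a low-standing reviewer
)
print(reviews['suspicious'].mean())  # share of reviews flagged as suspicious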

Working on reviews is no less important than working on payment transactions or detecting fraudulent orders; in fact, fake reviews can easily reduce a seller's or merchant's growth rate, and of course the platform's growth too.

Fraud prediction is an imbalanced-classification problem, and to evaluate a detector we have to watch the false negatives: mostly the fraudsters who keep committing fraud while we fail to detect them.

  • Precision = True Positives / (True Positives + False Positives)
  • Recall = True Positives / (True Positives + False Negatives)
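A minimal sketch of scoring a detector on these two metrics (X and y are assumed to be the prepared transaction features and fraud labels; class weighting is one common way to handle the imbalance):

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0, stratify=y)

# 'balanced' reweights classes inversely to their frequency,
# so the rare fraud class is not drowned out by the majority
clf = LogisticRegression(class_weight='balanced', max_iter=1000)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print("Precision:", precision_score(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))  # share of actual fraud we caught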

7- Hypothesis testing

The purpose of applying hypothesis testing is to evaluate two mutually exclusive statements about a population using a sample. Before we define the example, let's state the steps that should be taken for any problem:

Step 1: State the hypothesis and identify the claim.
Step 2: Find the critical value.
Step 3: Compute the test value and p-value.
Step 4: Make the decision to reject or not reject the null hypothesis.

Suppose we have a dispute about which customers to select for a campaign, or to send free vouchers in a specific city or area. The first group believes that customers with the highest count of orders are more likely to generate revenue than the rest of the customer base; let's see whether we can accept this claim or not.

The second group says that these customers may have the highest order counts but not necessarily a high number of items per order, and we should take that into account for the vouchers we want to send.

Here we will run a hypothesis test.

The null hypothesis H0 is the initial assumption; the alternative hypothesis H1 is the claim we want to test against it.

Alpha value: 0.05, the significance level shown in the image below.

Confidence level: 95% (1 − 0.05).

If the test statistic falls in the confidence interval area we fail to reject the null hypothesis, and if it falls in the significance (rejection) region we reject the null hypothesis.

The image below shows the distribution and the difference between a one-tailed and a two-tailed test.

Source: https://towardsdatascience.com/a-complete-guide-to-hypothesis-testing-2e0279fa9149
from statsmodels.stats.weightstats import ztest
import random

# draw a random sample of 100 values between 65 and 90
rand_list = [random.randint(65, 90) for _ in range(100)]
print(rand_list)

# one-sample z-test against a hypothesized population mean of 75;
# ztest returns the test statistic and the p-value
t_stat, p_value = ztest(rand_list, value=75)
if p_value <= 0.05:
    print("reject H0")
else:
    print("fail to reject H0")

8- Recommendation

Recommendations are one of the most important drivers of platform revenue. The aim is to understand a customer's purchase or usage behavior and start recommending similar products related to their previous purchases.

Amazon has generated more than 35% of its sales from its recommendation engine, while Netflix attributes around 70% of its revenue to its recommendation engine too.

The table below shows some records from a movies dataset; we will build a basic recommender using correlation.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

movie_titles = pd.read_csv("Movie_Id_Titles")
# df holds the ratings with columns user_id, title, rating (loaded earlier)
# average rating and number of ratings per movie
ratings = pd.DataFrame(df.groupby('title')['rating'].mean())
ratings['num of ratings'] = df.groupby('title')['rating'].count()
# pivot table of ratings: one row per user, one column per movie
movie_pivot = df.pivot_table(index='user_id', columns='title', values='rating')
# show the movies with the highest number of ratings
ratings.sort_values('num of ratings', ascending=False).head(10)
# grab the ratings for the top two movies
starwars_user_ratings = movie_pivot['Star Wars (1977)']
liarliar_user_ratings = movie_pivot['Liar Liar (1997)']
# find the movies most correlated with them
similar_to_starwars = movie_pivot.corrwith(starwars_user_ratings)
similar_to_liarliar = movie_pivot.corrwith(liarliar_user_ratings)
# save the result into a dataframe
corr_starwars = pd.DataFrame(similar_to_starwars, columns=['Correlation'])
corr_starwars.sort_values('Correlation', ascending=False).head(10)
# join the correlation with the number of ratings
corr_starwars = corr_starwars.join(ratings['num of ratings'])
# keep only movies with more than 100 ratings, sorted by correlation
corr_starwars[corr_starwars['num of ratings'] > 100].sort_values('Correlation', ascending=False).head()

Conclusion

This article was just an introduction, from my point of view, to how AI can empower an e-commerce business and how we can use these algorithms to make a business impact.
