Product Affinity and Basket analysis for an E-commerce website

Amit Bhardwaj
Published in Analytics Vidhya
6 min read, Jan 30, 2021

Problem Statement:

1. Provide an overview of the brand’s sales by the following attributes:

Overall — Revenue in the given time period.
Basket — Avg. unique quantity, revenue per order.
Attributes — Time of Day, Day of Week, Geography, Payment Type.
Frequency — How many are single/multiple purchasers? What is the frequency of multi-purchase? Any typical attributes?

2. Product Affinity — Which products are more likely to sell together?

3. Based on your analysis, arrive at a statistical segmentation of the brand’s audience based on Revenue. The number of segments is up to you. Please provide definitions of each group.

Let’s look at the data set and work through each question in turn. Understanding and visualizing the data first gives us an intuitive way to approach all of the above.

The data set is nested per order: a single row represents a single basket, which can contain more than one product. We therefore have to decide when to un-nest the rows and when to treat the whole basket as one unit in our analysis.

Using the explode functionality we can un-nest the data set so that each row holds a single product. Dropping the duplicate rows afterwards gives one row per order, which can later be used for segmentation and other calculations.
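The un-nesting step can be sketched as follows. This is only an illustration, assuming pandas and a hypothetical layout with `order_id`, `products` and `revenue` columns; the real column names are not shown in this article.

```python
import pandas as pd

# Hypothetical nested data: one row per basket, products stored as a list.
orders = pd.DataFrame({
    "order_id": [101, 102, 103],
    "products": [["Sugar", "Salt", "Jar"], ["Sugar"], ["Salt", "Jar"]],
    "revenue": [30.0, 5.0, 18.0],
})

# Product-level view: one row per (order, product) pair.
order_items = (
    orders.explode("products")
    .rename(columns={"products": "product"})
    .reset_index(drop=True)
)

# Order-level view: drop the product column and de-duplicate, so each
# order (and its revenue) is counted exactly once.
order_level = (
    order_items.drop(columns="product")
    .drop_duplicates(subset="order_id")
    .reset_index(drop=True)
)

print(len(order_items))  # number of product rows
print(len(order_level))  # number of orders
```

The product-level frame suits affinity analysis, while the order-level frame avoids double counting revenue when a basket holds several products.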

Now that we know how the data is structured, we can proceed to finding answers.

Overall stats

From the summary above we can read off the average number of unique products in each basket and the average revenue earned per basket.

Overall revenue is the simplest figure: it is the sum of the revenue earned over the given time period.
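A rough sketch of how these three figures could be computed, assuming pandas and hypothetical product-level rows in which the order's revenue is repeated on every product line (the column names are illustrative, not taken from the data set):

```python
import pandas as pd

# Hypothetical product-level rows (one row per order and product).
order_items = pd.DataFrame({
    "order_id": [101, 101, 101, 102, 103, 103],
    "product": ["Sugar", "Salt", "Jar", "Sugar", "Salt", "Jar"],
    "order_revenue": [30.0, 30.0, 30.0, 5.0, 18.0, 18.0],
})

# Collapse to one row per order: distinct products and the order's revenue.
per_order = order_items.groupby("order_id").agg(
    unique_products=("product", "nunique"),
    revenue=("order_revenue", "first"),
)

total_revenue = per_order["revenue"].sum()             # overall revenue
avg_unique_products = per_order["unique_products"].mean()
avg_revenue_per_order = per_order["revenue"].mean()

print(total_revenue, avg_unique_products, avg_revenue_per_order)
```

Taking the revenue with `first` per order is what prevents a three-product basket from being counted three times.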

Weekly analysis of sales of the products
Hourly Analysis of Sales of the product
Daily Analysis of the sales of the product

We can drill down from the charts above into the month of October.

The first week has the highest sales, particularly on the last day of the week. Within the day, sales are relatively high during the late morning and the early night.
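The charts behind these observations are simple group-by aggregations on the order timestamp. A minimal sketch, assuming pandas and a hypothetical `created_at` column (the timestamps and revenues below are made up for illustration):

```python
import pandas as pd

# Hypothetical order-level data with a timestamp column.
order_level = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "created_at": pd.to_datetime([
        "2020-10-04 10:15", "2020-10-04 21:40",
        "2020-10-06 11:05", "2020-10-11 10:30",
    ]),
    "revenue": [40.0, 25.0, 10.0, 35.0],
})

ts = order_level["created_at"].dt

# Revenue by hour of day, by day of week, and by week of the month.
by_hour = order_level.groupby(ts.hour)["revenue"].sum()
by_weekday = order_level.groupby(ts.day_name())["revenue"].sum()
week_of_month = (ts.day - 1) // 7 + 1  # days 1-7 -> week 1, 8-14 -> week 2, ...
by_week = order_level.groupby(week_of_month)["revenue"].sum()

print(by_hour, by_weekday, by_week, sep="\n\n")
```

Each resulting series can be plotted directly as a bar chart to reproduce hourly, daily and weekly views.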

Sales distribution amongst country province
Sales distribution amongst cities

We can see that the majority of the sales come from very few cities and provinces, and that all of the sales are from the Australia region.

The frequency of multiple purchases (52%) is significant, which leads us to the next section, Product Affinity. It can help in recommending related products, given that users are already purchasing more than one product in a single go.
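A share like this could be computed as below. The sketch assumes that "multiple purchase" means a basket with more than one distinct product, and it uses hypothetical column names and toy data, so the number it prints is not the 52% reported above.

```python
import pandas as pd

# Hypothetical product-level rows (one row per order and product).
order_items = pd.DataFrame({
    "order_id": [101, 101, 102, 103, 103, 104],
    "product": ["Sugar", "Salt", "Sugar", "Salt", "Jar", "Jar"],
})

# Distinct products in each basket, then the share of baskets with more than one.
products_per_order = order_items.groupby("order_id")["product"].nunique()
multi_share = (products_per_order > 1).mean()

print(f"{multi_share:.0%} of baskets contain more than one product")
```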

Product Affinity Analysis:

We already have some idea of what product affinity means: whenever we add something to our cart/basket, we are recommended some other product to go with it.

Overall it goes like this:

Before moving forward, let’s clarify a few terms used in the explanation:

Antecedents: what is already in the cart.

Consequents: what can be suggested for adding to the cart.

For example, {Sugar, Salt} => {Jar} is called a rule; there can be any number of such rules.

To pick the best rules we need metrics to decide which ones to use, which brings us to the following three measures:

  1. Confidence: This tells us how confident we are in a rule. If the value is 0, there was never a transaction where a user bought Jar when they bought Sugar and Salt. If the value is 1, every transaction containing Sugar and Salt also has Jar in the basket. Generally speaking, the higher the confidence, the more likely these items are to be bought together.
  2. Support: A rule can have high confidence and still occur rarely; the support of a rule tells us how frequently it occurs. It also lies between 0 and 1. A support of 0.2 means that 20% of our transactions contain the combination {Sugar, Salt} => {Jar}. Generally speaking, the higher the support, the more frequently these items occur together in our data set.
  3. Lift: This measure tells us how the rule performs relative to chance. A lift of 1.0 means it performs exactly the same as random chance. A lift of 2.0 means it performs twice as well as random chance. For example, a lift of 1.72 means that if Sugar and Salt are already in the cart, the user is 1.72 times as likely (that is, 72% more likely) to add Jar to the cart than an arbitrary user. Generally speaking, the bigger the lift, the stronger the association.
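The three measures can be computed by hand on a handful of baskets. The sketch below uses plain Python and made-up baskets; the function names `support` and `rule_metrics` are illustrative helpers, not part of any library.

```python
# Hypothetical baskets, each a set of products.
baskets = [
    {"Sugar", "Salt", "Jar"},
    {"Sugar", "Salt"},
    {"Sugar", "Salt", "Jar"},
    {"Jar"},
    {"Bread"},
]


def support(itemset, baskets):
    """Share of baskets that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def rule_metrics(antecedent, consequent, baskets):
    """Return (support, confidence, lift) for antecedent => consequent.

    Assumes the antecedent and the consequent each occur at least once.
    """
    rule_support = support(antecedent | consequent, baskets)
    confidence = rule_support / support(antecedent, baskets)
    lift = confidence / support(consequent, baskets)
    return rule_support, confidence, lift


supp, conf, lift = rule_metrics({"Sugar", "Salt"}, {"Jar"}, baskets)
print(f"support={supp:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

Here the rule appears in 2 of 5 baskets (support 0.4), Jar follows Sugar and Salt in 2 of the 3 baskets containing them (confidence of about 0.67), and since Jar alone appears in 3 of 5 baskets, the lift is about 1.11.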

Now that we have rules and measures for ranking them, we need criteria for picking the best rules:

  1. Pick rules with the highest lift, at least 1.10 (10% better than random chance).
  2. Enough support (>0.05), so that a rule represents at least 5% of our transactions.
  3. Higher confidence; the right level usually depends on the kind of data, but higher is better and 1 is the best.
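Applied in code, these criteria become a filter followed by a sort. The sketch below is a simplification in plain Python: it only considers single-item antecedents and consequents on made-up baskets, whereas a full Apriori implementation also explores larger itemsets.

```python
from itertools import permutations

# Hypothetical baskets, each a set of products.
baskets = [
    {"Sugar", "Salt", "Jar"},
    {"Sugar", "Salt"},
    {"Sugar", "Salt", "Jar"},
    {"Jar", "Bread"},
    {"Bread"},
]

MIN_LIFT = 1.10     # at least 10% better than random chance
MIN_SUPPORT = 0.05  # rule must cover more than 5% of transactions


def support(itemset):
    """Share of baskets that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


items = sorted(set().union(*baskets))
rules = []
for antecedent, consequent in permutations(items, 2):
    rule_support = support({antecedent, consequent})
    if rule_support == 0:
        continue  # the pair never occurs together
    confidence = rule_support / support({antecedent})
    lift = confidence / support({consequent})
    if lift >= MIN_LIFT and rule_support > MIN_SUPPORT:
        rules.append((antecedent, consequent, rule_support, confidence, lift))

# Best rules first: highest lift, then highest confidence.
rules.sort(key=lambda rule: (-rule[4], -rule[3]))

for antecedent, consequent, s, c, l in rules:
    print(f"{antecedent} => {consequent}: support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

On this toy data the Bread and Jar rules are dropped because their lift is below 1, while the Sugar and Salt rules rank first.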

This is the concept used in Apriori or any other association-rule algorithm for calculating product affinity. Because it is purely statistical data mining and does not depend on any assumptions about consumer psychology, it is highly efficient.

Statistical Segmentation:

K-Means was used to segment the users based on the revenue per basket. The elbow diagram above suggests that 4 types of users are present.
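The elbow method behind that choice can be illustrated with a small standard-library sketch. It is a stand-in, not the code used for this analysis: it clusters on a single made-up revenue feature with a deterministic starting point, whereas in practice a library implementation such as scikit-learn's `KMeans` would normally be used.

```python
def kmeans_1d(values, k, iterations=50):
    """Lloyd's algorithm on 1-D data; returns (centroids, inertia)."""
    data = sorted(values)
    # Deterministic start: centroids spread evenly over the sorted values.
    centroids = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for value in data:
            nearest = min(range(k), key=lambda j: abs(value - centroids[j]))
            clusters[nearest].append(value)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[j]
            for j, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Inertia: sum of squared distances to the nearest centroid.
    inertia = sum(min((value - c) ** 2 for c in centroids) for value in data)
    return centroids, inertia


# Hypothetical revenue values forming four loose groups.
revenue = [150, 170, 200, 540, 580, 610, 1500, 1540, 1580, 3700, 3800, 3900]

inertia_by_k = {k: kmeans_1d(revenue, k)[1] for k in range(1, 7)}
for k, inertia in inertia_by_k.items():
    print(k, round(inertia))
```

Plotting inertia against k gives the elbow diagram: the number of clusters is chosen where adding another cluster stops reducing the inertia substantially.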

Cluster 1: Number of unique orders (avg ~4), Revenue (avg ~175).

Cluster 2: Number of unique orders (avg ~15), Revenue (avg ~1539).

Cluster 3: Number of unique orders (avg ~18), Revenue (avg ~3813). Both are the highest of the four clusters.

Cluster 4: Number of unique orders (avg ~8), Revenue (avg ~578).

Recommendations:

There are several possible recommendations for increasing the revenue and the overall health of the e-commerce platform:

Sales are high, Revenue is low

To grow our revenue streams and gain more customers, market the business well and list it in more places, such as New Zealand, where orders are very low. With advances in technology we are no longer limited to physical listings: listing the business on an online marketplace, a website or social media can help reach more customers.

Not all users are returning to the site

To increase engagement, the clusters can be enriched with more variables. The current hot clusters (clusters with high revenue) can be analyzed in more depth, and the demographics and marketing strategy that work for them can then be applied to the cold and warm clusters (clusters with low and below-average revenue).

Thanks!
