Market Basket Analysis Dashboard in Power BI
One of the key components of Customer Analytics is Market Basket Analysis. Retail businesses use Market Basket Analysis to gain valuable insights into customer purchasing behavior. It helps businesses identify associations between products that are frequently bought together, enabling them to make informed decisions about cross-selling, upselling, and optimizing product placements.
The key question asked in Market Basket Analysis is:
Which products do customers tend to buy together in the same transaction?
And there are several gains to a business when this question is answered. Market Basket Analysis is a very powerful tool and it has many benefits and use cases.
- It aids in optimizing inventory management. A physical retail store can use it to plan its store layout and place the items that are frequently bought together in close proximity.
- An online retailer can recommend products to clients by suggesting the second product when the customer places the first product in the cart.
- It enables a business to understand customer preferences and buying habits better. A retailer can develop cross-selling strategies to sell the products together.
- A store can develop promotion and pricing strategies by offering discounts for buying the products together.
Basket Analysis is not limited to products and retailers only.
How does it work?
Market Basket Analysis works by calculating three key metrics for each product association: support, confidence, and lift.
- Support is the probability of finding a specific combination of products in a transaction. It is calculated as the number of transactions containing the product combination divided by the total number of transactions.
- Confidence measures the likelihood of buying the second product when the first product is already in the basket. It is calculated as the number of transactions containing both products divided by the number of transactions containing the first product.
- Lift indicates how much more likely the second product is to be bought when the first product is already in the basket. It is calculated as the confidence of the association divided by the support of the second product.
In order to understand these 3 key metrics, let’s look at the following 10 transactions from a grocery store.
- Each row represents one transaction
- Each transaction contains two items
- A set of items contained in a transaction is called a basket
We will be using the basket containing Milk and Bread for our analysis. From our dataset, we have 4 transactions containing Milk and 5 transactions containing Bread. And there are 3 transactions containing both Milk and Bread.
Support- It is the easiest to understand of the three metrics. Support indicates the transactions that include the two specific products. It is a measure of how frequently a particular combination of products is purchased together.
To calculate support we first calculate the number of transactions containing both products and the total number of transactions.
SUPPORT = (No. of transactions including both products / Total Transactions)
Support for basket of Milk and Bread = (3/10)*100 = 30%
From the dataset, 30% of the time, Milk and Bread are purchased together. High support means that the that the itemset is commonly purchased together, making it more significant for analysis.
Confidence- It is the % of transactions that contain the two products together, out of the transactions containing one of the two products. It is a measure of the likelihood that the presence of one item in a transaction will lead to the presence of another item in the same transaction.
There are usually 2 confidence numbers: one for each product.
And to calculate confidence for a product, we need to calculate 2 support values.
First we calculate Support for the basket and then we calculate support for each product separately. Then we calculate the Confidence of each product.
CONFIDENCE of Product One = (Support of Basket/Support of Product One)
CONFIDENCE of Product Two= (Support of Basket/Support of Product Two)
Support of basket = 30%
Number of transactions including Milk = 4
Support of Milk = 4/10 = 40%
Number of transactions including Bread = 5
Support of Bread = 5/10 = 50%
Confidence of Milk = 30/40 = 75%
Confidence of Bread = 30/50 = 60%
What do these confidence values mean?
From the dataset, 75% of customers also buy Bread when they add Milk to their basket first. And on the other hand 60% of customers also buy Milk when they add Bread to their basket first.
This means that there is a higher chance for a customer to buy Bread when they have already added Milk to their basket. It shows the direction of cross-selling.
However, confidence in itself does not show us the strength of the relationship between the two products. For example, one product might be a very common purchase causing it to generate a high result for confidence independently from the second product.
Lift- It gives the strength of the relationship between two products in a basket.
LIFT = Support of Basket/(Support of Product one* Support of Product Two)
Support of Basket = 30%
Support of Milk = 40%
Support of Bread = 50%
Lift = 30/(40 * 50) = 1.5
Note: Lift is not calculated as a percentage but as a decimal.
A lift value close to 1 means there is a weak relationship between the two items in a basket. Whereas a lift value greater than 1 indicates a strong relationship between the items in the basket i.e. the two items are frequently bought together and there’s a strong chance of that happening. A lift value of less than 1 indicates that the two products are bought together less frequently.
From the sample dataset, the lift value of Milk and Bread is 1.5 which means that consumers buy them together about 1.5 times more than it would happen by chance. This indicates a strong relationship between Milk and Bread.
The Project
I began the project by getting the data from the source and loading it to Power Query for transformation. I cleaned the data by unpivoting the column necessary based on the structure of the dataset. I then added the conditional columns that would help with the computation of the key metrics. Once I was satisfied that the data was clean I loaded it into Power BI.
On the Data view, I created the necessary Calculated Columns for the above metrics using DAX.
- Ensure that from your dataset you have the list of transactions in one column and a list of the items in another column.
- The first task was to create a list of all the possible combinations of two items from all the items available in the dataset. This will help us perform a basket analysis to identify which items have a strong association with each other.
- Create a unique list of all possible basket combinations.
- Then create Calculated Columns for Support, Confidence and Lift.
I then embarked on visualizing the data.
Findings
From the dataset, milk and chocolate have a good association and a fairly strong relationship. 52% of the customers also buy chocolate when they buy milk first. 50% of customers also buy Milk when they add chocolate to their basket first.
Focusing on a single product in this particular analysis, in this case “yoghurt”, it is frequently purchased together with chocolate than with any other product but the two products do not have a strong relationship compared to the other products.
Yoghurt and bread are bought together less frequently compared to yoghurt with chocolate and kidney beans but the relationship is stronger between yoghurt and bread.
The lift value of Yoghurt and Bread is 1.2 which means that consumers buy them together about 1.2 times more than it would happen by chance.
Recommendations
- When it comes to Product placement the products that should be placed close to Yoghurt are Bread, Nutmeg, Kidney Beans and Onions since they have shown a fairly stronger relationship with Yoghurt than other products.
- The Company should use pricing strategies on these products including issuing discounts from time to time and promote them together to increase sales.
- For their online clients, the Company Should recommend these products to clients who first put Yoghurt in their carts
Challenges
- Sourcing the right dataset for this project was a challenge.
- In a case where you have a huge number of products and numerous transactions, creating the possible combination of the products gives rise to an even bigger dataset to work with.
Dataset is from Kaggle.
Again, the benefits of Basket Analysis to businesses cannot be overemphasized!
Interact with the Power BI Report here.
If you like the project, please clap for it for a wider reach and follow me for more analytics content.
Also, let’s connect. Follow me on LinkedIn