Product Recommendation

Mathee Prasertkijaphan
SCB TechX
10 min read · Aug 21, 2023


Have you ever walked into a shopping center with a fixed budget, only to overshoot it because of an offer you could not resist? Market basket analysis is often the mechanism behind those offers, and here’s how it works.

What is Market Basket Analysis (MBA)?

Market Basket Analysis (MBA) is a modeling technique based on the theory that if you buy a certain group of items, you are more (or less) likely to buy another group of items. It’s a way to discover the relationships between products that people buy together. In a retail context, it’s kind of like figuring out, “If a customer has cereal in their shopping cart, how likely are they to also purchase milk?”

Let’s imagine a simple example. If you own a grocery store, you might want to know what products are frequently bought together. This knowledge could help you in several ways:

1) Product Placement: If you find that people who buy chips also often buy salsa, you might want to place these items near each other in your store. This could lead to an increase in sales for both products.

2) Promotion and Discounting: If you find out that people who buy pasta are likely to buy pasta sauce as well, but they only buy pasta sauce when there’s a discount, you might want to offer a deal where if a customer buys pasta, they get a discount on pasta sauce.

3) Cross-selling and Up-selling: When you know that customers who purchase a grill are also likely to buy charcoal, you can recommend charcoal when a customer adds a grill to their cart on your online store (cross-selling). Or you can suggest a more premium charcoal brand (up-selling).

To analyze such product relationships, we use Market Basket Analysis. The idea is to look at thousands or millions of transactions and figure out the combinations of products that occur more frequently than expected. The end goal is to use these patterns to develop marketing or business strategies that help boost sales and customer satisfaction. Commonly used methods for performing Market Basket Analysis include association rule learning algorithms like Apriori and FP-Growth. These algorithms are designed to find relationships or patterns in large datasets. So, Market Basket Analysis is a powerful tool in the marketer’s toolbox, helping to increase sales by better understanding customer buying behavior.

Data sources

Market Basket Analysis (MBA) can draw on a variety of data sources to identify shopping patterns and customer behavior. Here is a brief look at two kinds of systems that provide data for MBA:

1) Cash Registers or Checkout Systems: These systems are used in physical stores, and they keep a record of what customers buy. We can use these records to find patterns in what people often buy together.

2) Online Shopping Platforms: Platforms like Lazada, Shopee, Central Online, and Pomelo act like online cash registers. They track what customers are buying online. This information is useful for finding patterns in customer behavior and tailoring product suggestions.

Remember, the main aim of using these data sources for Market Basket Analysis is to better understand our customers’ buying behavior, improve product placement, and make smarter suggestions to customers, which ultimately drives more sales.

Let’s break down the important terms related to Market Basket Analysis:

1) Items: Items are the entities we are trying to find connections between. If you’re an online store, an item would be any product you’re selling. If you’re a content creator, an item could be an article, a blog post, a video, and so on. A group of several items is called an ‘item set’. To illustrate, an ‘item set’ could be {product1, product2, product3, …, productN}.

2) Transactions: Transactions represent instances where a group of items is bought or consumed together. For an online store, a transaction typically means a customer’s single purchase. For a content creator, a transaction might be the collection of articles read by a user in one website visit. (The analyst gets to decide the time frame considered for a transaction.) Each transaction is made up of an item set. So, a ‘transaction’ could look like {item1, item2, …, itemK}.

3) Rules: In Market Basket Analysis, rules are statements of the form {item1, item2, …} ⇒ {itemK}. What this means is that if a customer buys or engages with the items on the left-hand side (LHS) of the rule (i.e., {item1, item2, …}), they are likely also interested in the item on the right-hand side (RHS), i.e., {itemK}.

As an example from a food context, consider the rule {sandwich, cookie} ⇒ {drink}, with sandwiches and cookies on the LHS and a drink on the RHS. This rule indicates that customers who buy sandwiches and cookies are likely to also buy a drink.
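To make these three terms concrete, here is a minimal Python sketch. The baskets below are made up for illustration and are not taken from the article's dataset; each transaction is modeled as a set of items and the rule as a pair of sets.

```python
# Hypothetical transactions: each one is the set of items bought together.
transactions = [
    {"sandwich", "cookie", "drink"},
    {"sandwich", "cookie", "drink"},
    {"sandwich", "cookie"},
    {"sandwich", "drink"},
    {"cookie"},
]

# The rule {sandwich, cookie} => {drink}, stored as (LHS, RHS) item sets.
lhs, rhs = {"sandwich", "cookie"}, {"drink"}

# Transactions that contain every LHS item (subset test) ...
matches_lhs = [t for t in transactions if lhs <= t]
# ... and, among those, the ones that also contain the RHS.
matches_rule = [t for t in matches_lhs if rhs <= t]

print(f"{len(matches_lhs)} baskets contain the LHS; "
      f"{len(matches_rule)} of them also contain the RHS")
```

In this toy data the rule holds in 2 of the 3 baskets where it applies, which is the kind of ratio the measures in the next section formalize.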

Principal Concepts

Association Rule Mining:

Imagine you’re a marketing manager at a supermarket and you want to understand the shopping habits of your customers. Specifically, you want to know which products are often bought together. This is where Association Rule Mining comes in. It’s like observing your customers and taking notes: “Interesting, a lot of people who buy pasta also buy tomato sauce”, “Those who buy wine often buy cheese too”, etc. These observations help you make strategic decisions that benefit your business, such as grouping certain items or offering bundled promotions.

Support:

Now, let’s consider ‘support’. Let’s take the example of pasta and tomato sauce. If you have 1,000 shopping receipts and pasta and tomato sauce are bought together in 200 of them, the support of this combination (pasta and tomato sauce) is 200/1000 = 20%.

In other words, ‘support’ is the percentage of total transactions in which a particular combination of items appears. The higher the support, the more common the combination.

Confidence:

Confidence helps us estimate how reliable our observation or ‘rule’ is. Let’s say you made a rule saying, “If a customer buys pasta, they’ll also buy tomato sauce”. Confidence tells you how often that rule holds true. So, if out of 500 receipts where pasta was purchased, tomato sauce was also bought 200 times, then the confidence of the rule is 200/500 = 40%. Meaning, 40% of the time when a customer bought pasta, they also bought tomato sauce.

Lift:

Lift takes this one step further. It tells you whether buying pasta raises the likelihood of buying tomato sauce, compared with the likelihood of buying tomato sauce overall. So, if tomato sauce appears in 20% of all transactions and the confidence of the rule pasta ⇒ tomato sauce is 40%, then the lift is 40% / 20% = 2.

A lift of 1 means that pasta does not affect the purchase of tomato sauce. If it’s greater than 1 (like in our case), pasta encourages the purchase of tomato sauce. If it’s less than 1, pasta discourages the purchase of tomato sauce.
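The arithmetic from the three examples above can be reproduced in a few lines of Python. The variable names are illustrative, and the counts are the ones used in the text (1,000 receipts, 500 with pasta, 200 with both, tomato sauce in 20% of receipts).

```python
total_receipts = 1000
receipts_with_pasta = 500
receipts_with_sauce = 200  # tomato sauce appears in 20% of all receipts
receipts_with_both = 200   # pasta and tomato sauce together

# Support: share of all receipts that contain both items.
support = receipts_with_both / total_receipts

# Confidence of pasta => tomato sauce: share of pasta receipts that also contain sauce.
confidence = receipts_with_both / receipts_with_pasta

# Lift: confidence compared with how often sauce is bought overall.
sauce_share = receipts_with_sauce / total_receipts
lift = confidence / sauce_share

print(f"support={support:.0%}, confidence={confidence:.0%}, lift={lift:.1f}")
# prints: support=20%, confidence=40%, lift=2.0
```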

In summary, with Association Rule Mining, you can identify patterns in customer shopping behavior (the ‘rules’), measure how often these patterns occur (the ‘support’), how reliable these rules are (the ‘confidence’), and how much purchasing one product influences the purchase of another (the ‘lift’).

These concepts can guide your marketing strategies, allowing you to target customers more effectively, increase sales, and improve customer satisfaction.

Steps to implement

Step 1: Import Library

Step 2: Import Dataset and Cleaning Data

This dataset is randomly generated and is not intended for any commercial purpose. It does not represent actual consumer behavior or purchasing patterns from a real business or industry. Its primary use is for academic purposes, such as research or educational projects, where it can provide a valuable basis for learning and exploring data analysis techniques.
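The article's dataset and loading code are not reproduced here, so the following is only a sketch of what this step might look like with pandas. The column names `transaction_id` and `item` and all the rows are hypothetical. It cleans the raw rows and turns them into a basket matrix with one row per transaction and one True/False column per item.

```python
import pandas as pd

# Hypothetical raw data: one row per (transaction, item) pair, with some noise.
raw = pd.DataFrame({
    "transaction_id": [1, 1, 2, 2, 2, 3, 3, 4, None],
    "item": ["pasta", "tomato sauce ", "pasta", "wine", "cheese",
             "wine", "wine", "pasta", "cheese"],
})

clean = (
    raw.dropna(subset=["transaction_id", "item"])          # drop incomplete rows
       .assign(item=lambda d: d["item"].str.strip())       # trim stray whitespace
       .astype({"transaction_id": int})                    # ids back to integers
       .drop_duplicates()                                  # same item listed twice
)

# Basket matrix: rows are transactions, columns are items, values are True/False.
basket = pd.crosstab(clean["transaction_id"], clean["item"]).astype(bool)
print(basket)
```

A boolean matrix in this shape is the usual input for frequent item set algorithms, and it is also what the similarity calculation in Step 6 can work from.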

Step 3: Generate frequent item sets that have a support of at least 20% (This number can be adjusted based on your specific needs.)
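The original code for this step is not shown. As a rough illustration of the idea, here is a small, library-free sketch of a level-wise, Apriori-style search (without the subset-pruning optimisation). The transactions and the function name `frequent_itemsets` are made up for this example, and a real project would more likely rely on an existing Apriori or FP-Growth implementation.

```python
from itertools import combinations

# Hypothetical transactions, each a set of items.
transactions = [
    {"pasta", "tomato sauce"},
    {"pasta", "tomato sauce", "cheese"},
    {"wine", "cheese"},
    {"pasta", "wine", "cheese"},
    {"tomato sauce"},
]

def frequent_itemsets(transactions, min_support=0.2):
    """Return {itemset: support} for every item set with support >= min_support."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    current = [frozenset([item]) for item in items]  # start with single items
    result = {}
    while current:
        # Keep the candidates of this size that are frequent enough.
        level = {}
        for candidate in current:
            support = sum(candidate <= t for t in transactions) / n
            if support >= min_support:
                level[candidate] = support
        result.update(level)
        # Build candidates one item larger by joining frequent sets of this size.
        keys = list(level)
        size = len(keys[0]) + 1 if keys else 0
        current = list({a | b for a, b in combinations(keys, 2) if len(a | b) == size})
    return result

itemsets = frequent_itemsets(transactions, min_support=0.2)
for itemset, support in sorted(itemsets.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), round(support, 2))
```

Raising `min_support` shrinks the result quickly: with only five baskets, a 20% threshold keeps every combination that occurs at least once.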

Step 4: Generate the rules with their corresponding support, confidence, and lift
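The code that produced the article's rule table is not included. The sketch below shows, in plain Python and on made-up transactions, how the main columns can be derived for the rules that come out of one item set. The helper names are hypothetical, the dictionary keys mirror the column names described in this step, and Zhang's metric is left out for brevity.

```python
from itertools import combinations

# Hypothetical transactions, each a set of items.
transactions = [
    {"pasta", "tomato sauce"},
    {"pasta", "tomato sauce", "cheese"},
    {"wine", "cheese"},
    {"pasta", "wine", "cheese"},
    {"tomato sauce"},
]

def support(itemset, transactions):
    """Share of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def rules_from_itemset(itemset, transactions):
    """One row per way of splitting `itemset` into antecedent => consequent."""
    itemset = frozenset(itemset)
    s_both = support(itemset, transactions)
    rows = []
    for r in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), r):
            a = frozenset(antecedent)
            c = itemset - a
            s_a, s_c = support(a, transactions), support(c, transactions)
            confidence = s_both / s_a
            rows.append({
                "antecedents": set(a),
                "consequents": set(c),
                "antecedent support": s_a,
                "consequent support": s_c,
                "support": s_both,
                "confidence": confidence,
                "lift": confidence / s_c,
                "leverage": s_both - s_a * s_c,
                # Conviction is infinite when the rule never fails.
                "conviction": float("inf") if confidence == 1
                              else (1 - s_c) / (1 - confidence),
            })
    return rows

rules = rules_from_itemset({"cheese", "wine"}, transactions)
for row in rules:
    print(row)
```

In this toy data every basket with wine also contains cheese, so the rule wine ⇒ cheese has a confidence of 1 and an infinite conviction, while cheese ⇒ wine is weaker.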

Each row of the table represents an association rule. Here’s a breakdown of what each column in the table means:

1. antecedents: This column contains the item(s) that form the basis of the association rule. In an IF-THEN statement, these are the IF part.

2. consequents: This column contains the item(s) that are associated with the antecedent(s). In an IF-THEN statement, these are the THEN part.

3. antecedent support: The proportion of transactions in the dataset that contain the antecedent(s).

4. consequent support: The proportion of transactions in the dataset that contain the consequent(s).

5. support: The proportion of transactions in the dataset that contain both the antecedent(s) and consequent(s). This is the joint probability of the antecedent and consequent.

6. confidence: The conditional probability of the consequent given the antecedent. It’s the likelihood that the consequent is bought if the antecedent is bought.

7. lift: The ratio of the observed support to what the support would be if the antecedent and consequent were independent. A lift value greater than 1 means that the antecedent and consequent are likely to be bought together, while a value less than 1 suggests they’re unlikely to be bought together.

8. leverage: The difference between the observed frequency of the antecedent and consequent appearing together and the frequency that would be expected if they were independent. A leverage value of 0 indicates independence.

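9. conviction: A measure of the implication of the rule, computed as (1 - consequent support) / (1 - confidence). A conviction value of 1.0 means that the antecedent and consequent are independent. The higher the value, the more the consequent depends on the antecedent, and a rule that always holds (confidence of 1) has infinite conviction.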

10. zhangs_metric: A measure of the rule’s interestingness. It ranges from -1 to +1, where +1 indicates the maximum positive correlation, -1 indicates the maximum negative correlation, and 0 indicates independence.

So, for example, in the first row:

The antecedent is เครื่องชงกาแฟแคปซูล (capsule coffee machine) and the consequent is แก้วเก็บความ (tumbler).

The support is 0.297872, meaning that both items appear together in about 29.8% of all transactions.

The confidence is 0.875, meaning that in about 87.5% of the transactions where a capsule coffee machine is bought, a tumbler is also bought.

The lift is 1.246212, which is greater than 1. This suggests that the two items are likely to be bought together more often than would be expected if they were bought independently.

The leverage is 0.058850, indicating that the two items appear together more often than would be expected if they were independent.

The conviction is 2.382979, which is greater than 1. This suggests that the consequent depends on the antecedent: the rule fails (a coffee machine is bought without a tumbler) about 2.4 times less often than it would if the two items were independent.
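As a sanity check, the leverage and conviction quoted above can be recomputed from the support, confidence, and lift of the same row. This is only a sketch using the standard definitions; the antecedent and consequent supports are not quoted in the text, so they are back-derived here.

```python
# Values quoted for the first row of the rule table.
support, confidence, lift = 0.297872, 0.875, 1.246212

# Back-derived from the definitions confidence = support / antecedent support
# and lift = confidence / consequent support.
antecedent_support = support / confidence
consequent_support = confidence / lift

# Leverage: observed joint support minus the support expected under independence.
leverage = support - antecedent_support * consequent_support

# Conviction: (1 - consequent support) / (1 - confidence).
conviction = (1 - consequent_support) / (1 - confidence)

print(f"leverage={leverage:.4f}, conviction={conviction:.4f}")
# prints: leverage=0.0589, conviction=2.3830
```

Both values agree with the table (0.058850 and 2.382979) up to rounding of the inputs.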

Step 5: Filter the data frame using standard pandas code. For instance, the below line of code will give you all the rules for which the lift is greater than 1 and the confidence is greater than 0.5
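The filtering line itself is missing from the text. A minimal sketch with pandas, using a small made-up `rules` data frame in place of the real one, could look like this:

```python
import pandas as pd

# Hypothetical rules table with just the columns needed for filtering.
rules = pd.DataFrame({
    "antecedents": [frozenset({"pasta"}), frozenset({"wine"}), frozenset({"chips"})],
    "consequents": [frozenset({"tomato sauce"}), frozenset({"cheese"}), frozenset({"salsa"})],
    "confidence": [0.40, 0.80, 0.65],
    "lift": [2.0, 1.6, 0.9],
})

# Keep rules whose lift is greater than 1 and whose confidence is greater than 0.5.
strong_rules = rules[(rules["lift"] > 1) & (rules["confidence"] > 0.5)]
print(strong_rules)
```

Each condition needs its own parentheses because `&` binds more tightly than the comparison operators in Python.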

Step 6: Calculate an item-item matrix in which each cell holds the cosine similarity between two items (products). Each item is represented as a vector over transactions or users (for example, purchase indicators or ratings), and the similarity is the cosine of the angle between two such vectors. In general the value lies between -1 and 1, with 1 meaning the vectors point in the same direction and -1 meaning they point in opposite directions. With non-negative data such as purchase counts, it lies between 0 and 1, and 0 means the two items never appear together.
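One way to compute such a matrix is sketched below with NumPy and pandas. The basket matrix and item names are made up, and the article's own code may well have used a library helper instead. Each column is an item vector, and the cosine similarity is the dot product of two columns divided by the product of their lengths.

```python
import numpy as np
import pandas as pd

# Hypothetical basket matrix: rows are transactions, columns are items (1 = bought).
basket = pd.DataFrame(
    [[1, 1, 0],
     [1, 1, 0],
     [1, 0, 1],
     [0, 0, 1]],
    columns=["cat litter", "cat fountain", "speaker"],
)

X = basket.to_numpy(dtype=float)
norms = np.linalg.norm(X, axis=0)  # length of each item vector

# Dot products between all pairs of item columns, scaled by the vector lengths.
similarity = pd.DataFrame(
    (X.T @ X) / np.outer(norms, norms),
    index=basket.columns,
    columns=basket.columns,
)
print(similarity.round(3))
```

Here "cat litter" and "cat fountain" share two baskets and score about 0.82, while "cat fountain" and "speaker" never co-occur and score 0.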

Step 7: Transform the matrix into a three-column table (‘item’, ‘related item’, ‘value’), where each row represents two items and the similarity score between them.
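A sketch of this reshaping step with pandas follows. The similarity values are invented for illustration; the three column names come from the step description. Dropping each item's similarity with itself and sorting by value are extra conveniences, not something the article specifies.

```python
import pandas as pd

# Hypothetical item-item similarity matrix (symmetric, 1.0 on the diagonal).
items = ["cat litter", "cat fountain", "speaker"]
similarity = pd.DataFrame(
    [[1.0, 0.9, 0.2],
     [0.9, 1.0, 0.5],
     [0.2, 0.5, 1.0]],
    index=items,
    columns=items,
)

# Stack the matrix into one row per (item, related item) pair.
pairs = similarity.stack().reset_index()
pairs.columns = ["item", "related item", "value"]

# Drop self-pairs and list the most similar pairs first.
pairs = (
    pairs[pairs["item"] != pairs["related item"]]
    .sort_values("value", ascending=False)
    .reset_index(drop=True)
)
print(pairs)
```

Because the matrix is symmetric, every pair appears twice (once in each direction), which is convenient when looking up recommendations for a given item.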

Summary

1. Cat products: Items related to cat care, namely ‘ทรายแมว’ (Cat Sand) and ‘น้ำพุแมว’ (Cat Water Fountain), have a high similarity value of 0.912871. This indicates that the two items tend to appear in the same baskets, so customers who buy one of them are very likely to buy the other. We could cross-sell these products by recommending the Cat Water Fountain to customers who are buying the Cat Sand and vice versa.

2. Salmon Sashimi and Medicine: There is a strong association between ‘Salmon Sashimi’ and ‘ยาดม’ (a Thai nasal inhaler). Therefore, customers buying Salmon Sashimi might be interested in this type of medicine, and vice versa. This could be due to dietary or health-related reasons.

3. Tech Products: We see high association values among ‘Bluetooth Speaker’, ‘External Hard Disk’, and ‘Salmon Sashimi’. This suggests that these products are often purchased together. It might be the case that our customer segment is tech-savvy and enjoys high-quality audio, computing, and gourmet food. Therefore, suggesting a Bluetooth Speaker or External Hard Disk to customers who are buying Salmon Sashimi, and vice versa, could be effective.

4. Home & Health: There is a significant association between ‘ยาดม’ (Medicine), ‘พลาสเตอร์บรรเทาปวด ตราเสือ’ (Pain relief plaster), and ‘แก้วเก็บความเย็น’ (Thermal mug). It suggests that our customers might be interested in health products and comfort in their daily routine.

5. Gourmet Food: ‘Salmon Sashimi’ and ‘ขนมจีนน้ำยาปู’ (Crab noodles) have a similarity value of 0.773111, suggesting that customers who are interested in one might be interested in the other.
