Navigating the Grocery Store Maze: A Comprehensive Exploration through Market Basket Analysis

Ali Sakhi Khan
Jan 5, 2024


Photo by Franki Chamaki on Unsplash

I’ve always found it interesting how stores like HEB and Whole Foods set up their layouts. It’s like a puzzle — everything is arranged so you can easily find what you need. But what’s even more fascinating is how I often end up buying more than I planned. You know, like grabbing a box of cookies because they’re right next to the milk.

My curiosity grew further when I learned about Market Basket Analysis in my Marketing Analytics class. It’s all about understanding which products are often bought together, helping stores decide where to put things and what to promote. Motivated to put this concept to the test, I set out to see whether I could work out a store layout from grocery purchase data I found on Kaggle.

Choosing the Dataset:

Opting for a dataset with only three columns — Member_number, Date, and itemDescription — was a deliberate decision. My aim was two-fold: to challenge myself in extracting comprehensive insights from minimal features and to maintain a singular focus on mastering Market Basket Analysis without succumbing to distractions. Once I mastered this, I could tackle more complex subjects.

Dataset Exploration:

The dataset explored the purchases of 3898 unique customers between January 1st 2014 and December 30th 2015. I noticed many customers had separate daily transactions. To make sense of it, I combined each person’s daily shopping into one row. This made it easy to find the top 20 customers based on how often they shopped.
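As a rough illustration of that grouping step, here is a minimal sketch in plain Python. The sample rows are made up and only mimic the dataset's three columns; `build_baskets` and `visits_per_member` are hypothetical helper names, and the same result could be had with a dataframe group-by.

```python
from collections import Counter, defaultdict

# Hypothetical rows in the dataset's shape:
# (Member_number, Date, itemDescription). Values are invented for illustration.
rows = [
    (1001, "2014-01-05", "whole milk"),
    (1001, "2014-01-05", "rolls/buns"),
    (1001, "2014-02-11", "yogurt"),
    (1002, "2014-01-05", "whole milk"),
    (1002, "2014-01-05", "sausage"),
    (1001, "2015-03-02", "whole milk"),
]


def build_baskets(rows):
    """Collect everything a member bought on one date into a single basket."""
    baskets = defaultdict(set)
    for member, day, item in rows:
        baskets[(member, day)].add(item)
    return dict(baskets)


def visits_per_member(baskets):
    """Count distinct shopping days per member."""
    return Counter(member for member, _day in baskets)


baskets = build_baskets(rows)
visits = visits_per_member(baskets)
top_customers = visits.most_common(20)  # most frequent shoppers first
```

Each key of `baskets` is one store visit, which is also the unit the later basket analysis works on.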

Understanding Shopping Trends:

A deeper dive into the dataset revealed the most-shopped items. It turns out whole milk was the most popular, followed by vegetables. This made sense given how perishable these products are and how often they need to be bought.

Unveiling Holiday Shopping Trends:

Turning my attention to the top 10 customers, I wanted to see if there were patterns in their purchases, store visit days, and preferred products. Using a package to add the Gregorian calendar and US holidays to my dataframe, I looked for interesting trends.

In the graph on the right, an X on a line marks a purchase made on a holiday.

Examining holiday purchases didn’t show a clear trend. Member numbers 1087, 1052, 1004, and 1051 all shopped on Columbus Day. Member numbers 1052 and 1004 both bought whole milk, while member numbers 1004 and 1098, though they shopped on different holidays, both bought rolls/buns. This could be purely coincidental, and there isn’t much we can deduce from it.
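The holiday flagging described above can be sketched without any third-party package. This is only a stand-in for whatever holiday package was actually used: `nth_weekday` and `us_holidays` are hypothetical helpers, they cover just the four holidays this article mentions, and the purchases are invented.

```python
from collections import Counter
from datetime import date, timedelta


def nth_weekday(year, month, weekday, n):
    """Return the n-th occurrence of a weekday (Monday=0) in a month."""
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7
    return first + timedelta(days=offset + 7 * (n - 1))


def us_holidays(year):
    """A small subset of US holidays, keyed by calendar date."""
    return {
        nth_weekday(year, 9, 0, 1): "Labor Day",      # first Monday of September
        nth_weekday(year, 10, 0, 2): "Columbus Day",  # second Monday of October
        date(year, 11, 11): "Veterans Day",
        nth_weekday(year, 11, 3, 4): "Thanksgiving",  # fourth Thursday of November
    }


calendar = {}
for year in (2014, 2015):
    calendar.update(us_holidays(year))

# Hypothetical purchases: (member, date, item).
purchases = [
    (1, date(2014, 10, 13), "whole milk"),
    (2, date(2014, 10, 13), "whole milk"),
    (3, date(2015, 9, 7), "rolls/buns"),
    (1, date(2014, 10, 14), "soda"),
]

# Count one visit per member per holiday, however many items were bought.
holiday_visits = {(m, d) for m, d, _item in purchases if d in calendar}
visits_by_holiday = Counter(calendar[d] for _m, d in holiday_visits)
```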

I wanted to delve further into purchases made on holidays. For starters, Columbus Day saw the most purchases, with over 45 customer visits, followed by Labor Day with over 40 visits. One reason for so many purchases on Labor Day could be that the day is known for its sales offers. Maybe people were visiting the grocery store for promotions?

However, looking at the most popular items purchased on the day itself suggests that people were most likely shopping for snacks and food, maybe because they were hosting friends and family or going out to celebrate. Without information about promotions offered by the grocery store, it is difficult to tell if promotions encouraged them to visit the store.

Pre-Holiday Shopping Insights:

With my interest piqued, I investigated pre-holiday shopping patterns. Contrary to expectations (I was expecting Thanksgiving to be the most popular), Veterans Day emerged as the pre-holiday shopping hotspot with over 50 visits, followed closely by Columbus Day with just under 50 visits.

The shopping lists predominantly featured vegetables and whole milk, which is useful for the grocery store’s supply chain management. The team can use this to plan their inventory, ensuring that they have enough of the popular items available on the holiday and the day before it to meet customer demand and maximize sales.
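A pre-holiday flag is just the holiday calendar shifted back by one day. The sketch below assumes "pre-holiday" means the day immediately before the holiday, which is an assumption about the analysis; the two 2014 holiday dates are real, while the purchases are invented.

```python
from collections import Counter
from datetime import date, timedelta

holidays = {
    date(2014, 10, 13): "Columbus Day",
    date(2014, 11, 11): "Veterans Day",
}

# Map the day before each holiday to that holiday's name.
pre_holidays = {day - timedelta(days=1): name for day, name in holidays.items()}

# Hypothetical purchases: (member, date, item).
purchases = [
    (1, date(2014, 11, 10), "whole milk"),
    (2, date(2014, 11, 10), "other vegetables"),
    (3, date(2014, 10, 12), "whole milk"),
    (3, date(2014, 10, 14), "soda"),
]

pre_holiday_visits = Counter()  # visits per upcoming holiday
pre_holiday_items = Counter()   # items bought on those days
seen_visits = set()
for member, day, item in purchases:
    name = pre_holidays.get(day)
    if name is None:
        continue  # not the day before a holiday
    pre_holiday_items[item] += 1
    if (member, day) not in seen_visits:
        seen_visits.add((member, day))
        pre_holiday_visits[name] += 1
```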

Market Basket Analysis:

Carrying out the Market Basket Analysis was straightforward. By using the apriori package, I was able to determine the association rules between products along with their Support, Confidence, and Lift. I decided to use the Lift values to determine the store layout because lift compares how often two items are bought together with how often that would happen if they were bought independently, so a popular item like whole milk does not dominate just because it appears in many baskets. Based on the Lift values, I came up with the following store layout.
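To make the three measures concrete, here is a minimal sketch that computes them for item pairs by hand. It is not the apriori package and it skips Apriori's pruning of larger itemsets; `pair_rules` is a hypothetical helper and the baskets are invented.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.0):
    """Support, confidence and lift for every rule x -> y over item pairs.

    support(x, y)    = share of baskets containing both x and y
    confidence(x->y) = support(x, y) / support(x)
    lift(x->y)       = confidence(x->y) / support(y)
    """
    n = len(baskets)
    item_count = Counter()
    pair_count = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        item_count.update(items)
        pair_count.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_count[x]
            lift = confidence / (item_count[y] / n)
            rules.append({"antecedent": x, "consequent": y,
                          "support": support, "confidence": confidence,
                          "lift": lift})
    return sorted(rules, key=lambda r: r["lift"], reverse=True)


# Hypothetical baskets, one set of items per store visit.
example_baskets = [
    {"whole milk", "sausage"},
    {"whole milk", "yogurt"},
    {"whole milk", "sausage", "yogurt"},
    {"rolls/buns"},
]
rules = pair_rules(example_baskets)
```

A lift above 1 means the pair shows up together more often than independent purchases would predict, and a lift of exactly 1 means no association.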

Suggested Store Layout Using Lift Values

The goal was not only to increase sales by placing products with high lifts near each other but also to make the shopping experience better by making products convenient for customers to find.

In addition to Lift values, I also considered the most frequently bought items to position the various aisles. Given that whole milk was the most bought item with over 2300 purchases, followed by vegetables with over 1800 purchases, I placed them towards the very end so that when customers enter the grocery store, they would have to make their way through most of the aisles and items before they get to the most popular products. This would encourage them to buy other items like breads, fruits, or beverages.

The store layout was centered around the dairy section, where I tried to arrange each aisle such that the products stored in them shared a strong lift with the dairy section and other relevant sections.

Yogurt/whole milk shared the highest lift with sausage, so they were kept next to each other. Similarly, pastry and napkins shared a relatively high lift, so they were placed adjacent to each other.

I had to be strategic when designing the center of the store. Citrus fruit and specialty chocolates shared the highest lift in this section and so I put them next to each other. I also observed that shoppers often bought flour with tropical fruits and so I placed them opposite each other. I placed the alcoholic drinks aisle next to the bread aisle because of the lift of 1.36 shared between canned beer and brown bread.

Lastly, I considered items with the lowest lifts, positioning them such that they were at least on the same side if not next to the product that they’re associated with. Rolls and buns were placed with seasonal items, while beef and newspapers were arranged to capture shoppers’ attention upon exiting the store.
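The placement reasoning above was done by hand, but its core idea (pair up the highest-lift products first) can be sketched as a greedy pass over the lift table. This is an illustration, not the method used for the layout: `suggest_adjacent` is a hypothetical helper, and every lift value below is invented except the 1.36 for canned beer and brown bread quoted earlier.

```python
def suggest_adjacent(lifts, max_neighbors=2):
    """Greedily pick adjacent sections, strongest lift first.

    Each section gets at most `max_neighbors` neighbours, on the
    assumption that an aisle has two sides.
    """
    neighbors = {}
    chosen = []
    ranked = sorted(lifts.items(), key=lambda kv: kv[1], reverse=True)
    for (a, b), lift in ranked:
        if (len(neighbors.get(a, [])) < max_neighbors
                and len(neighbors.get(b, [])) < max_neighbors):
            neighbors.setdefault(a, []).append(b)
            neighbors.setdefault(b, []).append(a)
            chosen.append((a, b, lift))
    return chosen


lifts = {
    ("canned beer", "brown bread"): 1.36,             # value quoted in the article
    ("citrus fruit", "specialty chocolate"): 1.50,    # hypothetical
    ("brown bread", "citrus fruit"): 1.10,            # hypothetical
    ("brown bread", "specialty chocolate"): 1.05,     # hypothetical
}
adjacent = suggest_adjacent(lifts)
```

A real layout also has to respect things this sketch ignores, such as refrigeration, aisle length, and where the entrance and exit are.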

Clustering and PCA:

Given that we live in the age of social media, I also wanted to see how I could cluster customers based on their purchase patterns so that I could recommend products to customers shopping online. I used ChatGPT to create 8 categories of shopped items that covered the items listed in the itemDescription column and beyond, so that the approach could easily scale if it were applied to another grocery dataset.
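One plausible way to turn those categories into clustering input is a vector of category shares per customer. The sketch below is an assumption about the encoding, and `CATEGORY_OF` is a hypothetical four-category stand-in, since the eight categories themselves are not listed here.

```python
from collections import Counter

# Hypothetical item-to-category map (the article's eight categories are not listed).
CATEGORY_OF = {
    "whole milk": "dairy",
    "yogurt": "dairy",
    "other vegetables": "fresh produce",
    "citrus fruit": "fresh produce",
    "napkins": "household",
    "frozen vegetables": "frozen",
}
CATEGORIES = sorted(set(CATEGORY_OF.values()))


def category_shares(items):
    """Share of a customer's purchases falling in each category.

    Items missing from the map are ignored; a customer with no mapped
    items gets an all-zero vector.
    """
    counts = Counter(CATEGORY_OF[i] for i in items if i in CATEGORY_OF)
    total = sum(counts.values())
    return [counts[c] / total if total else 0.0 for c in CATEGORIES]


# Hypothetical purchase history for one customer.
history = ["whole milk", "yogurt", "citrus fruit", "napkins"]
features = category_shares(history)
```

Using shares rather than raw counts keeps frequent and occasional shoppers comparable, at the cost of hiding how much each customer buys overall.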

The categories were applied to each customer’s purchase history, and by employing K-Means clustering, I sought to sort customers into distinct groups. While the elbow plot recommended three clusters as optimal, in practice three clusters failed to capture nuanced purchase behaviors and overlapped significantly. To enhance precision, I opted for five clusters, a choice that more accurately defined customer segments and their unique purchase patterns.
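For readers who have not met K-Means or the elbow plot, here is a minimal from-scratch sketch of Lloyd's algorithm that also returns the inertia (within-cluster sum of squared distances), which is the quantity an elbow plot shows against the number of clusters. It is a teaching sketch rather than the library implementation used in the analysis, and the points are invented.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm. Returns (centroids, labels, inertia)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    inertia = sum(squared_distance(p, centroids[label])
                  for p, label in zip(points, labels))
    return centroids, labels, inertia


# Hypothetical 2-D points forming two obvious groups.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
inertia_by_k = {k: kmeans(points, k)[2] for k in (1, 2, 3)}
centroids, labels, inertia = kmeans(points, 2)
```

The "elbow" is the value of k after which `inertia_by_k` stops dropping sharply. As the five-cluster choice above shows, that is a starting point rather than a rule.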

Visualizing the clusters using PCA (Principal Component Analysis) brought out the different preferences within each cluster. Take Cluster 0, for instance: a group with a penchant for dairy and household items and a keen interest in fresh produce, while steering clear of frozen products.
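A description like the one for Cluster 0 usually comes from reading each cluster's average feature values alongside the plot. The sketch below shows only that averaging step, not PCA itself; `cluster_profiles` is a hypothetical helper and the feature rows and labels are invented.

```python
from collections import defaultdict


def cluster_profiles(features, labels, names):
    """Average value of each named feature within each cluster."""
    grouped = defaultdict(list)
    for row, label in zip(features, labels):
        grouped[label].append(row)
    return {
        label: {name: sum(col) / len(rows)
                for name, col in zip(names, zip(*rows))}
        for label, rows in grouped.items()
    }


names = ["dairy", "fresh produce", "frozen", "household"]
# Hypothetical category-share vectors and cluster labels for three customers.
features = [
    [0.5, 0.25, 0.0, 0.25],
    [0.25, 0.25, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
]
labels = [0, 0, 1]
profiles = cluster_profiles(features, labels, names)
```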

These insights carry actionable value, especially in online retail. The analysis allows us to make tailored recommendations for customers belonging to specific clusters. For instance, we could advertise fresh produce, dairy, and household products to Cluster 0 customers shopping online and cross-sell items with high Lift values. This approach turns the analysis into targeted, personalized recommendations and a more engaging online shopping experience.

Next Steps:

Scalability: Retail stores can adopt these methods to optimize their physical store layouts and enhance their apps, offering customers personalized, data-driven suggestions and exclusive promotional offers.

Opportunity for further analysis: With the addition of more customer information, such as name, email, address, age, profession, and spending habits, the potential for sophisticated email marketing and tailored promotions expands. Furthermore, incorporating these details allows the categorization of new customers into clusters, facilitating the provision of product recommendations based on shared features with existing customers — a pivotal step in fostering customer loyalty and driving sales.

Weblinks:

Dataset: https://www.kaggle.com/datasets/heeraldedhia/groceries-dataset/data

Github: https://github.com/alisakhikhan/Market-Basket-Analysis/tree/main
