End-to-End Instacart Market Basket Analysis Project

Favour Igwezeke
11 min readFeb 25, 2024

--

Hello! In this article, I am going to share my exploratory data analysis approach for my recent SQL project, "Instacart Market Analysis.”. I will first give some background about Instacart, as well as the task and data itself, followed by my solution.

The tool used for this Analysis is PostgreSQL

About Instacart

Instacart is a grocery delivery and pickup service. Users can select items from local grocery stores through the Instacart app or website and then either have them delivered to their doorstep by a personal shopper or prepared for pickup at the store.

Database Description

The project utilizes the following database tables:

aisles—Contains information about different product categories (aisles).

departments—provides details about various departments within the store.

order_products__prior—includes information about products included in prior customer orders.

order_products__train — Contains details about products in the training set of customer orders.

orders—provides information about individual orders and customers.

products—Contains details about products, including aisle and department IDs.

The link to the dataset for this case study can be sourced from here

Business Questions We’ll Unravel

● What are the top 10 product pairs that are most frequently purchased together?
● What are the top 5 products that are most commonly added to the cart first?
● How many unique products are typically included in a single order?
● Can we categorize customers based on the total amount they’ve spent on orders?
● What are the different customer segments based on purchase frequency?
● How many orders have been placed by each customer?
● What is the distribution of orders placed on different days of the week?
● Are there specific months with higher order volumes?
● Can we identify customers who haven’t placed an order in the last 30 days?
● What percentage of customers have churned in the past quarter?
● What are the top 5 product combinations that are most frequently purchased together?
● Can we find products that are often bought together on weekends vs. weekdays?

Problem Statement and Objectives

The specific objectives of this analysis include:

● Market Basket Analysis: To identify frequently co-occurring products in order to improve store layout and marketing strategies.

● Customer Segmentation: To group customers based on their purchasing behavior for targeted marketing efforts.

● Seasonal Trends Analysis: To identify seasonal patterns in customer behavior and product sales.

● Customer Churn Prediction: To predict which customers are most likely to stop using the service in the near future.

● Product Association Rules: To identify rules or patterns in customer behavior indicating which products are frequently bought together.

Data Quantity

The database contains the following approximate quantities of data:

PRODUCT: 49,755 rows

ORDERS: 1,048,575 rows

DEPARTMENTS: 21 rows

AISLES: 134 rows

ORDER_PRODUCT_PRIOR: 32,434,489 rows

ORDER_PRODUCT_TRAIN: 1,384,617 rows

DATA CLEANING

Missing Values

I began by checking for NULL values in the relevant columns because they can affect our analysis

Result of above Query — Image by Author

The above query reveals that there is no null data in the most relevant columns in our dataset.

Duplicates

The dataset was thoroughly examined, and I found no significant issues, such as duplicates in the dataset.

Data Analysis

Market Basket Analysis

I used SQL queries to conduct market basket analysis of transactional data stored in a relational database like the one presented. Market basket analysis entails matching product associations, computing the frequencies of item sets, and rolling up data to get useful insights. Having product names in the queries improves the readability of the results, as it helps stakeholders easily comprehend which particular products are covered by the observed patterns.

For instance, SQL queries can be made to join relevant tables, such as "Order_products_prior," "Order_products_train," and "Products,” in order to answer the posed questions. These queries should then aggregate the data to calculate frequencies and counts. To have product names in the queries, we will join the Products table to retrieve the names using the product IDs. It allows for analysis results that offer actionable suggestions that are easily understandable to a retailer.

  1. What are the top 10 product pairs that are most frequently purchased together?
Top 10 Paired Products

Insights:

From the results of the query, we have a list of the top 10 product pairs that are most frequently purchased together. They are:
* Bag of Organic Bananas — Organic Hass Avocado
* Bag of Organic Bananas — Organic Strawberries
* Organic Strawberries — Banana
* Banana — Organic Avocado
* Organic Baby Spinach — Banana
* Bag of Organic Bananas — Organic Baby Spinach
* Strawberries — Banana
* Banana — Large Lemon
* Organic Strawberries — Organic Hass Avocado
* Bag of Organic Bananas — Organic Raspberries
So overall “ Bag of Organic Bananas” and “Organic Hass Avocado” are the most frequently purchased product pair

2. What are the top 5 products that are most commonly added to the cart first?

Top 5 Products Commonly added to the Cart First

Insights:

The results indicate that the top five products most commonly added to the cart first are reported as follows: Banana, with an AddToCartCount of 115,521, followed by Bag of Organic Bananas with 82,877, Organic Whole Milk with 32,071, Organic Strawberries with 28,875, and Organic Hass Avocado with 24,913.

3. How many unique products are typically included in a single order?

Average Unique Products per Order — Image by Author

Insights:

The analysis revealed that, on average, there were 10 unique products typically included in a single order.

Customer Segmentation

To conduct the analysis, I utilized SQL queries to extract relevant data from the database, including customer IDs, total spending, purchase frequency, and order counts. I joined multiple tables, such as Orders, Order_products_prior, and Products, to incorporate product names into the analysis. This inclusion allows for better interpretation of the results and facilitates the identification of product combinations and customer preferences.

4. Can we categorize customers based on the total amount they’ve spent on orders?

Customer Segmentation By Spending — Image by Author

Insights

The analysis revealed that customers were categorized into three spending categories. According to the findings, there were 175,532 customers classified as low spenders, 2,945 as high spenders, and 27,732 as medium spenders. It was observed that low spenders comprised the largest segment in terms of the number of customers. Interestingly, despite their larger numbers, low spenders were found to have spent the most on orders compared to high and medium spenders.

5. What are the different customer segments based on purchase frequency?

Customer Segmentation By Purchase Frequency — Image by Author

Insights

The analysis categorized customers into four segments based on their purchase frequency. These segments included Very Frequent customers, Frequent customers, Regular customers, and Occasional customers. It was found that Occasional customers totaled 23,986, Very Frequent customers numbered 53,931, Frequent customers amounted to 56,797, and Regular customers were the highest segment with 71,495 customers. This segmentation provided valuable insights into the distribution of customers based on their purchase frequency.

6. How many orders have been placed by each customer segment?

Customer Segmentation By Total Orders so far — Image by Author

Insights

The analysis segmented customers based on their total order count into four categories: highly active customers, moderately active customers, somewhat active customers, and inactive customers. It was revealed that, from the dataset, the Somewhat active customers had the highest number of total orders. This segmentation sheds light on the distribution of customers according to their level of activity in placing orders.

Seasonal Trends Analysis

To conduct the seasonal trend analysis on the distribution of orders, the analysis involved querying the database to retrieve order data and grouping the orders based on the day of the week they were placed. This allowed for the calculation of the frequency or count of orders placed on each day of the week. Subsequently, visualizations or statistical measures could be used to identify any noticeable patterns or trends in the distribution of orders across different days of the week.

7. What is the distribution of orders placed on different days of the week?

Distribution of orders placed on different days of the week — Image by Author

Insights

The analysis revealed that Sunday had the highest number of orders, totaling approximately 600,905. Following closely behind, Monday accounted for about 587,478 orders, making it the second busiest day. Tuesday also saw significant order volume, with around 467,260 orders placed. These findings indicate that Sunday, Monday, and Tuesday were the top three days of the week in terms of order frequency.

Customer Churn Prediction

To conduct the analysis, i had to query the database to:

  1. Identify customers who haven’t placed an order in the last 30 days by checking their order history.
  2. Calculate the churn rate by comparing the number of churned customers (those inactive for the past quarter) to the total number of active customers.

The analysis involved using SQL queries to filter and aggregate the relevant data from tables such as Orders, Customers, and Order Products.

8. Can we identify customers who haven’t placed an order in the last 30 days?

Number Of Inactive Customers in the last 30 days — Image by Author

Insights

The analysis revealed that, the number of customers who last placed an order in the last 30 days is 152,012.

9. What percentage of customers have churned in the past quarter?

Customer Churn Rate by Total Customers — Image by Author

Insights

The analysis indicates that the customer churn rate, representing the percentage of customers who have discontinued their engagement with our service in the past quarter, stands at 74%.

Product Association Rules

Conducting a product association rules analysis involves examining transactional data to identify the top product combinations that are frequently purchased together. Additionally, it entails exploring whether there are differences in product associations between weekends and weekdays.

10. What are the top 5 product combinations that are most frequently purchased together?

Insights

From the results table the following products combinations were frequently bought together:

  • Bag of Organic Bananas — Organic Hass Avocado which has being bought at about 62,341
  • Bag of Organic Bananas — Organic Strawberries which has being bought at about 61,628 times
  • Organic Strawberries — Banana which has being bought at about 56,156 times
  • Banana — Organic Avocado which has being bought at about 53,395 times
  • Organic Baby Spinach — Banana which has been bought at about 51,395 times

Insights Summary

1. Top Product Pairs:
The top 10 product pairs most frequently purchased together include combinations like Bag of Organic Bananas — Organic Hass Avocado and Bag of Organic Bananas — Organic Strawberries, indicating a preference for organic and fruit-based items.

2. First Cart Additions:
The most commonly added products to the cart first are Banana, Bag of Organic Bananas, Organic Whole Milk, Organic Strawberries, and Organic Hass Avocado, suggesting these items are popular choices for initial purchases.

3. Average Unique Products per Order:
On average, there are 10 unique products included in a single order, indicating customers tend to purchase a variety of items in each transaction.

4. Customer Spending Categories:
Customers are categorized into low spenders, high spenders, and medium spenders. Despite being the largest segment, low spenders contribute the most to overall order value.

5. Customer Segmentation by Purchase Frequency:
Customers are segmented into Very Frequent, Frequent, Regular, and Occasional categories based on purchase frequency. Regular customers represent the highest segment, suggesting a consistent buying pattern among this group.

6. Customer Segmentation by Total Order Count:
Customers are segmented into highly active, moderately active, somewhat active, and inactive categories. Somewhat active customers have the highest total order count, highlighting their engagement with the service.

7. Distribution of Orders by Day:
Sunday, Monday, and Tuesday emerge as the top three days for order placement, indicating higher consumer activity during the start of the week.

8. Recent Customer Activity:
Approximately 152,012 customers have placed orders within the last 30 days, suggesting ongoing engagement with the service.

9. Customer Churn Rate:
The customer churn rate stands at 74%, indicating a significant percentage of customers have discontinued engagement with the service in the past quarter.

10. Frequently Bought Product Combinations:
Products like Bag of Organic Bananas — Organic Hass Avocado and Bag of Organic Bananas — Organic Strawberries are frequently bought together, indicating a preference for organic and fruit-based combinations among customers.

Recommendations

Based on the insights gleaned from the data analysis, I recommend the following strategies for business improvement:

  1. Promotion of Related Products:
    Utilize insights from frequently bought product pairs to strategically position complementary items together, both online and in-store, enhancing the shopping experience and encouraging additional purchases.
  2. Tailored Marketing for Customer Segments:
    Use customer segmentation based on spending habits and purchase frequency to customize marketing efforts. High-spenders may respond favorably to loyalty programs or exclusive deals, while occasional buyers could be incentivized with targeted promotions to boost their engagement.
  3. Seasonal Marketing Strategies:
    Take advantage of seasonal trends by launching targeted marketing campaigns during peak months. For example, offering special promotions during the holiday season can attract customers and drive sales.
  4. Preventing Customer Attrition:
    Identify customers who haven’t made a purchase in the last 30 days and implement personalized outreach strategies to prevent them from disengaging. This could include targeted offers or reminders to re-engage them with your products and services.
  5. Offering Product Bundles and Deals:
    Based on insights from product association analysis, consider creating bundled offers for frequently purchased product combinations. This can increase the overall value of orders and drive sales by providing customers with attractive options.

By implementing these recommendations, your business can enhance customer engagement, increase sales, and improve overall profitability.

Conclusion:

In conclusion, the analysis of customer behavior and purchasing patterns provides valuable insights for optimizing business strategies. By understanding product associations, customer segmentation, and seasonal trends, businesses can enhance marketing efforts, improve product placement, and tailor promotions to target specific customer segments effectively. Additionally, insights into customer churn rates and recent activity highlight areas for customer retention and engagement initiatives. Overall, leveraging these findings can drive business growth, increase customer satisfaction, and ultimately lead to a more successful and competitive operation in the market.

Stay connected with me through Twitter and LinkedIn.

Be sure to follow me here and explore my Medium and Github profile for additional intriguing projects.

--

--

Favour Igwezeke
Favour Igwezeke

Written by Favour Igwezeke

I help build the crossroads of data science, biomedical informatics and art. And enjoy this thing called Life.