Exploring Food Purchasing Patterns: A Similarity Analysis

Larissakimberly
INST414: Data Science Techniques
Mar 12, 2024
  • Introduction:

In today’s data-driven world, businesses seek insights into consumer behavior to optimize their strategies. One intriguing question is whether food departments that are ordered heavily at the same hours of the day resemble one another. Answering this question can help retailers with product placement, marketing strategies, and inventory management.

  • Stakeholder and Decision:

The stakeholders interested in this question include grocery store owners, retail analysts, and marketing teams. By understanding the similarity between foods purchased at the same hours, retailers can tailor their offerings, promotions, and store layouts to better meet customer preferences and optimize sales.

  • Software used:

The analysis relied on a small set of Python libraries. pandas handled data manipulation, including building a pivot table structured by department and hour of the day. scikit-learn provided the cosine_similarity function from the sklearn.metrics.pairwise module, used to compare departments by order hour frequency, and the MinMaxScaler class from sklearn.preprocessing, used to bring the features into a common range with Min-Max scaling. For visualization, seaborn and matplotlib.pyplot generated a heatmap of the similarity matrix, which shows at a glance how closely departments resemble one another in their order hour frequencies.

  • Data Description:

The dataset “InstacartOrdersByDepartment.csv” contains information about the number of orders for different departments at various hours of the day. Each entry includes the department name, the hour of the day, the number of orders placed for that department during that hour, and the total orders for the department.
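The department-by-hour structure described here can be sketched in pandas. This is a minimal illustration, not the original notebook: the column names department, order_hour_of_day, and num_orders are assumptions, and the rows are made-up values standing in for the real file (which would be loaded with pd.read_csv).

```python
import pandas as pd

# Illustrative rows only; the real file's column names and values may differ.
orders = pd.DataFrame(
    {
        "department": ["bakery", "bakery", "produce", "produce", "snacks", "snacks"],
        "order_hour_of_day": [9, 17, 9, 17, 9, 17],
        "num_orders": [120, 80, 900, 700, 300, 310],
    }
)

# One row per department, one column per hour, cell = number of orders.
hourly = orders.pivot_table(
    index="department",
    columns="order_hour_of_day",
    values="num_orders",
    aggfunc="sum",
    fill_value=0,  # hours with no recorded orders become 0
)
print(hourly)
```

Each row of the resulting table is the hourly order profile of one department, which is the shape a pairwise similarity function expects.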

  • Data Collection:

The dataset was obtained from an Instacart database, which collects data on customer orders from various departments. It provides a comprehensive overview of purchasing patterns across different times of the day.

  • Data Cleaning:

Common data cleaning steps included handling missing values and ensuring consistency in department names. Some departments may have had variations in naming conventions, requiring standardization for accurate analysis.
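One possible way to carry out these two steps is sketched below. The exact rules used in the analysis are not stated, so the choices here (dropping rows with missing values, trimming whitespace, lower-casing names, then merging duplicates) are assumptions, as are the column names and the sample rows.

```python
import pandas as pd

# Made-up rows showing inconsistent names and missing values.
raw = pd.DataFrame(
    {
        "department": [" Bakery", "bakery ", "Dairy Eggs", None],
        "order_hour_of_day": [9, 9, 10, 11],
        "num_orders": [5, 7, None, 3],
    }
)

# Assumption: rows without a department or an order count are dropped.
cleaned = raw.dropna(subset=["department", "num_orders"]).copy()

# Standardize names so that " Bakery" and "bakery " count as one department.
cleaned["department"] = cleaned["department"].str.strip().str.lower()

# Merge rows that now refer to the same department and hour.
cleaned = cleaned.groupby(
    ["department", "order_hour_of_day"], as_index=False
)["num_orders"].sum()
print(cleaned)
```

Filling missing counts with 0 instead of dropping them would be an equally plausible choice; which one is appropriate depends on why the values are missing.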

  • Similarity Measurement:

To measure similarity between foods bought at the same hours, I used the cosine similarity metric. Each department is represented as a feature vector with one entry per hour of the day, holding the number of orders placed for that department during that hour. Cosine similarity then quantifies how closely two of these vectors point in the same direction, indicating how closely related the hourly purchasing patterns of two departments are.
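For two vectors a and b, cosine similarity is the dot product of a and b divided by the product of their lengths. The sketch below shows how this could be computed with the tools named earlier (MinMaxScaler and cosine_similarity). The counts are invented for illustration, and the order of the steps is an assumption rather than a copy of the original code.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import MinMaxScaler

# Illustrative counts: rows are departments, columns are hours (values made up).
hourly = pd.DataFrame(
    [[10, 40, 30], [20, 80, 60], [50, 10, 5]],
    index=["bakery", "dairy eggs", "alcohol"],
    columns=[9, 13, 18],
)

# MinMaxScaler rescales each column (hour) to [0, 1] across departments.
scaled = MinMaxScaler().fit_transform(hourly.to_numpy())

# Pairwise cosine similarity between department rows, labelled by department.
similarity = pd.DataFrame(
    cosine_similarity(scaled), index=hourly.index, columns=hourly.index
)
print(similarity.round(3))
```

Note that MinMaxScaler works column by column, so in this layout it normalizes each hour across departments rather than each department across hours. A department that has the lowest count in every hour would become an all-zero row, for which cosine similarity is not meaningful.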

Computing cosine similarity
  • Figure/Table:
  • Top 10 Most Similar Departments for three sample queries:

1. Query: Alcohol
- Departments with Similar Order Hour Frequencies:
1. deli
2. snacks
3. dry goods pasta
4. meat seafood
5. bakery
6. pantry
7. frozen
8. produce
9. dairy eggs
10. missing

2. Query: Bakery
- Departments with Similar Order Hour Frequencies:
1. dairy eggs
2. snacks
3. dry goods pasta
4. breakfast
5. meat seafood
6. produce
7. deli
8. pantry
9. canned goods
10. frozen

3. Query: Beverages
- Departments with Similar Order Hour Frequencies:
1. breakfast
2. dairy eggs
3. canned goods
4. pantry
5. personal care
6. bakery
7. frozen
8. snacks
9. dry goods pasta
10. meat seafood
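Rankings like the ones above can be read off a similarity matrix by sorting one row and skipping the query itself. The helper most_similar below is a hypothetical name introduced for this sketch, and the similarity values are invented, not the results of the analysis.

```python
import pandas as pd


def most_similar(similarity: pd.DataFrame, query: str, top_n: int = 10) -> list[str]:
    """Return the departments most similar to `query`, best match first."""
    # Drop the query's own entry, which is always the trivial best match.
    scores = similarity.loc[query].drop(labels=query)
    return scores.sort_values(ascending=False).head(top_n).index.tolist()


# Made-up similarity values for illustration only.
names = ["alcohol", "deli", "snacks"]
similarity = pd.DataFrame(
    [[1.0, 0.9, 0.8], [0.9, 1.0, 0.7], [0.8, 0.7, 1.0]],
    index=names,
    columns=names,
)
print(most_similar(similarity, "alcohol", top_n=2))  # ['deli', 'snacks']
```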

  • Analysis:

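The analysis shows that certain departments, such as snacks, produce, dairy eggs, and bakery, appear repeatedly near the top of each other's lists, meaning their orders are distributed across the hours of the day in similar ways. This does not by itself show that the items are bought in the same basket, but it suggests that customers shop for them at similar times, which points to opportunities for cross-promotion and bundling strategies.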

  • Limitations:

- The analysis is based solely on Instacart data and may not generalize to other grocery retailers.
- The dataset does not provide information on individual products, limiting the granularity of the analysis.
- Cosine similarity compares only the direction of the vectors it is given, not their magnitude, and it may overlook nonlinear relationships between departments.
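One property of cosine similarity that matters when reading these results is that it ignores vector length. The small sketch below, with made-up numbers and a hypothetical helper named cosine, shows two hourly profiles with the same shape but a hundredfold difference in volume scoring as identical.

```python
import numpy as np


def cosine(a, b) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


small = [10, 40, 30]        # low-volume department (made-up counts)
large = [1000, 4000, 3000]  # same hourly shape, 100 times the volume

print(cosine(small, large))  # 1.0, up to floating-point rounding
```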

  • Conclusion:

In conclusion, exploring food purchasing patterns using similarity analysis offers valuable insights for retailers seeking to optimize their strategies. By identifying similarities between departments in terms of hourly purchasing patterns, retailers can tailor their offerings and marketing efforts to better meet customer needs and preferences.
