
Exploratory Data Analysis (EDA) of Supermarket Sales 🛍️

using Python, Pandas, Seaborn & Folium

Jeriel Wadjas
Nov 5, 2021


Supermarkets are growing in the most populous cities in Asia. This project analyzes supermarket sales across different branches and provides insights that help us understand customers better. The dataset was taken from Kaggle.

Project Outline

  1. Install and import the required libraries
  2. Download the Dataset
  3. Perform Exploratory Analysis and Visualisation
  4. Ask & Answer Questions about the Data

Installing required libraries

We start by installing the required libraries: Pandas, NumPy, Matplotlib, Seaborn, and Folium.

Download the dataset

After downloading the dataset, we read it into a DataFrame using pandas.
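
A minimal sketch of this step. To stay self-contained, it reads two made-up rows from a string instead of the downloaded file; the column names are assumed to match the Kaggle CSV, and the file name in the comment is hypothetical.

```python
import io

import pandas as pd

# Two made-up rows using the column layout assumed for the Kaggle CSV.
# With the real download you would call, e.g.:
# df = pd.read_csv("supermarket_sales.csv")  # hypothetical file name
csv_text = """Invoice ID,Branch,City,Customer type,Gender,Product line,Unit price,Quantity,Tax 5%,Total,Date,Time,Payment,cogs,gross margin percentage,gross income,Rating
INV-0001,A,Yangon,Member,Female,Health and beauty,10.00,2,1.00,21.00,1/5/2019,13:08,Ewallet,20.00,4.761904762,1.00,9.1
INV-0002,C,Naypyitaw,Normal,Male,Electronic accessories,50.00,1,2.50,52.50,3/8/2019,10:29,Cash,50.00,4.761904762,2.50,6.4
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
```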

Next, we preprocess the data.

Data Preprocessing

We check how many rows and columns the dataset has and whether any values are missing.

We do not have any missing values.
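
These checks can be sketched with `shape` and `isna()`. The small frame below is hypothetical, and one value is deliberately left missing so the check has something to report.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; one Quantity value is deliberately missing.
df = pd.DataFrame({
    "Branch": ["A", "B", "C", "A"],
    "Quantity": [2, 5, np.nan, 1],
    "Rating": [9.1, 6.3, 7.4, 5.0],
})

rows, cols = df.shape                  # number of rows and columns
missing_per_column = df.isna().sum()   # count of NaN values in each column
print(f"{rows} rows, {cols} columns")
print(missing_per_column)
print("Any missing values?", df.isna().any().any())
```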

We also parse the dates and create additional columns.
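
A minimal sketch of date parsing and derived columns, assuming the dates are month/day/year strings and times are "HH:MM" strings. The `Month` and `Hour` column names are hypothetical helpers, not necessarily the ones used in the notebook.

```python
import pandas as pd

# Hypothetical rows; Date and Time arrive as strings when read from a CSV.
df = pd.DataFrame({
    "Date": ["1/5/2019", "3/8/2019", "2/20/2019"],
    "Time": ["13:08", "10:29", "19:45"],
})

# Parse the month/day/year strings into datetimes.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")
# Hypothetical helper columns used for grouping by month and by hour.
df["Month"] = df["Date"].dt.month
df["Hour"] = pd.to_datetime(df["Time"], format="%H:%M").dt.hour
print(df)
```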

Finding insights

  • City

Naypyitaw has the highest number of sales; however, Mandalay and Yangon are not far behind.
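
A per-city comparison can be sketched with `groupby`. The rows are hypothetical, and the `City` and `Total` column names are assumed; summing `Total` gives revenue, while `size()` gives the number of transactions.

```python
import pandas as pd

# Hypothetical transactions, one row per invoice.
df = pd.DataFrame({
    "City": ["Yangon", "Naypyitaw", "Mandalay", "Naypyitaw", "Yangon"],
    "Total": [120.0, 300.0, 150.0, 90.0, 60.0],
})

# Revenue per city, largest first.
sales_by_city = df.groupby("City")["Total"].sum().sort_values(ascending=False)
# Number of transactions per city.
transactions_by_city = df.groupby("City").size()
print(sales_by_city)
print(transactions_by_city)
```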

  • Month

The supermarket performs well in January. Transactions decrease in February and bounce back in March.
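
Counting transactions per month can be sketched as follows, assuming `Date` has already been parsed to datetimes. The dates are hypothetical.

```python
import pandas as pd

# Hypothetical parsed dates, one per invoice.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2019-01-05", "2019-01-20", "2019-02-11",
                            "2019-03-02", "2019-03-15", "2019-01-30"]),
})

# Number of invoices (rows) in each calendar month.
transactions_per_month = df.groupby(df["Date"].dt.month).size()
print(transactions_per_month)
```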

  • Quantity

The quantity graph follows a pattern similar to the sales graph, so the number of items sold correlates with the number of sales.
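
One way to check this relationship numerically, sketched with hypothetical rows: aggregate transactions and units per month, then correlate the two series. The `Date` and `Quantity` column names are assumed.

```python
import pandas as pd

# Hypothetical invoices with parsed dates and item quantities.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2019-01-03", "2019-01-17", "2019-01-28",
                            "2019-02-09", "2019-03-04", "2019-03-21"]),
    "Quantity": [4, 6, 5, 3, 7, 2],
})

# Monthly transaction counts and units sold, side by side.
monthly = df.groupby(df["Date"].dt.month).agg(
    transactions=("Quantity", "size"),
    units=("Quantity", "sum"),
)
corr = monthly["transactions"].corr(monthly["units"])
print(monthly)
print("Correlation:", corr)
```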

  • Rating

- Branch A received the most positive ratings, as its ratings are concentrated between 6 and 9.
- Branch B received the most negative ratings, as its ratings are concentrated between 4 and 6.
- Branch C has almost equal positive and negative ratings, concentrated between 4 and 6 and between 8 and 10.
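
The plot itself is not reproduced here, but a numeric view of the same distributions can be sketched with `describe()` and rating bands. The rows are hypothetical, and the band edges are an illustrative choice, not taken from the original analysis.

```python
import pandas as pd

# Hypothetical ratings per branch.
df = pd.DataFrame({
    "Branch": ["A", "A", "A", "B", "B", "B", "C", "C"],
    "Rating": [8.5, 7.0, 9.1, 4.2, 5.5, 6.8, 4.9, 9.4],
})

# Summary statistics of each branch's ratings.
print(df.groupby("Branch")["Rating"].describe())

# Share of each branch's ratings in coarse bands (illustrative band edges).
bands = pd.cut(df["Rating"], bins=[4, 6, 8, 10],
               labels=["4-6", "6-8", "8-10"], include_lowest=True)
band_share = pd.crosstab(df["Branch"], bands, normalize="index")
print(band_share)
```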

  • Payment

Cash is mainly used by customers across the branches.
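
Payment preferences can be sketched with `value_counts` and a branch-by-payment crosstab. The rows are hypothetical; the `Payment` labels are assumed to match the dataset's values.

```python
import pandas as pd

# Hypothetical transactions.
df = pd.DataFrame({
    "Branch": ["A", "A", "B", "B", "C", "C"],
    "Payment": ["Cash", "Ewallet", "Cash", "Credit card", "Cash", "Cash"],
})

# Overall share of each payment method.
print(df["Payment"].value_counts(normalize=True))

# Payment-method counts broken down by branch.
payment_by_branch = pd.crosstab(df["Branch"], df["Payment"])
print(payment_by_branch)
```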

  • Hour

Normal customers and members both like to shop around noon, but members have the highest number of transactions at 2 pm.
Normal customers shop the most around 4 pm and 9 pm.
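
Hourly counts per customer type can be sketched with a crosstab. The rows are hypothetical, and `Hour` is a hypothetical helper column derived from the transaction time.

```python
import pandas as pd

# Hypothetical transactions with a derived Hour column.
df = pd.DataFrame({
    "Customer type": ["Member", "Member", "Normal", "Normal", "Member", "Normal"],
    "Hour": [12, 14, 12, 16, 14, 21],
})

# Number of transactions per hour for each customer type.
hourly = pd.crosstab(df["Hour"], df["Customer type"])
busiest = hourly.idxmax()  # first hour with the maximum count, per customer type
print(hourly)
print(busiest)
```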

  • Correlation

- The black cells represent null (NaN) values, such as gross margin percentage against itself: a column whose value never changes has no variance, so its correlation is undefined.
- The purple cells represent almost no correlation between columns.
- The orange cells represent a high correlation between values: Tax, Total, and cogs are highly correlated with Quantity and Unit price.
- The pale cells represent perfect correlation, such as a column compared with itself.
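
The NaN cells can be reproduced with a small sketch. It assumes, as the heatmap suggests, that gross margin percentage is constant, and builds hypothetical rows from the relations cogs = Unit price × Quantity, Tax 5% = 0.05 × cogs and Total = cogs + tax.

```python
import pandas as pd

# Hypothetical rows built from the assumed arithmetic of the dataset.
unit_price = pd.Series([10.0, 25.0, 40.0, 15.0])
quantity = pd.Series([2, 1, 3, 5])
cogs = unit_price * quantity
df = pd.DataFrame({
    "Unit price": unit_price,
    "Quantity": quantity,
    "cogs": cogs,
    "Tax 5%": 0.05 * cogs,
    "Total": 1.05 * cogs,
    "gross margin percentage": 4.761905,  # same value in every row
})

# Pearson correlation; a zero-variance column yields NaN, even against itself.
corr = df.corr(numeric_only=True)
print(corr.round(2))
```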

With these insights, we can begin asking questions about the data.

Asking and Answering Questions

Q1: What was the total number of sales? Which branch has the highest number of sales?
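
A minimal sketch for Q1 with hypothetical rows, assuming the `Branch` and `Total` column names. It reports both revenue and the number of transactions, since "number of sales" could mean either.

```python
import pandas as pd

# Hypothetical invoices.
df = pd.DataFrame({
    "Branch": ["A", "B", "C", "C", "A", "B", "C"],
    "Total": [210.0, 95.5, 320.0, 150.25, 80.0, 60.0, 40.0],
})

total_sales = df["Total"].sum()   # overall revenue
transactions = len(df)            # number of invoices
sales_by_branch = df.groupby("Branch")["Total"].sum()
print(f"Total sales: {total_sales:.2f} across {transactions} transactions")
print(sales_by_branch)
print("Branch with the highest sales:", sales_by_branch.idxmax())
```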

Q2: What type of product is sold the most?
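
"Sold the most" can mean units or revenue, so this sketch shows both. The rows are hypothetical, and the `Product line`, `Quantity` and `Total` column names are assumed.

```python
import pandas as pd

# Hypothetical invoices.
df = pd.DataFrame({
    "Product line": ["Food and beverages", "Sports and travel", "Food and beverages",
                     "Health and beauty", "Sports and travel", "Food and beverages"],
    "Quantity": [3, 2, 4, 5, 1, 2],
    "Total": [60.0, 110.0, 45.0, 90.0, 30.0, 80.0],
})

# Units and revenue per product line.
by_line = df.groupby("Product line")[["Quantity", "Total"]].sum()
print(by_line.sort_values("Total", ascending=False))
print("Most units:", by_line["Quantity"].idxmax())
print("Most revenue:", by_line["Total"].idxmax())
```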

Q3: Which gender buys more items in each category, and in which categories?

Men buy more products in 3 categories: Electronic accessories: 86 men, Health and beauty: 88 men, Home and lifestyle: 81 men

Women buy more products in 3 categories: Fashion accessories: 96 women, Food and beverages: 90 women, Sports and travel: 88 women
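
Counts like these can be sketched with a crosstab of product line against gender. The rows are hypothetical, and the sketch counts transactions (rows); summing `Quantity` instead would count units, and which one the original analysis used is an assumption here.

```python
import pandas as pd

# Hypothetical transactions.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male", "Male", "Female", "Female"],
    "Product line": ["Electronic accessories", "Fashion accessories", "Fashion accessories",
                     "Electronic accessories", "Fashion accessories", "Electronic accessories",
                     "Food and beverages"],
})

# Transactions per (product line, gender) pair.
counts = pd.crosstab(df["Product line"], df["Gender"])
# Gender with the larger count in each product line.
leader = counts.idxmax(axis=1)
print(counts)
print(leader)
```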

Q4: How many people buy above the average price in each category? Are they members of the supermarket?

The number of people who buy above the average price, by product line:

  • Fashion accessories: 69 people
  • Food and beverages: 67 people
  • Home and lifestyle: 66 people
  • Sports and travel: 75 people
  • Health and beauty: 60 people
  • Electronic accessories: 67 people

In total, 404 out of 1000 people buy above the average price.
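
A minimal sketch of one reading of Q4, assuming "above the average price" means a `Unit price` higher than the mean unit price of the same product line. That interpretation and the rows are assumptions; comparing `Total` instead would follow the same pattern.

```python
import pandas as pd

# Hypothetical transactions.
df = pd.DataFrame({
    "Product line": ["Sports and travel", "Sports and travel", "Sports and travel",
                     "Health and beauty", "Health and beauty"],
    "Customer type": ["Member", "Normal", "Member", "Normal", "Member"],
    "Unit price": [90.0, 30.0, 60.0, 20.0, 50.0],
})

# Mean unit price of each product line, broadcast back onto every row.
line_mean = df.groupby("Product line")["Unit price"].transform("mean")
above = df[df["Unit price"] > line_mean]

per_line = above.groupby("Product line").size()   # count per product line
print(per_line)
print(f"{len(above)} out of {len(df)} buy above their line's average")
print(above["Customer type"].value_counts())      # members vs normal customers
```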

Q5: What is the favorite payment method of members? And of normal customers?
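
A crosstab of customer type against payment method answers this directly. The rows below are hypothetical, and the `Customer type` and `Payment` column names are assumed.

```python
import pandas as pd

# Hypothetical transactions.
df = pd.DataFrame({
    "Customer type": ["Member", "Member", "Member", "Normal", "Normal", "Normal", "Normal"],
    "Payment": ["Credit card", "Cash", "Credit card", "Ewallet", "Cash", "Ewallet", "Credit card"],
})

payment_by_type = pd.crosstab(df["Customer type"], df["Payment"])
favorite = payment_by_type.idxmax(axis=1)  # most frequent method per customer type
print(payment_by_type)
print(favorite)
```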

Q6: What time should we display an advertisement to maximize the revenue?
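
One possible approach, sketched with hypothetical rows: total the revenue per hour of the day and look at the peak hours, since an advertisement shown shortly before a peak would presumably reach the most shoppers. The `Time` and `Total` column names are assumed.

```python
import pandas as pd

# Hypothetical invoices with "HH:MM" times.
df = pd.DataFrame({
    "Time": ["10:15", "12:40", "13:05", "13:50", "19:10", "19:35", "16:20"],
    "Total": [50.0, 80.0, 200.0, 150.0, 180.0, 120.0, 40.0],
})

# Revenue per hour of the day, highest first.
df["Hour"] = pd.to_datetime(df["Time"], format="%H:%M").dt.hour
revenue_by_hour = df.groupby("Hour")["Total"].sum().sort_values(ascending=False)
print(revenue_by_hour.head(3))
```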

Inferences

We have drawn many inferences from the data frame. Here is a summary of a few of them:

  • Branch C, located in Naypyitaw, has the highest number of transactions and sales.
  • February has the lowest number of sales, and January accounts for the most.
  • The quantity of products is well distributed across the board.
  • The food and beverages category generates the most sales.
  • Men purchase more products in 3 categories: electronics, health and beauty, and home and lifestyle.
  • Women purchase more products in 3 categories: fashion, food and beverages, and sports and travel.
  • 404 out of 1000 people buy above the average price. The sports and travel category has the most such buyers (75).
  • Cash is the favorite payment method overall. Members use credit cards and cash, while normal customers prefer Ewallet and cash.
  • The best times to display advertisements are before 13h and 19h.

References:

The full code is available in the Jupyter Notebook in the GitHub repository here.

Thank you for reading! If you have any suggestions, feel free to reach me on LinkedIn.
