Super Market Data Analysis

Sushant Agarwal
Published in Analytics Vidhya
4 min read · May 8, 2020

Supermarkets are growing rapidly in the most populated cities, and competition between them is high. This dataset contains the historical sales of a supermarket company, recorded at 3 different branches over 3 months. Predictive data analytics methods are easy to apply to this dataset.

Photo by ev on Unsplash

About Dataset

The supermarket sales data contains the following attributes:

Invoice id: Computer generated sales slip invoice identification number

Branch: Branch of supercenter (3 branches are available identified by A, B and C).

City: Location of supercenters

Customer type: Type of customer, recorded as Member for customers using a member card and Normal for customers without one.

Gender: Gender type of customer

Product line: General item categorization groups — Electronic accessories, Fashion accessories, Food and beverages, Health and beauty, Home and lifestyle, Sports and travel

Unit price: Price of each product in $

Quantity: Number of products purchased by customer

Tax: 5% tax charged on the customer's purchase

Total: Total price including tax

Date: Date of purchase (Record available from January 2019 to March 2019)

Time: Purchase time (10am to 9pm)

Payment: Payment used by customer for purchase (3 methods are available — Cash, Credit card and Ewallet)

COGS: Cost of goods sold

Gross margin percentage: Gross margin percentage

Gross income: Gross income

Rating: Customer satisfaction rating of their overall shopping experience (on a scale of 1 to 10)
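
To make the relationship between the price attributes concrete, here is a small worked example. It is only a sketch: the invoice values are made up, and it assumes the 5% tax is charged on unit price times quantity, which the attribute list does not state explicitly.

```python
# Hypothetical invoice line; the values are made up for illustration.
unit_price = 50.00  # "Unit price" in $
quantity = 4        # "Quantity"
TAX_RATE = 0.05     # the 5% rate stated for the "Tax" attribute

# Assumption: the tax is charged on unit price x quantity (the pre-tax amount).
subtotal = unit_price * quantity
tax = round(subtotal * TAX_RATE, 2)

# "Total" is the price including tax.
total = round(subtotal + tax, 2)

print(f"subtotal={subtotal:.2f} tax={tax:.2f} total={total:.2f}")
```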

Code Walk-through

Loading the Libraries

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import matplotlib.pyplot as plt

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory

import os
print(os.listdir("../input"))

Data insight

This shows how the real data looks when it is fed to the algorithm for prediction.

This is a snapshot of the data
Info about the attributes of the dataset
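
A first look like the one in the screenshots above can be sketched with pandas as follows. The rows here are made up and only a subset of the columns is included; with the real file you would pass its path to pd.read_csv instead of the in-memory sample (the actual file name is not given here, so none is assumed).

```python
import io

import pandas as pd

# Made-up rows in the shape described under "About Dataset".
# With the real data, replace sample_csv with the path to the CSV file.
sample_csv = io.StringIO(
    "Invoice id,Branch,Customer type,Gender,Product line,"
    "Unit price,Quantity,Date,Time,Rating\n"
    "000-00-0001,A,Member,Female,Health and beauty,74.69,7,1/5/2019,13:08,9.1\n"
    "000-00-0002,B,Normal,Male,Sports and travel,15.28,5,3/8/2019,10:29,7.4\n"
    "000-00-0003,C,Normal,Female,Home and lifestyle,46.33,8,2/20/2019,20:33,8.0\n"
)

df = pd.read_csv(sample_csv)

# Snapshot of the first rows.
print(df.head())

# Column names, non-null counts and dtypes.
df.info()
```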

Data preparation and cleaning

Now we have to clean the data so that it can be processed by the algorithm.

We have to convert the ‘date’, ‘day’, ‘month’, ‘year’, ‘Time’ & ‘hour’ columns to a specified format so that they are easier to understand and process.
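
One way this conversion could look in pandas is sketched below. The sample rows are made up, and the string formats (month/day/year dates, 24-hour HH:MM times) are assumptions; adjust the format arguments if the real file differs.

```python
import pandas as pd

# Made-up rows; only the two columns needed for the conversion.
df = pd.DataFrame(
    {
        "Date": ["1/5/2019", "3/8/2019", "2/20/2019"],
        "Time": ["13:08", "10:29", "20:33"],
    }
)

# Assumption: dates are month/day/year strings.
df["date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")
df["day"] = df["date"].dt.day
df["month"] = df["date"].dt.month
df["year"] = df["date"].dt.year

# Assumption: times are 24-hour HH:MM strings. The parsed values get a
# placeholder date (1900-01-01); only the hour is used afterwards.
df["Time"] = pd.to_datetime(df["Time"], format="%H:%M")
df["hour"] = df["Time"].dt.hour

print(df[["date", "day", "month", "year", "hour"]])
```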

Preview the updated data

Below is a snapshot of the data after it has been processed by our code. The processing is done using the Python library Pandas.

This is how the data looks after preprocessing the date and time attributes

Let’s find the number of unique values in columns with object datatype

Now we need to know the unique values of the attributes in the data with which we have to work.

The attributes available in the dataset

The attributes contain some unique values that also need to be processed for the program to understand them. Below is the output showing the unique values of each attribute.
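
The unique-value check described above can be sketched like this. The rows are made up, and the text columns are created with the object dtype explicitly so that the selection behaves the same regardless of how a given pandas version stores strings by default.

```python
import pandas as pd

# Made-up rows; text columns are stored with the object dtype explicitly.
df = pd.DataFrame(
    {
        "Branch": ["A", "B", "C", "A"],
        "Customer type": ["Member", "Normal", "Normal", "Member"],
        "Payment": ["Cash", "Ewallet", "Credit card", "Cash"],
    },
    dtype=object,
)
df["Quantity"] = [7, 5, 8, 2]  # numeric column, not selected below

# Pick the object-typed columns and count their distinct values.
object_columns = df.select_dtypes(include="object").columns
unique_counts = df[object_columns].nunique()
print(unique_counts)

# List the distinct values themselves, column by column.
for column in object_columns:
    print(column, "->", df[column].unique().tolist())
```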

Visualizing the data

By now, we have processed our data and cleaned it as per our requirements. Below are some plots based on the data. They include:

  • Gender count
  • Rating based on branch
  • Product sales per hour
  • Monthly insight of the data based on branch and quantity
  • Lastly, a monthly summary of the sales per hour of our products
Gender count in the dataset. The distribution is even, so there is no gender bias.
Sales by the hour in the company. Most items were sold around 14:00 local time.
Each branch’s sales quantity by the hour, month by month.
Each branch’s sales by the hour, month by month.

Link to the GitHub Repository

Supermarket Sales Data Analysis

Follow me at

Linkedin : https://www.linkedin.com/in/sushantag9/

Twitter : @sushantag9


My passion is to learn new technologies and apply them. I have an experience of working with Edge devices like Arduino, RaspBerry Pie, NVIDIA Jetson Nano and In