Published in Analytics Vidhya

Super Market Data Analysis

Supermarkets are growing in most densely populated cities, and market competition is high. This dataset contains the historical sales of a supermarket company, recorded at 3 different branches over 3 months. Predictive data analytics methods are easy to apply to this dataset.

About the Dataset

The supermarket sales data contains the following attributes:

Invoice ID: Computer-generated sales slip invoice identification number

Branch: Branch of supercenter (3 branches are available identified by A, B and C).

City: Location of supercenters

Customer type: Type of customer, recorded as Member for customers using a member card and Normal for customers without one.

Gender: Gender of the customer

Product line: General item categorization groups — Electronic accessories, Fashion accessories, Food and beverages, Health and beauty, Home and lifestyle, Sports and travel

Unit price: Price of each product in $

Quantity: Number of products purchased by the customer

Tax: 5% tax fee on the customer's purchase

Total: Total price including tax

Date: Date of purchase (Record available from January 2019 to March 2019)

Time: Purchase time (10am to 9pm)

Payment: Payment method used by the customer (3 methods are available: Cash, Credit card and Ewallet)

COGS: Cost of goods sold

Gross margin percentage: Gross income as a percentage of Total (Gross income / Total * 100)

Gross income: Total minus COGS

Rating: Customer satisfaction rating of the overall shopping experience (on a scale of 1 to 10)
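The money columns above are tied together by simple arithmetic. The sketch below recomputes them for a single invoice. It assumes COGS = Unit price * Quantity, Tax = 5% of COGS and Total = COGS + Tax, which is how these columns relate in the Kaggle copy of this dataset; the function name `derive_amounts` and the example prices are hypothetical.

```python
# Assumption: COGS = Unit price * Quantity, Tax = 5% of COGS, Total = COGS + Tax.
# derive_amounts is a hypothetical helper, not part of the original notebook.

TAX_RATE = 0.05  # the 5% tax described in the Tax attribute


def derive_amounts(unit_price: float, quantity: int) -> dict:
    """Recompute the money columns of one invoice from price and quantity."""
    cogs = unit_price * quantity          # cost of goods sold
    tax = TAX_RATE * cogs                 # 5% tax fee
    total = cogs + tax                    # total price including tax
    gross_income = total - cogs           # equals the tax under these assumptions
    gross_margin_pct = gross_income / total * 100
    return {
        "cogs": round(cogs, 4),
        "tax": round(tax, 4),
        "total": round(total, 4),
        "gross_income": round(gross_income, 4),
        "gross_margin_pct": round(gross_margin_pct, 6),
    }


if __name__ == "__main__":
    # Made-up example invoice: 3 items at $20.00 each.
    print(derive_amounts(unit_price=20.0, quantity=3))
```

Because the tax is a fixed share of COGS, the gross margin percentage works out to 5 / 105 * 100, about 4.76%, for every row, whatever the price or quantity.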

Code Walk-through

Loading the Libraries

import os

import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns  # statistical plots
import matplotlib.pyplot as plt  # plotting

# On Kaggle, input data files are available in the "../input/" directory.
# Running this cell (click Run or press Shift+Enter) lists the files in it.
for dirname, _, filenames in os.walk("../input"):
    for filename in filenames:
        print(os.path.join(dirname, filename))

Data insight

This shows what the raw data looks like before it is fed to the algorithm for prediction.
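A minimal sketch of loading and previewing the raw data with pandas follows. The file name in `DATA_PATH` is an assumption (Kaggle copies of this dataset are often named `supermarket_sales - Sheet1.csv`), and `load_sales` and `describe_raw` are hypothetical helpers. The demo reads two made-up rows from a string so it runs without the real file.

```python
import io

import pandas as pd

# Hypothetical path: adjust it to wherever your copy of the CSV lives.
DATA_PATH = "../input/supermarket_sales - Sheet1.csv"


def load_sales(source) -> pd.DataFrame:
    """Read the raw sales CSV (a path or a file-like object) into a DataFrame."""
    return pd.read_csv(source)


def describe_raw(df: pd.DataFrame) -> None:
    """Print a quick overview of the raw data."""
    print(df.shape)   # (rows, columns)
    print(df.dtypes)  # column types before any cleaning
    print(df.head())  # first five rows


if __name__ == "__main__":
    # Two made-up rows with a subset of the columns, standing in for the real file.
    sample = io.StringIO(
        "Invoice ID,Branch,City,Unit price,Quantity,Date,Time\n"
        "000-00-0001,A,CityX,20.00,3,1/5/2019,13:08\n"
        "000-00-0002,C,CityY,15.50,2,3/8/2019,10:29\n"
    )
    describe_raw(load_sales(sample))
```

On the real data, `describe_raw(load_sales(DATA_PATH))` prints the same overview for the full file.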


Data preparation and cleaning

Next, we clean the data so that it can be processed by the algorithm.

We convert the ‘date’, ‘day’, ‘month’, ‘year’, ‘Time’ and ‘hour’ fields to a consistent format so that they are easier to understand and process.
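One way to do this conversion with pandas is sketched below. It assumes the raw `Date` column uses month/day/year (e.g. `1/5/2019`) and `Time` uses `HH:MM` (e.g. `13:08`), as in the Kaggle copy of the data; the helper name `add_date_parts` is hypothetical, while the derived column names follow the text above.

```python
import pandas as pd


def add_date_parts(df: pd.DataFrame) -> pd.DataFrame:
    """Parse Date/Time and derive date, day, month, year and hour columns.

    Assumption: Date looks like "1/5/2019" (month/day/year) and Time like "13:08".
    """
    out = df.copy()
    out["date"] = pd.to_datetime(out["Date"], format="%m/%d/%Y")
    out["day"] = out["date"].dt.day
    out["month"] = out["date"].dt.month
    out["year"] = out["date"].dt.year
    # Time has no date part, so parse it on its own and keep the hour separately.
    time = pd.to_datetime(out["Time"], format="%H:%M")
    out["hour"] = time.dt.hour
    out["Time"] = time.dt.time
    return out


if __name__ == "__main__":
    raw = pd.DataFrame({"Date": ["1/5/2019", "3/8/2019"], "Time": ["13:08", "10:29"]})
    print(add_date_parts(raw).head())
```

After `df = add_date_parts(df)`, calling `df.head()` previews the updated data.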

Preview the updated data

Below is a snapshot of the data after it has been processed by our code. The processing is done with the Python library pandas.

Let’s find the number of unique values in columns with object datatype

Next, we need to know the unique attributes in the data we have to work with.

These attributes contain values that also need to be processed before the program can understand them. The output below shows the unique values in each attribute.
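The unique-value check can be sketched as follows. It counts distinct values in the text columns (object dtype, or the dedicated string dtype that newer pandas versions may use) and lists the values only for low-cardinality columns, since printing every invoice ID would not help. The helper names and the `max_unique=10` cut-off are hypothetical choices, not taken from the original code.

```python
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def text_columns(df: pd.DataFrame) -> list:
    """Columns holding text: object dtype, or the pandas string dtype."""
    return [c for c in df.columns if is_object_dtype(df[c]) or is_string_dtype(df[c])]


def unique_counts(df: pd.DataFrame) -> pd.Series:
    """Number of distinct values in each text column."""
    return df[text_columns(df)].nunique()


def small_categories(df: pd.DataFrame, max_unique: int = 10) -> dict:
    """Map each text column with at most max_unique values to its sorted values."""
    counts = unique_counts(df)
    return {
        col: sorted(df[col].dropna().unique())
        for col in counts[counts <= max_unique].index
    }


if __name__ == "__main__":
    demo = pd.DataFrame({
        "Invoice ID": [f"id{i}" for i in range(12)],
        "Gender": ["Male", "Female"] * 6,
        "Quantity": list(range(12)),
    })
    print(unique_counts(demo))
    print(small_categories(demo))
```

In the demo, `Invoice ID` is counted but skipped when listing values, and the numeric `Quantity` column is ignored entirely.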

Visualizing the data

By now, we have processed and cleaned the data as required. Below are some plots based on it. They include:

  • Gender count
  • Rating based on branch
  • Product sales per hour
  • Monthly insight of the data based on branch and quantity
  • Lastly, a monthly summary of product sales per hour
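The tables behind plots like these can be computed with plain pandas. The sketch below is one interpretation of each listed plot, not the author's code: it measures sales by `Quantity`, summarizes the branch ratings by mean and median, and assumes the `month` and `hour` columns from the cleaning step exist. All function names are hypothetical.

```python
import pandas as pd

# Assumption: df has Gender, Branch, Rating, Quantity plus the derived month and hour.


def gender_count(df: pd.DataFrame) -> pd.Series:
    """Number of invoices per gender."""
    return df["Gender"].value_counts()


def rating_by_branch(df: pd.DataFrame) -> pd.DataFrame:
    """Mean and median customer rating for each branch."""
    return df.groupby("Branch")["Rating"].agg(["mean", "median"])


def quantity_per_hour(df: pd.DataFrame) -> pd.Series:
    """Total units sold in each hour of the day."""
    return df.groupby("hour")["Quantity"].sum()


def monthly_quantity_by_branch(df: pd.DataFrame) -> pd.DataFrame:
    """Units sold per month (rows) and branch (columns)."""
    return df.pivot_table(index="month", columns="Branch",
                          values="Quantity", aggfunc="sum", fill_value=0)


def monthly_quantity_per_hour(df: pd.DataFrame) -> pd.DataFrame:
    """Units sold per hour of the day (rows) for each month (columns)."""
    return df.pivot_table(index="hour", columns="month",
                          values="Quantity", aggfunc="sum", fill_value=0)


if __name__ == "__main__":
    # Made-up rows for illustration only.
    demo = pd.DataFrame({
        "Gender": ["Male", "Female", "Female"],
        "Branch": ["A", "A", "B"],
        "Rating": [8.0, 6.0, 9.0],
        "Quantity": [3, 2, 5],
        "month": [1, 1, 2],
        "hour": [10, 13, 10],
    })
    print(gender_count(demo))
    print(monthly_quantity_by_branch(demo))
```

Each result can be drawn with pandas' built-in plotting, which uses matplotlib, for example `quantity_per_hour(df).plot(kind="line")` or `monthly_quantity_by_branch(df).plot(kind="bar")`. With seaborn, which the notebook already imports, `sns.countplot(x="Gender", data=df)` and `sns.boxplot(x="Branch", y="Rating", data=df)` plot the first two directly from the cleaned data.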

Link to the GitHub repository

Supermarket Sales Data Analysis

Follow me at

LinkedIn:

Twitter: @sushantag9



Sushant Agarwal

My passion is to learn new technologies and apply them. I have experience working with edge devices like Arduino, Raspberry Pi, NVIDIA Jetson Nano and In