Super Market Data Analysis
The number of supermarkets in the most populated cities is increasing, and market competition is also high. This dataset contains the historical sales of a supermarket company, recorded at 3 different branches over 3 months. Predictive data analytics methods are easy to apply to this dataset.
The supermarket sales data contains the following attributes:
Invoice id: Computer-generated sales slip invoice identification number
Branch: Branch of the supercenter (3 branches are available, identified by A, B and C).
City: Location of the supercenters
Customer type: Type of customer, recorded as Member for customers using a member card and Normal for customers without one.
Gender: Gender of the customer
Product line: General item categorization groups (Electronic accessories, Fashion accessories, Food and beverages, Health and beauty, Home and lifestyle, Sports and travel)
Unit price: Price of each product in $
Quantity: Number of products purchased by customer
Tax: 5% tax on the customer's purchase
Total: Total price including tax
Date: Date of purchase (records available from January 2019 to March 2019)
Time: Purchase time (10am to 9pm)
Payment: Payment method used by the customer for the purchase (3 methods are available: Cash, Credit card and Ewallet)
COGS: Cost of goods sold
Gross margin percentage: Gross income expressed as a percentage of the total
Gross income: Income from the sale, i.e. the total minus the cost of goods sold
Rating: Customer satisfaction rating on their overall shopping experience (on a scale of 1 to 10)
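The money columns above are linked by simple arithmetic. The sketch below recomputes them for a single invoice line. It assumes that COGS = Unit price × Quantity and that gross income = Total minus COGS; these relationships are assumptions, so verify them against your copy of the data. The helper name `sale_amounts` is hypothetical. Under these assumptions the gross margin percentage works out to 5/105 × 100 ≈ 4.76% for every sale, regardless of price or quantity.

```python
def sale_amounts(unit_price: float, quantity: int, tax_rate: float = 0.05) -> dict:
    """Recompute the money columns for one invoice line.

    Assumptions (hypothetical, check against the data):
    COGS = Unit price x Quantity, and gross income = Total - COGS.
    """
    cogs = unit_price * quantity          # cost of goods sold, before tax
    tax = cogs * tax_rate                 # the 5% tax
    total = cogs + tax                    # Total = price including tax
    gross_income = total - cogs           # assumed definition of gross income
    gross_margin_pct = gross_income / total * 100  # gross income as % of Total
    return {
        "cogs": cogs,
        "tax": tax,
        "total": total,
        "gross_income": gross_income,
        "gross_margin_pct": gross_margin_pct,
    }
```

For example, `sale_amounts(10.0, 2)` gives a COGS of 20, a tax of 1 and a total of 21.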
Loading the Libraries
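```python
import os
import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns  # statistical plots
import matplotlib.pyplot as plt  # plotting backend used by seaborn

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory
for dirname, _, filenames in os.walk("../input"):
    for filename in filenames:
        print(os.path.join(dirname, filename))
```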
This shows what the raw data looks like when it is fed to the algorithm for prediction.
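The loading step itself is not shown here. A minimal sketch, assuming the dataset is a CSV file whose column names match the attribute list above. The helper names are hypothetical, and stripping whitespace from the header names is a defensive step we added, not something the original analysis describes.

```python
import pandas as pd


def clean_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Strip stray whitespace around header names, e.g. ' Branch ' -> 'Branch'."""
    return df.rename(columns=lambda c: c.strip())


def load_sales(path) -> pd.DataFrame:
    """Read the raw supermarket sales CSV (a file path or file-like object)."""
    return clean_columns(pd.read_csv(path))


def preview(df: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Return the first n rows, i.e. a look at the raw data before cleaning."""
    return df.head(n)
```

Usage, with a hypothetical file name: `df = load_sales("supermarket_sales.csv")` followed by `print(preview(df))`.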
Data preparation and cleaning
Now we have to clean the data so that it can be processed by the algorithm.
We have to convert ‘date’ and ‘Time’ to a datetime format and extract ‘day’, ‘month’, ‘year’ and ‘hour’ from them, so that they are easier to understand and process.
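A minimal sketch of this conversion with pandas. It assumes the ‘Date’ column holds month/day/year strings such as 1/5/2019 and ‘Time’ holds 24-hour HH:MM strings such as 13:08; both formats are assumptions, so adjust the format strings if your copy differs. The function name `add_datetime_parts` is hypothetical.

```python
import pandas as pd


def add_datetime_parts(df: pd.DataFrame) -> pd.DataFrame:
    """Parse 'Date' and 'Time', then derive day, month, year and hour columns.

    Works on a copy, so the raw DataFrame is left untouched.
    """
    out = df.copy()
    # Assumed format: month/day/year, e.g. '1/5/2019'
    out["date"] = pd.to_datetime(out["Date"], format="%m/%d/%Y")
    out["day"] = out["date"].dt.day
    out["month"] = out["date"].dt.month
    out["year"] = out["date"].dt.year
    # Assumed format: 24-hour 'HH:MM'; pandas attaches the placeholder date 1900-01-01
    out["Time"] = pd.to_datetime(out["Time"], format="%H:%M")
    out["hour"] = out["Time"].dt.hour
    return out
```

Passing an explicit `format` makes pandas raise an error on unexpected values instead of silently guessing the day/month order.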
Preview the updated data
Below is a snapshot of the data after it has been processed by our code. This processing is done using the Python library Pandas.
Let’s find the number of unique values in columns with object datatype
Next, we need to know the unique values of the attributes we have to work with.
Some of these attributes contain values that also need to be processed before the program can understand them. Below is the output showing the unique values in each attribute.
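One way to produce both outputs with pandas, as a sketch: `object_column_summary` counts the distinct values per text column, and `unique_values` lists them for the low-cardinality columns. Both helper names and the `max_unique` cut-off of 10 are our own illustrative choices. Columns are selected by excluding numeric and datetime dtypes rather than with `include="object"`, which keeps the code working if your pandas version stores text in a dedicated string dtype instead of object.

```python
import pandas as pd


def object_column_summary(df: pd.DataFrame) -> pd.Series:
    """Number of unique values in each text (non-numeric, non-datetime) column."""
    text_cols = df.select_dtypes(exclude=["number", "datetime"])
    return text_cols.nunique()


def unique_values(df: pd.DataFrame, max_unique: int = 10) -> dict:
    """Map each low-cardinality text column to its sorted unique values.

    Columns with more than max_unique distinct values (such as invoice IDs)
    are skipped, because listing every value would not be informative.
    """
    counts = object_column_summary(df)
    return {
        col: sorted(df[col].dropna().unique())
        for col, n in counts.items()
        if n <= max_unique
    }
```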
Visualizing the data
By now, we have processed and cleaned our data as per our requirements. Below are some plots based on the data. They include:
- Gender count
- Rating based on branch
- Product sales per hour
- Monthly insight of the data based on branch and quantity
- Lastly, a monthly summary of product sales per hour
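The five plots listed above could be drawn along these lines. This is a sketch, not the original plotting code: it assumes lower-case `hour` and `month` columns have already been derived from ‘Time’ and ‘Date’, and the chart types (count plot, box plot, line and bar charts) as well as the use of ‘Total’ as the sales measure are our own choices. The function name `plot_overview` is hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the figure can be built headless
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_overview(df: pd.DataFrame) -> plt.Figure:
    """Draw the five overview plots; expects 'hour' and 'month' columns to exist."""
    fig, axes = plt.subplots(3, 2, figsize=(14, 14))

    # 1. Gender count: number of invoices per gender
    sns.countplot(data=df, x="Gender", ax=axes[0, 0])
    axes[0, 0].set_title("Gender count")

    # 2. Rating based on branch: distribution of customer ratings per branch
    sns.boxplot(data=df, x="Branch", y="Rating", ax=axes[0, 1])
    axes[0, 1].set_title("Rating by branch")

    # 3. Product sales per hour: total quantity sold in each hour of the day
    df.groupby("hour")["Quantity"].sum().plot(ax=axes[1, 0], marker="o")
    axes[1, 0].set_title("Quantity sold per hour")

    # 4. Monthly insight by branch and quantity: one bar per branch for each month
    (df.groupby(["month", "Branch"])["Quantity"].sum()
       .unstack(fill_value=0)
       .plot(kind="bar", ax=axes[1, 1]))
    axes[1, 1].set_title("Quantity per month and branch")

    # 5. Monthly summary of sales per hour: one line of summed Total per month
    (df.pivot_table(index="hour", columns="month", values="Total",
                    aggfunc="sum", fill_value=0)
       .plot(ax=axes[2, 0], marker="o"))
    axes[2, 0].set_title("Sales (Total) per hour, by month")

    fig.delaxes(axes[2, 1])  # only five plots, so drop the unused sixth panel
    fig.tight_layout()
    return fig
```

Calling `plot_overview(df).savefig("overview.png")` (hypothetical file name) writes all five panels to one image; drawing them on a shared grid keeps the hourly and monthly views side by side for comparison.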