Super Market Analysis simple EDA

Abhijeet Hirekhan
Apr 17, 2020


  • Keywords: Python, Pandas, NumPy, Matplotlib.pyplot, Seaborn
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

sales = pd.read_csv(r"C:\Users\user\Desktop\portfolio -E\Super Market Analysis simple EDA\supermarket_sales - Sheet1.csv")

# Parse the Date column once, then derive the calendar parts from it
sales['date'] = pd.to_datetime(sales['Date'])

sales['day'] = sales['date'].dt.day
sales['month'] = sales['date'].dt.month
sales['year'] = sales['date'].dt.year

sales['Time'] = pd.to_datetime(sales['Time'])

sales['Hour'] = sales['Time'].dt.hour    # hour of day (0-23) of each sale

Let’s see the unique hours of sales in this dataset
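A minimal sketch of that check, using a small hypothetical `demo` DataFrame in place of the real `sales` data (the times below are made up, written as HH:MM strings): parse the times, pull out the hour with the `.dt` accessor, and list the distinct hours in order.

```python
import pandas as pd

# Hypothetical stand-in for sales[['Time']]; the real data comes from the CSV
demo = pd.DataFrame({"Time": ["10:15", "13:40", "13:05", "19:55", "10:50"]})

# Parse the "HH:MM" strings; only the time of day matters here
demo["Time"] = pd.to_datetime(demo["Time"], format="%H:%M")
demo["Hour"] = demo["Time"].dt.hour

# Distinct hours in ascending order, as plain Python ints
unique_hours = sorted(demo["Hour"].unique().tolist())
print(unique_hours)
```

On the real data the equivalent call would be `sorted(sales['Hour'].unique())`.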

Let’s find the number of unique values in columns with object datatype

categorical_columns = [cname for cname in sales.columns if sales[cname].dtype == "object"]

# Print the number of distinct values in each categorical column of interest
for col in ['Branch', 'City', 'Customer type', 'Gender', 'Product line', 'Payment']:
    print("# unique values in {0}: {1}".format(col, sales[col].nunique()))
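The `categorical_columns` list is built but not used above. A sketch of how a similar list could drive the whole count in one call with `DataFrame.nunique()`, shown on a tiny hypothetical `demo` frame rather than the real CSV. It swaps the `dtype == "object"` test for `pd.api.types.is_string_dtype`, an assumption made so that text columns are still picked up if pandas stores them with its newer string dtype.

```python
import pandas as pd

# Hypothetical rows standing in for the supermarket data
demo = pd.DataFrame({
    "Branch": ["A", "B", "C", "A"],
    "Gender": ["Male", "Female", "Female", "Male"],
    "Payment": ["Cash", "Ewallet", "Cash", "Credit card"],
    "Rating": [9.1, 7.4, 8.4, 5.3],   # numeric, so it is skipped below
})

# Keep text columns only (object dtype or pandas' string dtype)
categorical_columns = [c for c in demo.columns if pd.api.types.is_string_dtype(demo[c])]

# Number of distinct values per categorical column, returned as a Series
unique_counts = demo[categorical_columns].nunique()
print(unique_counts)
```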

sns.set(style="darkgrid")       # use a dark grid background for all plots
sns.countplot(x="Gender", data=sales).set_title("Gender Count")

sns.boxplot(x="Branch", y = "Rating" ,data =sales).set_title("Ratings by Branch")

Branch B has the lowest rating among all the branches
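The box plot shows this visually; a quick numeric check is the per-branch median, which is what the line inside each box marks. A sketch on hypothetical ratings (not the real data):

```python
import pandas as pd

# Hypothetical ratings; the real values would come from sales[['Branch', 'Rating']]
demo = pd.DataFrame({
    "Branch": ["A", "A", "B", "B", "C", "C"],
    "Rating": [7.0, 8.0, 6.0, 6.5, 7.5, 8.5],
})

# Median rating per branch, the statistic drawn as the centre line of each box
median_rating = demo.groupby("Branch")["Rating"].median()
lowest_branch = median_rating.idxmin()

print(median_rating)
print("Lowest median rating:", lowest_branch)
```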

Sales by the hour in the company: most of the items were sold around 14:00 local time.

sns.lineplot(x="Hour", y='Quantity', data=sales).set_title("Product Sales per Hour")
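Note that `sns.lineplot` aggregates repeated x values with the mean by default (and draws a confidence band), so this line shows the average quantity per transaction at each hour rather than the total number of items sold. A minimal sketch of the difference, on hypothetical rows:

```python
import pandas as pd

# Hypothetical transactions; in the article these would be sales[['Hour', 'Quantity']]
demo = pd.DataFrame({
    "Hour": [10, 10, 14, 14, 14, 19],
    "Quantity": [3, 5, 2, 6, 4, 7],
})

# Total units sold in each hour, versus the per-transaction mean that lineplot draws
total_by_hour = demo.groupby("Hour")["Quantity"].sum()
mean_by_hour = demo.groupby("Hour")["Quantity"].mean()

busiest_hour = total_by_hour.idxmax()
print(total_by_hour)
print("Hour with most units sold:", busiest_hour)
```

In this made-up data the total peaks at 14:00 while the mean peaks at 19:00, so the two views can tell different stories. To plot totals directly, `sns.lineplot` also accepts `estimator="sum"`.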

Below, we can see what each branch's sales quantity looks like by the hour, month by month.

g = sns.relplot(x="Hour", y='Quantity', col='month', row='Branch', kind="line", hue="Gender", style="Gender", data=sales)

Below, we can see each branch's total sales (the Total column) by the hour, month by month.

# estimator=None draws every observation instead of an aggregated line
g = sns.relplot(x="Hour", y='Total', col='month', row='Branch', estimator=None, kind="line", data=sales)

sns.lineplot(x="Total", y="Rating", data=sales)

Product Analysis

Let’s look at the various products’ performance.

sns.boxenplot(y = 'Product line', x = 'Quantity', data=sales )

From the above visual, Health and beauty, Electronic accessories, Home and lifestyle, and Sports and travel sell higher quantities per transaction than Food and beverages and Fashion accessories.
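Boxen plots emphasize medians and quantiles rather than means, so a small table with both makes the comparison explicit. A sketch on made-up rows (the product line names match the dataset's categories, the quantities do not come from it):

```python
import pandas as pd

# Hypothetical rows; the real columns are sales['Product line'] and sales['Quantity']
demo = pd.DataFrame({
    "Product line": ["Health and beauty", "Health and beauty",
                     "Food and beverages", "Food and beverages",
                     "Fashion accessories", "Fashion accessories"],
    "Quantity": [7, 5, 4, 5, 3, 5],
})

# Mean and median quantity per transaction for each product line, highest mean first
qty_stats = (demo.groupby("Product line")["Quantity"]
                 .agg(["mean", "median"])
                 .sort_values("mean", ascending=False))
print(qty_stats)
```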

sns.countplot(y = 'Product line', data=sales, order = sales['Product line'].value_counts().index )

The chart above ranks the product lines by how often they appear in the dataset. Fashion accessories is the most common, while Health and beauty is the least common.
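`sns.countplot` counts rows, so this ranking is by number of transactions, not by units sold. A sketch contrasting the two on hypothetical rows, where the order flips:

```python
import pandas as pd

# Hypothetical rows: each row is one transaction
demo = pd.DataFrame({
    "Product line": ["Fashion accessories", "Fashion accessories", "Fashion accessories",
                     "Health and beauty", "Health and beauty"],
    "Quantity": [1, 2, 1, 9, 8],
})

# What countplot draws: the number of rows (transactions) per product line
transactions = demo["Product line"].value_counts()

# A different question: total units sold per product line
units = demo.groupby("Product line")["Quantity"].sum().sort_values(ascending=False)

print(transactions)
print(units)
```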

sns.boxenplot(y = 'Product line', x = 'Total', data=sales )

sns.stripplot(y = 'Product line', x = 'Total', hue = 'Gender', data=sales )

sns.relplot(y = 'Product line', x = 'gross income', data=sales )

sns.boxenplot(y = 'Product line', x = 'Rating', data=sales )

Food and beverages has the highest average rating, while Sports and travel has the lowest.

Let’s see when customers buy certain products in the various branches.

productCount  = sns.relplot(x="Hour",  y = 'Quantity', col= 'Product line' , row= 'Branch', estimator = None, kind="line", data =sales)

From the plots above, we can see that Food and beverages sales are usually high in all three branches in the evening, especially around 19:00.
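To read the busiest hour off as a number, one option is to filter to a single product line, sum the units per branch and hour, and take the maximum within each branch. A sketch on hypothetical rows (summing measures volume, not the per-transaction size the line plots draw):

```python
import pandas as pd

# Hypothetical transactions; the real data would be sales[['Branch', 'Product line', 'Hour', 'Quantity']]
demo = pd.DataFrame({
    "Branch": ["A", "A", "A", "B", "B", "B"],
    "Product line": ["Food and beverages"] * 5 + ["Sports and travel"],
    "Hour": [12, 19, 19, 19, 15, 19],
    "Quantity": [4, 3, 5, 6, 2, 9],
})

# Keep only the product line of interest
food = demo[demo["Product line"] == "Food and beverages"]

# Units sold per (branch, hour)
per_hour = food.groupby(["Branch", "Hour"])["Quantity"].sum().reset_index()

# Row with the largest total inside each branch
peaks = per_hour.loc[per_hour.groupby("Branch")["Quantity"].idxmax(), ["Branch", "Hour"]]
print(peaks)
```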

Payment Channel

Let's see how customers pay in this business.

sns.countplot(x="Payment", data =sales).set_title("Payment Channel")

Most customers pay by Ewallet or Cash, while under 40 percent pay by credit card. We would also like to see how this payment distribution varies across the branches.
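To put numbers on the "under 40 percent" reading, `value_counts(normalize=True)` gives each payment method's share of transactions. A sketch on hypothetical records:

```python
import pandas as pd

# Hypothetical payment records; the real column is sales['Payment']
demo = pd.DataFrame({"Payment": ["Ewallet", "Cash", "Credit card", "Ewallet", "Cash"]})

# Share of transactions per payment method, as percentages
payment_share = demo["Payment"].value_counts(normalize=True) * 100
print(payment_share.round(1))
```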

sns.countplot(x="Payment", hue = "Branch", data =sales).set_title("Payment Channel by Branch")
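The same breakdown in table form: `pd.crosstab` counts payment methods per branch, and `normalize="index"` turns each row into shares so branches of different size can be compared. A sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical rows; the real columns are sales['Branch'] and sales['Payment']
demo = pd.DataFrame({
    "Branch": ["A", "A", "A", "B", "B", "C"],
    "Payment": ["Cash", "Ewallet", "Cash", "Credit card", "Ewallet", "Cash"],
})

# Counts of each payment method within each branch (rows = branches)
counts = pd.crosstab(demo["Branch"], demo["Payment"])

# Same table as row percentages
shares = pd.crosstab(demo["Branch"], demo["Payment"], normalize="index") * 100

print(counts)
print(shares.round(1))
```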

Customer Analysis

From inspection, there are two types of customers: Member and Normal. Let's see how many there are of each and which branches they shop at.

sns.countplot(x="Customer type", data =sales).set_title("Customer Type")

sns.countplot(x="Customer type", hue = "Branch", data =sales).set_title("Customer Type by Branch")

Does customer type influence customer rating? Let's find out.

sns.swarmplot(x="Customer type", y="Rating", hue="City", data=sales).set_title("Rating by Customer Type")
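The swarm plot answers this visually; a small summary table per customer type gives numbers to compare. A sketch on made-up ratings. A gap between group means on its own does not show the difference is meaningful, so a statistical test would be the next step before drawing conclusions.

```python
import pandas as pd

# Hypothetical ratings; the real columns are sales['Customer type'] and sales['Rating']
demo = pd.DataFrame({
    "Customer type": ["Member", "Member", "Member", "Normal", "Normal", "Normal"],
    "Rating": [7.0, 8.0, 6.0, 6.0, 7.0, 6.5],
})

# Mean, median and number of ratings for each customer type
rating_by_type = demo.groupby("Customer type")["Rating"].agg(["mean", "median", "count"])

# Difference between the two group means (Member minus Normal)
gap = rating_by_type.loc["Member", "mean"] - rating_by_type.loc["Normal", "mean"]

print(rating_by_type)
print("Member minus Normal mean rating:", round(gap, 2))
```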

Using a Google search, I was able to get the latitude and longitude of each city. We can map these coordinates onto each sale and plot sales by location.

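# Latitude (north) and longitude (east) of each city, from a Google search
lat = {"Yangon": 16.8661, "Naypyitaw": 19.7633, "Mandalay": 21.9588}
long = {"Yangon": 96.1951, "Naypyitaw": 96.0785, "Mandalay": 96.0891}

# map() looks up each row's City in the dictionaries, so no loop is needed
sales['long'] = sales['City'].map(long)
sales['lat'] = sales['City'].map(lat)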

sns.scatterplot(x="long", y="lat", size="Total", style="Product line", data=sales, legend="brief").set_title("Sales by City Location")

# relplot returns a FacetGrid, which has no set_title; scatterplot returns an Axes
sns.scatterplot(x="Total", y="Quantity", data=sales).set_title("Quantity vs Total")

