Super Market Analysis simple EDA
- EDA ON SUPER MARKET
- Keywords(Python,Pandas,Numpy,Matplotlib.pyplot,seaborn)
sales = pd.read_csv(r"C:\Users\user\Desktop\portfolio -E\Super Market Analysis simple EDA\supermarket_sales - Sheet1.csv")
In [ ]:
sales.head()
In [ ]:
sales.info()
In [ ]:
sales['date'] = pd.to_datetime(sales['Date'])
In [ ]:
sales['date'].dtype
In [ ]:
type(sales['date'])
In [ ]:
sales['date'] = pd.to_datetime(sales['date'])
In [ ]:
sales['day'] = (sales['date']).dt.day
sales['month'] = (sales['date']).dt.month
sales['year'] = (sales['date']).dt.year
In [ ]:
sales['Time'] = pd.to_datetime(sales['Time'])
In [ ]:
sales['Hour'] = (sales['Time']).dt.hour #type(sales['Time'])
Let’s see the unique hours of sales in this dataset
In [ ]:
sales['Hour'].unique()
In [ ]:
sales.describe()
Let’s find the number of unique values in columns with object datatype
In [ ]:
categorical_columns = [cname for cname in sales.columns if sales[cname].dtype == "object"]
In [ ]:
categorical_columns
In [ ]:
print("# unique values in Branch: {0}".format(len(sales['Branch'].unique().tolist())))
print("# unique values in City: {0}".format(len(sales['City'].unique().tolist())))
print("# unique values in Customer Type: {0}".format(len(sales['Customer type'].unique().tolist())))
print("# unique values in Gender: {0}".format(len(sales['Gender'].unique().tolist())))
print("# unique values in Product Line: {0}".format(len(sales['Product line'].unique().tolist())))
print("# unique values in Payment: {0}".format(len(sales['Payment'].unique().tolist())))
In [ ]:
sns.set(style="darkgrid") #style the plot background to become a grid
genderCount = sns.countplot(x="Gender", data =sales).set_title("Gender_Count")
In [ ]:
sns.boxplot(x="Branch", y = "Rating" ,data =sales).set_title("Ratings by Branch")
Branch B has the lowest rating among all the branches
Sales by the hour in the comapny Most of the item were sold around 14:00 hrs local time
In [ ]:
genderCount = sns.lineplot(x="Hour", y = 'Quantity',data =sales).set_title("Product Sales per Hour")
Below we can see how each branch’s sales quantity looks like by the hour in a monthly fashion
In [ ]:
genderCount = sns.relplot(x="Hour", y = 'Quantity', col= 'month' , row= 'Branch', kind="line", hue="Gender", style="Gender", data =sales)
Below we can see each branch’s sales by the hour in a monthly fashion
In [ ]:
genderCount = sns.relplot(x="Hour", y = 'Total', col= 'month' , row= 'Branch', estimator = None, kind="line", data =sales)
In [ ]:
sales['Rating'].unique()
In [ ]:
ageDisSpend = sns.lineplot(x="Total", y = "Rating", data =sales)
Product Analysis
Let’s look at the various products’ performance.
In [ ]:
sns.boxenplot(y = 'Product line', x = 'Quantity', data=sales )
From the above visual, Health and Beauty,Electronic accessories, Homem and lifestyle, Sports and travel have a better average quantity sales that food and beverages as well as Fashion accessories.
In [ ]:
sns.countplot(y = 'Product line', data=sales, order = sales['Product line'].value_counts().index )
From the above image shows the top product line item type sold in the given dataset. Fashion Accessories is the highest while Health and beauty is the lowest
In [ ]:
sns.boxenplot(y = 'Product line', x = 'Total', data=sales )
In [ ]:
sns.stripplot(y = 'Product line', x = 'Total', hue = 'Gender', data=sales )
In [ ]:
sns.relplot(y = 'Product line', x = 'gross income', data=sales )
In [ ]:
sns.boxenplot(y = 'Product line', x = 'Rating', data=sales )
Food and Beverages have the highest average rating while sports and travel the lowest
Let’s see when customers buy certain products in the various branches.
In [ ]:
productCount = sns.relplot(x="Hour", y = 'Quantity', col= 'Product line' , row= 'Branch', estimator = None, kind="line", data =sales)
From the above plots, we can see that food and beverages sales usually high in all three branches at evening especially around 19:00
Payment Channel
Let see how customers make payment in this business
In [ ]:
sns.countplot(x="Payment", data =sales).set_title("Payment Channel")
Most of the customers pay through the Ewallet and Cash Payment while under 40 percent of them pay with their credit card. We would also like to see this payment type distribution across all the branches
In [ ]:
sns.countplot(x="Payment", hue = "Branch", data =sales).set_title("Payment Channel by Branch")
Customer Analysis
From inspection, there are two types of customers. Members and Normal. Let’s see how many they are and where they are
In [ ]:
sns.countplot(x="Customer type", data =sales).set_title("Customer Type")
In [ ]:
sns.countplot(x="Customer type", hue = "Branch", data =sales).set_title("Customer Type by Branch")
Do the customer type influence customer rating? Let’s find out
In [ ]:
sns.swarmplot(x="Customer type", y = "Rating", hue = "City", data =sales).set_title("Customer Type")
With the use of google search, I was able to get the longitude and latitude of each cities. We can
In [ ]:
long = {"Yangon": 16.8661, "Naypyitaw": 19.7633, "Mandalay": 21.9588 }
lat = {"Yangon": 96.1951, "Naypyitaw": 96.0785, "Mandalay": 96.0891 }
for set in sales:
sales['long'] = sales['City'].map(long)
sales['lat'] = sales['City'].map(lat)
In [ ]:
sns.scatterplot(x="long", y = "lat",size = "Total",style = "Product line", data =sales, legend = "brief").set_title("Customer Type")
In [ ]:
sns.relplot(x="Total", y = "Quantity", data =sales).set_title("Customer Type")
In [ ]: