Walmart Sales Analysis SQL Project

About Project

Zainab Mehmood
9 min readSep 8, 2023

The goal of this project is to examine Walmart Sales data in order to comprehend top-performing branches and items, sales trends for various products, and consumer behavior. The goal is to investigate how sales methods might be made better and more effective. The [Kaggle Walmart Sales Forecasting Competition](https://www.kaggle.com/c/walmart-recruiting-store-sales-forecasting) is where the dataset was collected.

“In this hiring contest, job hopefuls are given historical sales information for 45 Walmart locations spread across several geographies. Participants must forecast the sales for each department in each store, which has multiple departments. Selected holiday markdown events are included in the dataset to increase the difficulty. Although it is recognized that these markdowns have an influence on sales, it can be difficult to forecast which departments will be impacted and how much.

Purpose

The major aim of thie project is to gain insight into the sales data of Walmart to understand the different factors that affect sales of the different branches.

About Data

This dataset contains sales transactions from a three different branches of Walmart, respectively located in Mandalay, Yangon and Naypyitaw. The data contains 17 columns and 1000 rows:

Data Wrangling

This is the first step where inspection of data is done to make sure **NULL** values and missing values are detected and data replacement methods are used to replace, missing or **NULL** values.

  1. Build a database
create database  if not exists  SalesDataWalmart;

2. Create table and insert the data.

3. Select columns with null values in them. There are no null values in our database as in creating the tables, we set **NOT NULL** for each field, hence null values are filtered out.


Create table if not exists WMsales (
invoice_id varchar(30) Not null primary Key,
branch varchar (5) not null,
city varchar (30) not null,
customer_type varchar (30) not null,
gender varchar(10) not null,
product_line varchar (100) not null,
unit_price decimal(10,2) not null,
quantity int not null,
VAT float(6,4) not null,
total decimal (12,4) not null,
date datetime not null,
time TIME not null,
payment_method varchar (15) not null,
cogs decimal (10, 2) not null,
gross_margin_pct float (11,9),
gross_income decimal(12, 4) not null,
rating float (2, 1)
);

Import Data from CSV

After creating a database, it’s time to import data from a CSV file. For this select import and a new window will open Select the path from the local machine.

A snippet of the table view

Feature Engineering

  1. Add a new column named `time_of_day` to give insight of sales in the Morning, Afternoon and Evening. This will help answer the question on which part of the day most sales are made.
select 
time,
(case
when `time` between "00:00:00" and "12:00:00" then "Morning"
when `time` between "12:01:00" and "16:00:00" then "Afternoon"
else "Evening"
end
) As time_of_day
from wmsales;
alter table wmsales add column time_of_day varchar(30);
update wmsales
set time_of_day= (
case
when `time` between "00:00:00" and "12:00:00" then "Morning"
when `time` between "12:01:00" and "16:00:00" then "Afternoon"
else "Evening"
end
);

2. Add a new column named `day_name` that contains the extracted days of the week on which the given transaction took place (Mon, Tue, Wed, Thur, Fri). This will help answer the question on which week of the day each branch is busiest.

--- -day name
select
date,
dayname(date)
from wmsales;
alter table wmsales add column day_name varchar(10);
update wmsales
set day_name= dayname(date);

3. Add a new column named `month_name` that contains the extracted months of the year on which the given transaction took place (Jan, Feb, Mar). Help determine which month of the year has the most sales and profit.

-----------  Month_name
select
date,
monthname(date)
from wmsales;
alter table wmsales add column month_name varchar(15);
update wmsales
set month_name= monthname(date);

An updated data set with day_name, month_name and time_of_day

Exploratory Data Analysis (EDA)

Exploratory data analysis is done to answer the listed questions and aims of this project.

Business Questions To Answer

Generic Question

  1. How many unique cities does the data have?
  2. In which city is each branch?
select
distinct city
from wmsales;
-- 2. In which city is each branch?
select
distinct city,
branch
from wmsales;
Distinct City
Distinct Branches in each city

Product Analysis

1. How many unique product lines does the data have?

--------------- 1. How many unique product lines does the data have?-----------
select
count(distinct product_line)
from wmsales;
No of Product lines

2. What is the most common payment method?

3. What is the most selling product line?

4. What is the total revenue by month?

------------------------- 2. What is the most common payment method?-----------
select
payment_method,
count( payment_method) as cnt
from wmsales
group by payment_method
order by cnt desc;
----------- 3. What is the most selling product line?---------------
select
product_line,
count( product_line) as cnt
from wmsales
group by product_line
order by cnt desc;
--------------- 4. What is the total revenue by month? ------------
Select
month_name as Month,
sum(total) as Total_Revenue
from wmsales
group by month_name
order by Total_Revenue Desc ;

5. What month had the largest COGS?

6. What product line had the largest revenue?

7. What is the city with the largest revenue?

8. What product line had the largest VAT?

----------------------- 5. What month had the largest COGS?----------
select
month_name as month,
sum(cogs) as cogs
from wmsales
group by month_name
order by cogs desc;
---------------- 6. What product line had the largest revenue? ---------------
select
product_line,
sum(total) as Total_revenue
from wmsales
group by product_line
order by Total_revenue desc;
---------------------- 7. What is the city with the largest revenue?-----------------
select
city,
sum(total) as Total_revenue
from wmsales
group by city
order by Total_revenue desc;
-------------- -8. What product line had the largest VAT?
select
product_line,
sum(VAT) as Valueable_Tax
from wmsales
group by product_line
order by Valueable_Tax desc;

10. Which branch sold more products than average product sold?

11. What is the most common product line by gender?

12. What is the average rating of each product line?

---- 10. Which branch sold more products than average product sold?-----
select
branch ,
sum(quantity) as qty
from wmsales
group by branch
having sum(quantity) >
(select avg(quantity) from wmsales);
------------ 11. What is the most common product line by gender?-------
select
gender,
product_line,
count(gender) as total_count
from wmsales
group by gender, product_line
order by total_count desc;

------- 12. What is the average rating of each product line?-----------
select
round(avg (rating) , 2) as avg_rating,
product_line
from wmsales
group by product_line
order by avg_rating desc;

Sales Analysis

1. Number of sales made in each time of the day per weekday

2. Which of the customer types brings the most revenue?

3. Which city has the largest tax percent/ VAT (**Value Added Tax**)?

4. Which customer type pays the most in VAT?

-------- -----1. Number of sales made in each time of the day per weekday--------
select
time_of_day,
count(*) as total_sales
from wmsales
where day_name = "Sunday"
group by time_of_day
order by total_sales desc;
Number of sales made each day
----------- 2. Which of the customer types brings the most revenue?---
select
customer_type,
round (sum(total), 2) as total_revenue
from wmsales
group by customer_type
order by total_revenue;
The customer brings the most revenue
--------------------------- 3. Which city has the largest tax percent/ VAT (**Value Added Tax**)?--
select
city,
avg(VAT) as value_added_tax
from wmsales
group by city
order by value_added_tax desc;
City Contribution in VAT
--------------------- 4. Which customer type pays the most in VAT?-----------
select
customer_type,
avg(VAT) as value_added_tax
from wmsales
group by customer_type
order by value_added_tax desc;
Customers contributing in VAT

Customer Analysis

  1. How many unique customer types does the data have?
------------- 1. How many unique customer types does the data have?----------
select
distinct (customer_type)
from wmsales;

2. How many unique payment methods does the data have?

----------- . How many unique payment methods does the data have?-----
select
distinct (payment_method)
from wmsales;

3. What is the most common customer type?

-------------------------- 3. What is the most common customer type?--
select
customer_type,
count(*) as total_count
from wmsales
group by customer_type
order by total_count;

4. Which customer type buys the most?

------------ 4. Which customer type buys the most?----------
select
customer_type,
count(*) as total_count
from wmsales
group by customer_type
order by total_count;

5. What is the gender of most of the customers?

-------------------------- 5. What is the gender of most of the customers?------
select
gender,
count(*) as gender_count
from wmsales
group by gender
order by gender_count desc;

6. What is the gender distribution per branch?

------------------ 6. What is the gender distribution per branch?---------
select
gender,
count(*) as gender_count
from wmsales
where branch = "C"
group by gender
order by gender_count desc;

7. Which time of the day do customers give most ratings?

-------------- 7. Which time of the day do customers give most ratings?-----
select
time_of_day,
avg(rating) as avg_rating
from wmsales
group by time_of_day
order by avg_rating desc;

8. Which time of the day do customers give most ratings per branch?

------------ 8. Which time of the day do customers give most ratings per branch?----
select
time_of_day,
branch,
avg(rating) as avg_rating
from wmsales
group by time_of_day, branch
order by avg_rating ;


select
time_of_day,
avg(rating) as avg_rating
from wmsales
where branch = "C"
group by time_of_day
order by avg_rating ;

9. Which day fo the week has the best avg ratings?

---------------------- 9. Which day fo the week has the best avg ratings?--------------
select
day_name,
avg(rating) as avg_rating
from wmsales
group by day_name
order by avg_rating desc;

10. Which day of the week has the best average ratings per branch?

----- 10. Which day of the week has the best average ratings per branch?--
select
day_name,
avg(rating) as avg_rating
from wmsales
where branch= "A"
group by day_name
order by avg_rating desc;

Revenue And Profit Calculations

------ $ total(gross_sales) 
select
sum(VAT+cogs) as total_grass_sales
from wmsales;
----- gross profit-----------
SELECT
(SUM(VAT + COGS) - COGS)
FROM wmsales;

SELECT (SUM(ROUND(VAT, 2) + COGS) - COGS) FROM wmsales;

Conclusion

In this data analysis project, we embarked on a journey to gain valuable insights from Walmart’s sales data. We began by preparing and exploring the dataset, cleaning any missing or null values, and engineering new features to help us uncover meaningful patterns.

We addressed a variety of questions, ranging from understanding product performance and sales trends to diving into customer behavior. These insights are crucial for Walmart’s sales strategies and can guide future optimizations.

Throughout our analysis, we made use of SQL queries to extract relevant information from the dataset. We also computed important metrics such as COGS (Cost of Goods Sold), VAT (Value Added Tax), total revenue, and gross profit to better understand the financial aspects of Walmart’s operations.

Some key takeaways from our analysis include:

  • Identification of top-performing product lines and branches.
  • Analysis of sales trends, which can inform sales strategies and modifications.
  • Profiling of customer segments and their profitability.

The data-driven decisions made possible through this analysis can contribute to improved sales strategies, inventory management, and overall business performance.

While this project has provided valuable insights, it’s important to note that data analysis is an ongoing process. As Walmart continues to collect sales data, the potential for further analysis and refinement of strategies remains. This project serves as a foundation for future endeavors in enhancing Walmart’s sales forecasting and optimizing its operations.

Feel free to include any specific findings or results from your analysis in this section to provide a more detailed conclusion. Additionally, you can summarize the impact of your analysis on Walmart’s business strategies and operations.

Github Link for source code file

--

--

Zainab Mehmood

Data Analyst | Power BI | Python, Excel Data-driven Decision Making