Retail Analysis with Walmart Data — Part-1

Dhruval Patel
Published in CodeX
5 min read · May 6, 2022

Analyzing and building a Machine Learning model for 45 Walmart stores

Photo by Marques Thomas @querysprout.com on Unsplash

Predicting the demand of a retail store is difficult because certain events and holidays affect sales. We have sales data available for 45 Walmart stores. In part-1, I will walk through the project step by step, and you will learn the basic statistical tasks that give insight into the data. In part-2, you’ll learn how to tackle unforeseen demand with the help of Machine Learning algorithms. Let’s get started.

Dataset Description

Walmart runs several promotional markdown events throughout the year. These markdowns precede prominent holidays, the four largest of which are the Super Bowl, Labour Day, Thanksgiving, and Christmas. The weeks that include these holidays are weighted five times higher in the evaluation than non-holiday weeks. Part of the challenge presented by this competition is modeling the effects of markdowns on these holiday weeks in the absence of complete/ideal historical data. Historical sales data are available for 45 Walmart stores located in different regions.

(1) Import required libraries and dataset—

Import required libraries and dataset
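A minimal sketch of this step could look like the following. The file name `Walmart.csv` and the column names (`Store`, `Date`, `Weekly_Sales`, `Holiday_Flag`) are assumptions about the dataset layout, `load_sales` is a hypothetical helper, and the inline rows are made-up values so that the snippet runs without the real file.

```python
import io

import pandas as pd

# Made-up sample in the assumed layout of the Walmart file (not real data).
SAMPLE_CSV = """Store,Date,Weekly_Sales,Holiday_Flag
1,05-02-2010,1500000.0,0
1,12-02-2010,1600000.0,1
2,05-02-2010,2000000.0,0
"""


def load_sales(source):
    """Read the sales data from a file path or a file-like object."""
    return pd.read_csv(source)


# With the real file this would be: df = load_sales("Walmart.csv")
df = load_sales(io.StringIO(SAMPLE_CSV))

print(df.head())
print(df.dtypes)  # 'Date' is read as text here, not as a datetime
```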

(2) Changing the data type of the ‘Date’ column —

We change the data type of the ‘Date’ column because it is read in as an object (string) type, which does not support date operations such as extracting the month or the quarter.

Changing the data type of the ‘Date’ column
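The conversion and the null check can be sketched as below. The day-month-year text format and the column names are assumptions about the file, and the rows are made-up values.

```python
import pandas as pd

# Made-up rows; the day-month-year text format is an assumption about the file.
df = pd.DataFrame(
    {
        "Store": [1, 1, 2],
        "Date": ["05-02-2010", "12-02-2010", "05-02-2010"],
        "Weekly_Sales": [1500000.0, 1600000.0, 2000000.0],
    }
)

# Parse the text dates; an explicit format avoids day/month ambiguity.
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y")
print(df["Date"].dtype)

# Count the missing values in each column before moving on.
null_counts = df.isnull().sum()
print(null_counts)
```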

The dataset does not have any null values, so we are ready to proceed with the basic statistical tasks.

(3) Statistical Tasks —

A. Which store has maximum sales?

To find the store with the maximum sales, I create a new variable called ‘total_sales’ by grouping the data by store and summing the weekly sales of each store. The largest value in ‘total_sales’ identifies the store: Store-20 has the maximum sales of $301,397,792. The store with the minimum sales can be found in the same way.

Maximum Sales
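Here is a small sketch of the group-and-sum step, using made-up numbers for three hypothetical stores. The column names `Store` and `Weekly_Sales` are assumptions about the dataset.

```python
import pandas as pd

# Made-up weekly sales for three hypothetical stores (not real data).
df = pd.DataFrame(
    {
        "Store": [1, 1, 2, 2, 3, 3],
        "Weekly_Sales": [100.0, 120.0, 300.0, 280.0, 50.0, 60.0],
    }
)

# Sum the weekly sales of each store.
total_sales = df.groupby("Store")["Weekly_Sales"].sum()

best_store = total_sales.idxmax()   # store with the largest total
worst_store = total_sales.idxmin()  # store with the smallest total

print(f"Highest: store {best_store} with {total_sales.max():,.0f}")
print(f"Lowest: store {worst_store} with {total_sales.min():,.0f}")
```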

B. Which store has the maximum standard deviation, i.e., which store’s sales vary the most? Also, find the coefficient of mean to standard deviation.

To find the maximum standard deviation, create a new variable by grouping the data by store and taking the standard deviation of the weekly sales. Store-14 has the maximum standard deviation of $317,569.95. The coefficient of mean to standard deviation, i.e., the standard deviation as a percentage of the mean (commonly known as the coefficient of variation), is 15.71%.

Maximum Standard Deviation
Mean to Standard Deviation
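Both quantities can be computed in one pass, as in this sketch. It assumes the coefficient is the standard deviation divided by the mean, and it uses made-up numbers with assumed column names.

```python
import pandas as pd

# Made-up weekly sales for two hypothetical stores (not real data).
df = pd.DataFrame(
    {
        "Store": [1, 1, 1, 2, 2, 2],
        "Weekly_Sales": [100.0, 110.0, 90.0, 200.0, 300.0, 100.0],
    }
)

# Mean and sample standard deviation (pandas uses ddof=1) per store.
stats = df.groupby("Store")["Weekly_Sales"].agg(["mean", "std"])

# Assumed definition: standard deviation relative to the mean, in percent.
stats["cv_percent"] = stats["std"] / stats["mean"] * 100

most_variable = stats["std"].idxmax()
print(stats)
print(f"Store {most_variable} has the largest standard deviation")
```

Note that a store with a large standard deviation does not necessarily have a large coefficient, because big stores vary more in absolute terms.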

C. Which stores have a good quarterly growth rate in Q3’2012?

First, find the Q2 sales and then the Q3 sales of each store, take the difference, and divide it by the Q2 sales to get the growth rate. In this analysis, no store showed quarterly growth in Q3’2012.

Growth Rate in Q3'2012
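The growth calculation can be sketched as follows. It assumes calendar quarters (Q2 is April to June, Q3 is July to September); `quarter_total` is a hypothetical helper and the numbers are made up.

```python
import pandas as pd

# Made-up sales for two hypothetical stores, one week per quarter (not real data).
df = pd.DataFrame(
    {
        "Store": [1, 1, 2, 2],
        "Date": pd.to_datetime(
            ["2012-05-04", "2012-08-03", "2012-05-04", "2012-08-03"]
        ),
        "Weekly_Sales": [100.0, 110.0, 200.0, 180.0],
    }
)


def quarter_total(frame, year, quarter):
    """Total sales per store for one calendar quarter."""
    in_quarter = (frame["Date"].dt.year == year) & (
        frame["Date"].dt.quarter == quarter
    )
    return frame[in_quarter].groupby("Store")["Weekly_Sales"].sum()


q2 = quarter_total(df, 2012, 2)
q3 = quarter_total(df, 2012, 3)

# Growth of Q3 over Q2, in percent, for each store.
growth_percent = (q3 - q2) / q2 * 100

# Keep only the stores whose sales grew.
growing = growth_percent[growth_percent > 0]
print(growth_percent)
print(growing)
```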

D. Some holidays have a negative impact on sales. Find out holidays that have higher sales than the mean sales in the non-holiday season for all stores together.

We have 4 Holiday Events:
(1) Super Bowl: 12-Feb-10, 11-Feb-11, 10-Feb-12, 8-Feb-13
(2) Labour Day: 10-Sep-10, 9-Sep-11, 7-Sep-12, 6-Sep-13
(3) Thanksgiving: 26-Nov-10, 25-Nov-11, 23-Nov-12, 29-Nov-13
(4) Christmas: 31-Dec-10, 30-Dec-11, 28-Dec-12, 27-Dec-13

Now calculate the mean sales for each holiday event and then the mean non-holiday sales. I found that Thanksgiving has the highest sales ($1,471,273.43), which is higher than the non-holiday sales ($1,041,256.38).

Holiday Event Sales
Non-holiday Sales and Comparison
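One way to sketch this comparison is shown below. The holiday dates are the ones listed above, written in ISO format; the sales figures are made up and the column names are assumptions.

```python
import pandas as pd

# Holiday weeks from the list above, rewritten as ISO dates.
HOLIDAYS = {
    "Super Bowl": ["2010-02-12", "2011-02-11", "2012-02-10", "2013-02-08"],
    "Labour Day": ["2010-09-10", "2011-09-09", "2012-09-07", "2013-09-06"],
    "Thanksgiving": ["2010-11-26", "2011-11-25", "2012-11-23", "2013-11-29"],
    "Christmas": ["2010-12-31", "2011-12-30", "2012-12-28", "2013-12-27"],
}

# Made-up weekly sales (not real data).
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            [
                "2010-02-05",  # non-holiday
                "2010-02-12",  # Super Bowl
                "2010-09-10",  # Labour Day
                "2010-11-19",  # non-holiday
                "2010-11-26",  # Thanksgiving
                "2010-12-31",  # Christmas
            ]
        ),
        "Weekly_Sales": [100.0, 90.0, 104.0, 110.0, 150.0, 80.0],
    }
)

# Mean sales over all weeks that are not in any holiday list.
all_holiday_dates = pd.to_datetime(
    [day for days in HOLIDAYS.values() for day in days]
)
is_holiday = df["Date"].isin(all_holiday_dates)
non_holiday_mean = df.loc[~is_holiday, "Weekly_Sales"].mean()

# Mean sales for each holiday event.
holiday_means = pd.Series(
    {
        name: df.loc[df["Date"].isin(pd.to_datetime(days)), "Weekly_Sales"].mean()
        for name, days in HOLIDAYS.items()
    }
)

# Holidays whose mean sales exceed the non-holiday mean.
above = holiday_means[holiday_means > non_holiday_mean]
print(holiday_means)
print(f"Non-holiday mean: {non_holiday_mean:,.2f}")
print(above)
```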

E. Provide a monthly and semester view of sales in units and give insights.

Now, plot a month-wise bar graph of weekly sales to see which month has the maximum sales, and then a year-wise bar graph of weekly sales to see which year has the highest weekly sales.

Monthwise Weekly Sales
Yearwise Weekly Sales
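The numbers behind such plots can be sketched with a few group-bys. This example uses mean weekly sales as the aggregate, defines semester 1 as January to June and semester 2 as July to December, and runs on made-up values; all of these are assumptions.

```python
import pandas as pd

# Made-up weekly sales (not real data).
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2010-02-05", "2010-08-06", "2010-12-24", "2011-02-04", "2011-12-23"]
        ),
        "Weekly_Sales": [100.0, 110.0, 200.0, 90.0, 180.0],
    }
)

month = df["Date"].dt.month.rename("Month")
year = df["Date"].dt.year.rename("Year")
# Semester 1 covers January to June, semester 2 covers July to December.
semester = ((month > 6).astype(int) + 1).rename("Semester")

# Mean weekly sales by month, by year and semester, and by year.
monthly_mean = df.groupby(month)["Weekly_Sales"].mean()
semester_mean = df.groupby([year, semester])["Weekly_Sales"].mean()
yearly_mean = df.groupby(year)["Weekly_Sales"].mean()

print(monthly_mean)
print(semester_mean)
print(yearly_mean)
```

With matplotlib installed, `monthly_mean.plot.bar()` would draw the month-wise bar chart from the same numbers. Swapping `.mean()` for `.sum()` gives totals instead of averages, which can change which year comes out on top when the years do not cover the same number of weeks.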

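I have drawn some insights: (1) The year 2010 has the highest sales and 2012 has the lowest sales. (2) December has the highest weekly sales. (3) The year 2011 has the highest weekly sales. Points (1) and (3) rank the years differently, presumably because they rest on different aggregates (for example, mean versus total weekly sales).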

Voila, you’re done with the basic statistical tasks, and I hope you liked part-1 of retail analysis with Walmart data. In the next part, you will learn how to build a statistical model to forecast the demand.

Find my Kaggle notebook here.

Thank you for reading! I would appreciate it if you follow me or share this article with someone. Best wishes.

Your support would be awesome❤️
