Super Store Sales Analysis

Analyse sales data of Super Store mart and identify opportunities to boost business growth

Katie Huang Xiemin
Analytics Vidhya
10 min readJul 9, 2021

--

Now you know I’m a Big Fan of Kim’s Convenience! 😉 | Image source: https://www.blogto.com/real-estate-toronto/2021/04/iconic-kims-convenience-store-for-sale-toronto/

Introduction

Super Store is a small retail business located in the United States. They sell Furniture, Office Supplies and Technology products and their customers are the mass Consumer, Corporate and Home Offices.

The data set contains sales, profit and geographical information of Super Store.

Our task is to analyse the sales data and identify weak areas and opportunities for Super Store to boost business growth.

Business Questions

  • Which Category is Best Selling and Most Profitable?
  • What are the Best Selling and Most Profitable Sub-Category?
  • Which is the Top Selling Sub-Category?
  • Which Customer Segment is Most Profitable?
  • Which is the Preferred Ship Mode?
  • Which Region is the Most Profitable?
  • Which City has the Highest Number of Sales?

Who is Reading this Analysis?

  • We assume that the Super Store is a family business and is owned by 1 or 2 owners who are very invested in their business.
  • Owners are likely not people who are expert in reading charts or interpreting statistical analysis, so our analysis will be in layman terms and easy to understand.

Preparing the Environment

We will import the required libraries and read in the data set.

  • Pandas — Data manipulation
  • Matplotlib and Seaborn — Data visualisation

Data Exploration

Let’s have a look at the data using df.head() and df.tail() function.

Preview first 5 rows of Super Store data set
Preview last 5 rows

You can easily tell that there is a mix of categorical, geographical and numerical variables.

Each row represents an order of an item and the corresponding quantity, sales, discount and profit. There is also the mode of shipment, customer segment and geographical aspects.

Next, we will use the df.shape() and df.info() to get more information.

There are 9,994 rows including header and 13 columns. Data type is correct and matches the corresponding values.

Data Cleaning

Before we start with the analysis, we must first clean the data or “scrub the dirt”.

For this analysis, we will look at the more common issues such as missing and duplicate data.

Handling Missing Values

Here, we will find out whether there is Null value.

The result shows that the data set does not have any null value.

Duplicate Data

Then, we will find out whether there is duplicate data.

Result shows that there are 17 duplicated rows. Let’s run another function to view the duplicated data.

Preview the 17 duplicated rows

We will then run the drop_duplicates() function to drop the duplicated rows.

Alright! We can confirm that the duplicated rows have been dropped as the rows were 9993 before and now it’s reduced to 9977 rows.

Calculated Field

Now, we will create a calculated field for Profit Margin. The formula is (Profit / Sales) * 100.

The new calculated field ‘Profit Margin %’ is added to the last column

Now that the data set has been scrubbed, we can proceed with some statistics analysis!

Descriptive Statistics

Here, we will do a descriptive statistical analysis. We use df.describe() and assign ‘include = ‘all’ to ensure that categorical features are also included in the output.

Whoops, not able to screenshot Profit Margin, but it’s here on the farthest right.

You will see ‘NaN’ in some of the categorical columns and that’s perfectly fine. Categorical values are not meant to have calculations performed on them so, we can ignore those.

What we want to focus is the unique count and frequency of the categorical features such as

  • There are 4 ship modes and 3 customer segments. Nearly half of the orders are from Consumer segment using Standard Class shipment.
  • The store carries 3 category of items with 17 sub-category. 60% of orders are for Office Supplies and at least 15% purchases are for Binders.

While for numerical parameters, it’s interesting to note that the

  • 75% of orders makes at least 36% profit margin.
  • Loss-making orders can go up to 275% losses. We must place emphasis on these loss-making sales to cut them off.

We will perform some analysis below to test these observations.

Exploratory Data Analysis

1. Which Category is Best Selling and Most Profitable?

Best Selling means looking at the Sales and Most Profitable is referring to the Profit. We will group the Sales, Profit and also Quantity by Category and run the df.group().sum and pd.DataFrame() functions.

Group by Sales, Profit and Quantity by Category
Bar plots of Sales, Profit and Quantity by Category

Our observations are:

  • All 3 categories — Furniture, Office Supplies and Technology make similar amount of sales.
  • Technology is Best Selling and it’s good to know that this category is the Most Profitable too. Only minimal quantity is sold as these products are usually one-off purchases that can last at least 4–5 years.
  • Although Furniture makes similar sales as Technology, it is the least profitable and quantity sold are at a minimum too.
  • Office Supplies sells the most in terms of quantity as it is relatively cheap product.

2. What are the Best Selling and Most Profitable Sub-Category?

Data frame of Sales and Profit by the Sub-Category in descending order
Data frame of Sales and Profit by the Sub-Category in ascending order
Bar plots showing the (L) Best Selling and (R) Most Profitable for Sub-Category

Let’s analyse the bar plots:

  • Phones and Chairs are Top 2 best selling sub-category.
  • Copiers produces most profit, followed by Phones, Accessories, Papers and Binders. The marketing strategy has to focus on marketing these products.
  • On the other end of the spectrum, Machines, Fasteners, Supplies, Bookcases and Tables make close to zero margin to losses. These are products that Super Store can consider dropping from the product catalogue or increase the sale price and profit margin or bargain for a lower price from the supplier.

3. Which is the Top Selling Sub-Category?

Quantity sold for each Sub-Category
Bar plot showing Top Selling for each Sub-Category

Here, we can deduce that

  • Super Store should ensure inventory are always well-stocked for the top selling sub-category such as Binders, Paper, Furnishings and Phones.
  • Despite being most profitable, Copiers sell the least only 234, but as it is a relatively expensive office equipment that is usually used for few years, it is understandable that it sells the least among all.

4. Which Customer Segment is Most Profitable?

Total Profit by Customer Segment
Plot showing Total Profit for each customer Segment

Consumer segment is most profitable, followed by Corporate Segment and Home Office. Hence, marketing strategy has to target or place more focus on retaining Consumer and Corporate Segment customers.

5. Which is the Preferred Ship Mode?

Plot showing no. of orders using different Ship Mode

By a landslide, Standard Class is the preferred method of shipment and perhaps the cheapest one too. The other modes are not popular among the customers and may be too costly.

6. Which Region is the Most Profitable?

Pie chart showing percentage of sales by Region

East and West region are most profitable.

7. Which City has the Highest Number of Sales?

Top 10 cities with highest sales
Bottom 10 cities with lowest sales
L: Top 10 cities with highest sales | R: Bottom 10 cities with lowest sales

There is a huge disparity between the cities with highest sales and lowest sales. Marketing strategy has to target the top 10 cities.

Strategic Recommendation

Here, we revisit our business questions and share with you our high-level business recommendations.

Business Questions

  • Which Category is Best Selling and Most Profitable?
  • What are the Best Selling and Most Profitable Sub-Category?
  • Which is the Top Selling Sub-Category?
  • Which Customer Segment is Most Profitable?
  • Which is the Preferred Ship Mode?
  • Which Region is the Most Profitable?

Recommendations

  • Focus on Technology sub-category and Phones and Chairs as they are highest selling and most profitable. Bundle them with the less profitable products such as Bookcases, Table and Chairs to offset the losses.
  • Selling Bookcases and Tables result in huge losses, so Super Store has to consider to bundle them together with High Selling or Profitable sub-category such as Chairs, Copiers, Phones and Office Supplies products.
  • For Home Offices customers, these people might be busy with work and less likely to spend time selecting individual products, so create a Home Office package with products used for offices such as table, chairs, phone, copiers, storage, label, fasteners, bookcases.
  • For loss-making products like Supplies, Bookcases, Tables, consider to either drop these from the catalogue or change suppliers and bargain for cheaper price.
  • Consumer and Corporate Segment make up more than 70% of customerbase. Target them, especially customers from the East and West region in the Top 10 cities with Highest Sales by introducing special promotions and bundles for mass Consumer and Home Offices and send promotional emails or flyers.

Thank you!

--

--

Katie Huang Xiemin
Analytics Vidhya

Self-taught Data Analyst | SQL Enthusiast | GitHub @katiehuangx