Superstore Sales Analysis Demonstration

Michel Nguegang
20 min readMar 1, 2023

--

Identifying the patterns in our products, regions, categories and customer segments for efficiency and profit optimization with SQL

With the constant advancement of technology and innovation, having clear cut databases and possessing departments to breathe life into interpreting them will set you in the top realm of efficiency. Execution is important but we need great inputs to yield even better outputs. There is a necessity to catch up with the growing demand and fierce competition in the market . That is why we need leverage insights to provide more value to the customer as value is the key to everything.

Our analysis in question will be carried with Excel, SQL and finally Tableau. Excel will serve as first repository for our data, SQL will give meaning to our data and Tableau will give a clear face to our data. The following sales performance analysis will follow the 6 steps of Data Analysis which are: Ask, Prepare, Process, Analyze, Share and Act.

Step 1: Ask

In this step, we will define the business problem given to us which was interpreted as “What are the best products, regions, categories and customer segments for the Superstore to target or avoid in order to increase profitability?”

Business objectives:

  • How can we optimize our profits?
  • What are the emerging trends that we can we identify?
  • How can we take these insights to build recommendations ?

Deliverables:

  • A clear summary of the business objectives.
  • A full documentation of all the data cleaning, manipulation and analysis.
  • A dashboard with visualizations and main outcomes.
  • Recommendations based on our insights and analysis.

Step 2: Prepare

In this phase, we will identify and assess the features of our Superstore Dataset:

  • The data is publicly available through Kaggle under https://www.kaggle.com/datasets/vivek468/superstore-dataset-final.
  • It comes with 9995 rows with 9994 being pure data and the other one row being the column headers. It contains data recorded between the 3rd of January 2014 (the first order date) to the 5th of January 2018 (the last shipping date). (The last order date is the 30th of December 2017, so we will instead use the order dates range to represent our 4 years of business)
  • It contains the data of 793 customers.
  • The data contains the 21 columns namely; Row ID, Order ID, Order Date, Ship Date, Ship Mode, Customer ID , Customer Name, Segment, Postal Code, City, State, Country, Region, Product ID, Category, Sub-Category, Product Name, Sales, Quantity, Discount and Profit
  • The only limitations of our dataset that I could mention is that the most recent data point was almost 6 years ago. So our data is not current. However, our data is quite reliable, original, comprehensive and is cited.

Moving on to the data processing, we will use Excel for cleaning

Step 3: Process

We will process and clean our data with the help of Excel as the file is already a CSV file so a look through of our data with Excel can be ideal to:

  • Observe our data
  • Check for missing data with the help of conditional formating
  • Remove duplicate rows
  • Correctly format columns for easy SQL analysis

While exploring our dataset, we can perform and notice the following;

  • Our data looks correct and consistent. Everything looks well structured for further analysis it just needs a little editing.
  • With the use of conditional formating, which is a technique to highlight certain values of interest, we set our new formatting rule to be ‘Format only cells that contain’. Then we added on it to be ‘‘Format only cells with’: Blanks’. Our color of choice was yellow. Then we zoomed out to have a birds eye view of our dataset. We can confirm that our dataset contains no missing values. Another method would be to go to the ‘Data’ section of Excel and hit filters and filter rows for blanks.
  • With the command ‘remove duplicates’, there was no instance where the data was duplicated with all the exact parameters for a customer in all columns. So all the rows had some variety to it hence returning no duplicate data for our data set.
  • Finally, made sure Order Date and Ship Date were well formatted by formating it to dates (which they already were) and formatted the Sales and Profits columns from numbers to currencies since we our talking about financials. The discount column was not formated to currency as in this context, the discount is more of a percentage value. So discount will later be formatted from number to percentage by just multiplying it by 100.

Now our dataset is ideal for analysis to discover relationships, trends and patterns that will give us a competitive edge and completely solve our business objectives.

Step 4: Analyze

For the analysis part, we will string out the most important components of our data to answer our business objectives.

Let’s load our data into SQL and check the first 5 rows to make sure it imported well.

SELECT *
FROM superstore
LIMIT 5;
The first 5 rows and 7 columns of our dataset

Okay, let’s perform an exploratory data analysis with our input on the superstore dataset. A list of tasks will be answered followed by the query input and query result. At the end of our analysis, we will transition to a dashboard reflecting the answers to the most important components that will solve the business problem posed upon us.

  1. What are total sales and total profits of each year?

The years were grouped by order date, so we can observe data for the year 2014, 2015, 2016 and 2017.

SELECT DATE_TRUNC('year', orderdate) AS year, 
SUM(sales) AS total_sales,
SUM(profit) AS total_profit
FROM superstore
GROUP BY year
ORDER BY year ASC;

This query produced the following result:

Total sales and Total profits for each year

The data above shows how the profits over the years have steadily increased with each year being more profitable than the other despite having a fall in sales in 2015, our financial performance

2. What are the total profits and total sales per quarter?

This is done to see the periods where our company has been the most impactful. So that in the future, we can tailor our operations where we see fit like maximizing our resources like advertisement, customer service and our overall presence during those times of the year. This is solved with the code below;

SELECT 
date_part('year', orderdate) AS year,
CASE
WHEN date_part('month', orderdate) IN (1,2,3) THEN 'Q1'
WHEN date_part('month', orderdate) IN (4,5,6) THEN 'Q2'
WHEN date_part('month', orderdate) IN (7,8,9) THEN 'Q3'
ELSE 'Q4'
END AS quarter,
SUM(sales) AS total_sales,
SUM(profit) AS total_profit
FROM superstore
GROUP BY year, quarter
ORDER BY year, quarter;

The code above produces the following:

The total sales and total profits for each year per quarter

Now this table will aid us in knowing what quarters were the most profitable to us from 2014–2017. This can help to pave the way for investment and marketing strategies. Computing the table above through Excel gives us the following :

Most performing quarters from 2014–2017

The data above shows that the period of October, November and December are our best selling months and our months where we bring in the most profit. Just by seeing this table, we can develop operation strategies pretty nicely as there is a clear buildup like a stock market rally from January to December then it dumps around the first 3 months. Let’s get into the regions.

3. What region generates the highest sales and profits ?

This is solved with the code below:

SELECT region, SUM(sales) AS total_sales, SUM(profit) AS total_profits
FROM superstore
GROUP BY region
ORDER BY total_profits DESC;
Total Profits and Sales by Region

We can observe above that the West region is the one with the most sales and brings us in the highest profits. The East region is pretty good looking for our company too. Those 2 regions are definitely areas of interest if we want to maximize our profits and expand our business. Concerning the South region, we do not gain a lot of revenue but still the profits are there. It is the Central region that is quite alarming as we generate way more revenue than the South region but do not make at least the same profits over there. The Central region should be on our watchlist as we could start to think on how we could maybe put our resources in the other regions instead. Let’s observe each regions profit margins for further analysis with the following code:

SELECT region, ROUND((SUM(profit) / SUM(sales)) * 100, 2) as profit_margin
FROM superstore
GROUP BY region
ORDER BY profit_margin DESC
Profit margins by region

Profit margins are a measure of a company’s profitability and are expressed as the percentage of revenue that the company keeps as profit. So we can see that the West and East are really good. The South region despite almost selling less than half of the West region in revenue has a good profit margin of 11.93% which is great. However the Central region is still not convincing. Let’s move on and try to pinpoint the data in each region.

4. What state and city brings in the highest sales and profits ?

States

Firstly, Let’s discover what states are the top 10 highest and lowest and then we will move on to the cities. For the states, it can be found with the following code:

SELECT State, SUM(Sales) as Total_Sales, SUM(Profit) as Total_Profits, ROUND((SUM(profit) / SUM(sales)) * 100, 2) as profit_margin
FROM superstore
GROUP BY State
ORDER BY Total_Profits DESC
LIMIT 10;

This produces the following results:

Top 10 State’s total sales and profits with their profit margins

The decision was to include profit margins to see this under a different lens. The data shows the top 10 most profitable states. Besides we can see the total sales and profit margins. Profit margins are important and it allows us to mostly think long-term as an investor to see potential big markets. In terms of profits, California, New York and Washington are our most profitable markets and most present ones especially in terms of sales. Which, are so high that it would take so much for the profit margins to be higher. However the profits are great and the total sales show that we have the best part of our business share at those points so we need to boost our resources and customer service in those top states.

Let’s observe our bottom 10 States:

SELECT State, SUM(Sales) as Total_Sales, SUM(Profit) as Total_Profits
FROM superstore
GROUP BY State
ORDER BY Total_Profits ASC
LIMIT 10;
Bottom 10 State’s total sales and profits

Our least profitable markets are listed above. The top 3 are Texas, Ohio and Pennsylvania. Texas and Pennsylvania are especially alarming as they have more than 100,000 in sales with Texas having more sales than Washington (which made $33402.70 in profits) but made a loss of $25729.29.

Cities

The top cities are found with the code below:

SELECT City, SUM(Sales) as Total_Sales, SUM(Profit) as Total_Profits, ROUND((SUM(profit) / SUM(sales)) * 100, 2) as profit_margin
FROM superstore
GROUP BY City
ORDER BY Total_Profits DESC
LIMIT 10;
Top 10 Cities’ total sales and profits with their profit margins

The top 3 cities that we should focus on are New York City, Los Angeles and Seattle.

The bottom 10 cities are:

SELECT City, SUM(Sales) as Total_Sales, SUM(Profit) as Total_Profits
FROM superstore
GROUP BY City
ORDER BY Total_Profits ASC
LIMIT 10;
Bottom 10 Cities’ total sales and profits with their profit margins

The bottom 3 are Philadelphia, Houston and San Antonio. We have 2 cities from Texas in our top 3 so it is clear that we have start redesigning some strategies and how we operate in those cities.

5. The relationship between discount and sales and the total discount per category

First, let’s observe the correlation between discount and average sales to understand how impactful one is to the other.

SELECT Discount, AVG(Sales) AS Avg_Sales
FROM superstore
GROUP BY Discount
ORDER BY Discount;

This produces the following:

Discount vs Avg Sales

Seems that for each discount point, the average sales seem to vary a lot. Let’s check the correlation with a graph in Excel

Discount vs Avg Sales correlation graph

They almost have no linear relationship. This noted by the correlation coefficient of -0.3 and the shape of the graph. However we can at least observe that at a 50% discount, (0.5 * 100 to convert it to percentage) our average sales are the highest it can be. Maybe it is a psychology technique or it’s just the right product category that is discounted.

Let’s observe the total discount per product category:

SELECT category, SUM(discount) AS total_discount
FROM superstore
GROUP BY category
ORDER BY total_discount DESC;
Most discounted Categories

So Office supplies are the most discounted items followed by Furniture and Technology. We will later dive in into how much profit and sales each generate. Before that, let’s zoom in the category section to see exactly what type of products are the most discounted.

SELECT category, subcategory, SUM(discount) AS total_discount
FROM superstore
GROUP BY category, subcategory
ORDER BY total_discount DESC;
Most discounted subcategories (product type)

Binders, Phones and Furnishings are the most discounted items. But the gap between binders and the others are drastic. We should check the sales and profits for the binders and other items on the list. But first let’s move on to the categories per state.

6. What category generates the highest sales and profits in each region and state ?

First, let’s observe the total sales and total profits of each category with their profit margins:

SELECT category, SUM(sales) AS total_sales, SUM(profit) AS total_profit, ROUND(SUM(profit)/SUM(sales)*100, 2) AS profit_margin
FROM superstore
GROUP BY category
ORDER BY total_profit DESC;
Categories with their total sales, total profits and profit margins

Out of the 3, it is clear that Technology and Office Supplies are the best in terms of profits. Plus they seem like a good investment because of their profit margins. Furnitures are still making profits but do not convert well in overall. Let’s observe the highest total sales and total profits per Category in each region:

SELECT region, category, SUM(sales) AS total_sales, SUM(profit) AS total_profit
FROM superstore
GROUP BY region, category
ORDER BY total_profit DESC;
Highest total sales and profits per Category in each region

These our are best categories in terms of total profits in each region. The West is in our top 3 two times with Office Supplies and Technology and the East with Technology. Among the total profits, the only one that fails to break even is the Central Region with Furniture where we operate at a loss when selling it there.

Now let’s see the highest total sales and total profits per Category in each state:

SELECT state, category, SUM(sales) AS total_sales, SUM(profit) AS total_profit
FROM superstore
GROUP BY state, category
ORDER BY total_profit DESC;
Top Highest total sales and profits per Category in each state

The table above shows the most performing categories in each of our states. Technology in New York and Washington and Office Supplies in California. The 3 categories are all around good for our top 3 markets except the furniture category in Washington which is good but not as great as the others. Let’s check the least profitable ones by just changing our ‘ORDER BY’ clause too ascending (ASC) :

SELECT state, category, SUM(sales) AS total_sales, SUM(profit) AS total_profit
FROM superstore
GROUP BY state, category
ORDER BY total_profit ASC;
Top Lowest total sales and profits per Category in each state

Office supplies in Texas, Technology in Ohio and Furniture in Texas and Illinois are our biggest losses. Let’s move on to subcategories.

7. What subcategory generates the highest sales and profits in each region and state ?

Let’s observe the total sales and total profits of each subcategory with their profit margins:

SELECT subcategory, SUM(sales) AS total_sales, SUM(profit) AS total_profit, ROUND(SUM(profit)/SUM(sales)*100, 2) AS profit_margin
FROM superstore
GROUP BY subcategory
ORDER BY total_profit DESC;
Subcategories with their total sales, total profits and profit margins

Out of our 17 subcategories nationwide, our biggest profits comes from Copiers, Phones, Accessories and Paper. The profits and profit margins on Copiers and Papers especially are interesting for the long run. Our losses came from Tables, Bookcases and Supplies where we are uncapable of breaking even. Those 3 should be further reviewed as the sales are there, (except Supplies) but we cannot generate profits from them.

Now let’s see the highest total sales and total profits per subcategory in each region:

SELECT region, subcategory, SUM(sales) AS total_sales, SUM(profit) AS total_profit
FROM superstore
GROUP BY region, subcategory
ORDER BY total_profit DESC;
Top 15 Subcategories with the highest total sales and total profits in each region

These above display the best subcategories per region.

Now let’s see the least performing ones:

SELECT region, subcategory, SUM(sales) AS total_sales, SUM(profit) AS total_profit
FROM superstore
GROUP BY region, subcategory
ORDER BY total_profit ASC;
Top 15 Subcategories with the lowest total sales and total profits in each region

We are unable to break-even with 14 subcategories. Tables and Furnishings are our biggest losses in profits in the East, South and Central region.

Now let’s see the highest total sales and total profits per subcategory in each state:

SELECT state, subcategory, SUM(sales) AS total_sales, SUM(profit) AS total_profit
FROM superstore
GROUP BY state, subcategory
ORDER BY total_profit DESC;
Top 15 Highest total sales and profits per Subcategory in each state

Machines, Phones and Binders perform very well in New York. Followed by Accessories and Binders in California and Michigan respectively.

Let’s see the lowest sales and profits. Still in order for biggest lost in profits.

SELECT state, subcategory, SUM(sales) AS total_sales, SUM(profit) AS total_profit
FROM superstore
GROUP BY state, subcategory
ORDER BY total_profit ASC;
Top 15 Highest total sales and profits per Subcategory in each state

Binders are our biggest losses in Texas and Illnois. Machines are not profitable in Ohio at all. We should observe and rethink our strategies in those areas.

8. What are the names of the products that are the most and least profitable to us?

Let’s verify this information:

SELECT productname, SUM(sales) AS total_sales, SUM(profit) AS total_profit
FROM superstore
GROUP BY productname
ORDER BY total_profit DESC;
Top 15 most profitable products

These Copiers, Machines and Printers are definetly the main foundations of our profits. The Canon imageClass 2200 Advanced Copier, Fellowes PB500 Electric Punch Plastic Comb Binding Machine with Manual Bind and the Hewlett Packard LaserJet 3310 Copier are our top 3. We should keep up the stock with these. Let’s verify our less proftable ones:

SELECT productname, SUM(sales) AS total_sales, SUM(profit) AS total_profit
FROM superstore
GROUP BY productname
ORDER BY total_profit ASC;
Top 15 less profitable products

The Cubify CubeX 3D Printer Double Head Print, Lexmark MX611dhe Monochrome Laser Printer and the Cubify CubeX 3D Printer Triple Head Print are the products that operate the most at a loss. We should take this into account if we are thinking about modifying our stock.

9. What segment makes the most of our profits and sales ?

This can be verified with the help of the following query:

SELECT segment, SUM(sales) AS total_sales, SUM(profit) AS total_profit
FROM superstore
GROUP BY segment
ORDER BY total_profit DESC;
Goods Segment ordered by total profits

The consumer segment brings in the most profit followed by Corporate and then Home office. Let’s move on.

10. How many customers do we have (unique customer IDs) in total and how much per region and state?

This can be solved with the following;

SELECT COUNT(DISTINCT customerid) AS total_customers
FROM superstore;
Total number of customers

We’ve had 793 customers between 2014 and 2017. Regionally, we had the following:

SELECT region, COUNT(DISTINCT customerid) AS total_customers
FROM superstore
GROUP BY region
ORDER BY total_customers DESC;
Total customers per region

We surely had customers moving around regions which explains why they all do not add up to 793. Since there could be double counting. The West is the area where we have the biggest market of all. Statewise, here are the numbers:

Top 15 states with the most customers

We have the most customers in California, New York and Texas. The areas where we have the least that passed by there are:

SELECT state, COUNT(DISTINCT customerid) AS total_customers
FROM superstore
GROUP BY state
ORDER BY total_customers ASC;
Top 15 states with the least customers

Wyoming, North Dakota and West Virginia are the places where we had the least customers carry on business with us there.

11. Customer rewards program

Let’s say we want to build a loyalty and rewards program in the future. What customers spent the most with us? That is generated the most sales. It is always important to cater for our best customers and see how we can provide more value to them as it its cheaper to keep a current customer than it is to acquire a new one. We will also check the total profits just fo further analysis. We can find out what we are looking for with the following query:

SELECT customerid, 
SUM(sales) AS total_sales,
SUM(profit) AS total_profit
FROM superstore
GROUP BY customerid
ORDER BY total_sales DESC
LIMIT 15;
Top 15 customers that generated the most sales compared to total profits.

The display of the customer names are on file but showing the unique Customer id is a form of pseudonymization for security reasons. What is actually interesting to see is that customer ID ‘SM-20320’ is the customer who spent the most with us but is not bringing us profit. We still have to reward his/her loyalty. It is customer ID ‘TC-20980’ who is second in the list but brings us the most profit. So we really have to thank our top customers and keep them on deck.

12. Average shipping time per class and in total

Finally, the average shipping time regardless of the shipping mode that is chosen is found with the following function:

SELECT ROUND(AVG(shipdate - orderdate),1) AS avg_shipping_time
FROM superstore
Average shipping time

The shipping time in each shipping mode is:

SELECT shipmode,ROUND(AVG(shipdate - orderdate),1) AS avg_shipping_time
FROM superstore
GROUP BY shipmode
ORDER BY avg_shipping_time
Average shipping time by shipping mode

Here we have all the information we need to transition to our Tableau dashboard. The data visualization is our following step.

Step 5: Share

Onto sharing our observations, below you will find a picture of the interactive dashboard that represents the main KPIs and information on the collected Superstore data which was realized and demonstrated with Tableau.

The link to the interactive dashboard will be found under the screenshot below. Here are our findings:

Superstore Interactive Dashboard Screenshot

This is an in depth analysis of the Superstore dataset. You will find 2 interactive dashboards focused on the yearly sales aspect and another one that is more quarterly sales oriented with both dashboards displaying data collected from 2014–2017.

If there is mainly an interest in the yearly (timeline) data on profits, categories, sub categories, segments and products, then below you will find the direct link to the dashboard:

https://10az.online.tableau.com/t/micheltableau/views/SalesDataAnalysis-YearlyFocus2/SalesDataAnalysis-YearlyFocusDashboard?:origin=card_share_link&:embed=n

If it is instead the quarterly (timeline) data that interests you, then the results are in this dashboard instead:

https://10az.online.tableau.com/t/micheltableau/views/SalesDataAnalysis-QuarterlyFocus/SalesDataAnalysis-QuartelyFocusDashboard?:origin=card_share_link&:embed=n

The 2 dashboards and its contents are in the main source file below:

https://10az.online.tableau.com/#/site/micheltableau/projects/338892?:origin=card_share_link

Finally, let’s establish suggestions for an effective plan of action to solve our business objectives.

Step 6: Act

With everything that was covered, here are our conclusions and future recommendations for the success of our Superstore:

  • Our profits got progressively better. Our sales too even with a short halt in 2015. We should keep the pace up on that aspect.
  • Our most profitable quarter all year round was Q4. To maximize even more profits, we must make sure to have enough stock and push our marketing and customer service to make the most out of the October — December festive period.
  • The most performing regions are the West then the East, South and Central regions in that order. The Central region brings in at least $100,000 more in sales than the South region but still makes less profits than it. There is work to be done in the Central region if we really want to keep that market. However, I believe it is better to take some of the resources in our Central region to instead our West region stores as we are more profitable there and could really establish ourselves as kingpin in that region.
  • California, New York and Washington are our most profitable markets and most present ones especially in terms of sales as states. We have to focus more on them. Our least profitable markets are Texas, Ohio, and Pennsylvania. Which I believe that we should decrease our presence there or even put a halt at our store locations there as sales in Texas and Pennsylvania are in the $100,000s but are unable to convert to profits.
  • New York City, Los Angeles and Seattle are our most profitable cities and we list them as being a top priority because it is easier to rule a city than ruling a state. If we gain the city, gaining the state will be less challenging. Philadelphia, Houston and San Antonio are the cities where we lose the most money. We have 2 cities from Texas in our top 3 cities so it is clear that we have start rethinking about really wanting to carry business there, the better option would be to stop.
  • Out of the 3 categories, Technology and Office Supplies are the best in terms of profits. Plus they seem like a good investment because of their profit margins. Furnitures are still making profits but do not convert well in overall. With low profits and low profit margins, we should start to see what more we can bring to the furniture department. The sales are there but they do not translate smoothly.
  • Still under categories but regionally, Office supplies in the West brings the most profit so we must increase the cap of those materials over there. Same thing with the East and office supplies and both the East and West with Technology. However, furniture in the Central region is the only category that doesn’t convert to profits so it would be better to take some of these resources to the West region which is the biggest gainer in terms of Furniture.
  • Statewise, Technology and Office supplies brings us the most profit in the state of New York and California. We have to increase the availability of these goods in these states for better profits. However, Office supplies in Texas, Technology in Ohio and Furniture in Texas and Illinois are our biggest losses so we have to drastically reduce these type of products in those areas.
  • Out of our 17 subcategories nationwide, our biggest profits comes from Copiers, Phones, Accessories and Paper. The profits and profit margins on Copiers and Papers especially are interesting for the long run. We should immediately push these products as we have a great market share with these items. Our losses came from Tables, Bookcases and Supplies where we are uncapable of breaking even. We must spend less time and money with them. Especially with tables because compared to our only 3 losses, Tables lost us $17725 which is huge compared to our other losses of $3472 and $1188 which came from Bookcases and Supplies respectively.
  • For subcategories regionally, Copiers in the West and East with Accessories and Binders in the West are products that we always have to have in stock and promote for more profit. While Tables in the East, South and Central with furnishings in the Central region are the top products where we lose money so we should direct our attention elsewhere.
  • In what concerns subcategories by state, Machines, Phones and Binders perform very well in New York. Followed by Accessories and Binders in California and Michigan respectively so there is a need to accentuate our business there with those products. For our biggest losses, Binders in Texas and Illinois with machines in Ohio are not profitable at all. We have to decrease stock in those places.
  • For the particular products, The Canon imageClass 2200 Advanced Copier, Fellowes PB500 Electric Punch Plastic Comb Binding Machine with Manual Bind and the Hewlett Packard LaserJet 3310 Copier are our top 3 in profits. We must always keep up the stock with these. For our losses, The Cubify CubeX 3D Printer Double Head Print, Lexmark MX611dhe Monochrome Laser Printer and the Cubify CubeX 3D Printer Triple Head Print are the products that operate the most at a loss. We should certainly discontinue those products.
  • Out of the 3 segments, The consumer segment brings in the most profit followed by Corporate and then Home office. We must give more importance to the consumer segment even if all the 3 are profitable.
  • Finally, for our clientele, we have 793 customers total, and we have the most customers in California, New York and Texas. The case of Texas is pretty ironic since it is also the state that losses us the most money. So we must take a critical decision about Texas first as we absolutely can’t break through now. California and New York are pretty obvious, we have to be outstanding and be the best of what there is to offer in our respective niche.

Thank you for your time and commitment to the lecture!

--

--