Super Store Sales Use-Case Data Analytics and Visualization

Swasti Khurana
Published in Clique Community
5 min read · Jan 18, 2021

The Super Store dataset contains order details for customers of a superstore in the US, including the state, region, order date, shipping date and product ordered. In this blog, we’ll define some use cases for this dataset. A use case for a dataset can be defined as a question, or a set of questions, that determines what information the dataset can represent: questions that we can ask of the dataset and get answers from.

Let’s take a look at an example dataset and try to define its use cases. I’ll be taking the US Superstore dataset. Such datasets are huge, but to define use cases we mainly need to look at the column names (attributes) and one or two rows. I have used the Python libraries pandas and NumPy for the analysis here.

Here, the top 5 rows are displayed to view the attributes (columns) and get a fair idea about the data and what attributes might provide meaningful information.
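As a minimal sketch of this first look, the snippet below loads a few rows and prints the top 5. The column names are assumed from the public Superstore dataset, the rows are made-up placeholders rather than real records, and the file name mentioned in the comment is hypothetical.

```python
import io
import pandas as pd

# Made-up sample rows standing in for the real file. With the actual data this
# would be something like pd.read_csv("superstore.csv") (hypothetical file name).
csv_text = """Order Date,Ship Date,State,Region,Category,Sub-Category,Sales,Quantity,Discount,Profit
2016-01-05,2016-01-09,California,West,Technology,Phones,500.0,2,0.0,120.0
2016-02-11,2016-02-15,Texas,Central,Furniture,Tables,700.0,3,0.3,-90.0
2016-03-20,2016-03-22,New York,East,Furniture,Chairs,450.0,4,0.1,60.0
2016-04-02,2016-04-07,California,West,Office Supplies,Binders,40.0,5,0.2,8.0
2016-05-18,2016-05-21,Florida,South,Technology,Phones,300.0,1,0.2,25.0
2016-06-30,2016-07-04,Washington,West,Office Supplies,Storage,120.0,2,0.0,15.0
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Order Date", "Ship Date"])

top5 = df.head()   # first 5 rows, enough to see the attributes
print(top5)
print(df.dtypes)   # which columns are numeric, dates or text
```

Looking at `df.dtypes` alongside the first rows helps separate the numeric attributes (candidates for correlation) from the categorical and date attributes (candidates for grouping).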

Let us have a look at the correlation between the numeric attributes:

A high positive correlation signifies a positive relation: as one quantity increases, the other tends to increase as well. A high negative correlation signifies that as one quantity increases, the other tends to decrease. Correlation alone does not show that one quantity causes the other. Both positive and negative correlations can provide notable results, so it is important to decide how we would like to use them.
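A correlation table like this can be produced with pandas, roughly as follows. This is a sketch on made-up numbers, and the column names are assumptions based on the public Superstore dataset. `DataFrame.corr()` computes the Pearson coefficient by default.

```python
import pandas as pd

# Made-up values for illustration only; the real dataset has thousands of rows.
df = pd.DataFrame({
    "Sales":    [100.0, 250.0, 80.0, 400.0, 150.0, 300.0],
    "Quantity": [2, 5, 1, 7, 3, 6],
    "Discount": [0.0, 0.1, 0.5, 0.0, 0.3, 0.2],
    "Profit":   [30.0, 60.0, -20.0, 130.0, 10.0, 55.0],
})

corr = df.corr()   # pairwise Pearson correlation of the numeric columns
print(corr.round(2))

# The pair of attributes with the most negative coefficient.
most_negative_pair = corr.unstack().idxmin()
print(most_negative_pair)
```

Values close to +1 or -1 indicate a strong linear relation, while values near 0 indicate a weak one.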

Looking at the table, we see that Discount and Profit have the strongest negative correlation, and even a layman can deduce that high discounts mean less profit.

For more insight into positive correlation, let’s consider Quantity, Profit and Selling Price, as these seem to be the most strongly related and would be useful to us.

The correlation coefficient for Profit and Selling Price is high, and that is also visible on the scatterplot: as Selling Price increases, Profit tends to be higher.
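A scatterplot of this kind could be drawn with matplotlib along these lines. The numbers are made up, and the `Sales` column is assumed to be what this post calls Selling Price. The output file name is also just an example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values for illustration only.
df = pd.DataFrame({
    "Sales":  [100.0, 250.0, 80.0, 400.0, 150.0, 300.0],
    "Profit": [30.0, 60.0, -20.0, 130.0, 10.0, 55.0],
})

r = df["Sales"].corr(df["Profit"])  # Pearson coefficient for this pair

fig, ax = plt.subplots()
ax.scatter(df["Sales"], df["Profit"])
ax.set_xlabel("Selling Price (Sales)")
ax.set_ylabel("Profit")
ax.set_title(f"Profit vs Selling Price (r = {r:.2f})")
fig.savefig("profit_vs_sales.png")  # example output path
```

Putting the coefficient in the title makes it easy to compare what the number says with what the cloud of points looks like.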

Now, based on the attributes and data, we can list a few use cases:

  1. Trend in profit/sales over time (years/months/quarters).
  2. Trend in profit/sales over region (years/months/quarters).
  3. Product (Segment/Category) with highest and lowest sales.
  4. Forecasting future sales according to shipping date.

These are just examples of what can be done. We can also mix and match these to get more insights into our data.

But as we saw, even a simple scatterplot takes commands we need to remember. To avoid that, we can use Tableau which provides a simple drag and drop interface for visualisations. It is assumed that you’ll be familiar with the working of Tableau. If not, you can find a tutorial for getting started here.

1. Trend in profit/sales over time (years/months/quarters)

In this graph we can see Sales over the months for each year, colour coded by year. On hovering over the points, we can see additional data and the exact Sales figure. A broad look at this graph shows that each year there is a significant dip in the month of October. This is something that can be investigated further. Sales also increase over the months of each year, then come down again at the beginning of the next year (January). A quarter-wise split might give more insight into this.

A quarter-wise view of sales over the years shows the usual dip at the beginning of the last quarter (October), but also a gradual increase from quarter to quarter, with maximum sales in the 4th quarter.
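For readers who want to check these month-wise and quarter-wise totals outside Tableau, here is a rough pandas equivalent. The dates and amounts are made up, and the `Order Date` and `Sales` column names are assumptions.

```python
import pandas as pd

# Made-up orders for illustration only.
df = pd.DataFrame({
    "Order Date": pd.to_datetime([
        "2015-01-10", "2015-04-02", "2015-10-05", "2015-11-20",
        "2016-01-15", "2016-07-08", "2016-10-03", "2016-12-12",
    ]),
    "Sales": [100.0, 150.0, 90.0, 300.0, 120.0, 200.0, 110.0, 350.0],
})

# Derive the time parts we want to group by.
df["Year"] = df["Order Date"].dt.year
df["Month"] = df["Order Date"].dt.month
df["Quarter"] = df["Order Date"].dt.quarter

# One row per month (or quarter), one column per year, like one line per year.
monthly = df.pivot_table(index="Month", columns="Year", values="Sales", aggfunc="sum")
quarterly = df.pivot_table(index="Quarter", columns="Year", values="Sales", aggfunc="sum")

print(monthly)
print(quarterly)
```

Months or quarters without any orders show up as `NaN` in the pivot, which is worth keeping in mind before reading a gap as a dip.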

2. Trend in profit/sales over region (years/months/quarters)

This bar chart doesn’t provide a lot of insight, but it does tell us that sales in California are rocketing. Maybe the company is ready to open another store there, or to increase staff to handle the orders. In this case, a bar chart is the best representation because we are comparing a numeric total across different states. A line graph would only connect the data points, which is useful when there are multiple lines to compare, as was the case above for sales over time.
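The totals behind such a chart come from a simple group-and-sum, sketched below on made-up numbers with assumed `State` and `Sales` columns.

```python
import pandas as pd

# Made-up orders for illustration only.
df = pd.DataFrame({
    "State": ["California", "Texas", "California", "New York", "Texas", "California"],
    "Sales": [500.0, 200.0, 700.0, 450.0, 150.0, 300.0],
})

# Total sales per state, largest first.
sales_by_state = df.groupby("State")["Sales"].sum().sort_values(ascending=False)
print(sales_by_state)

# With matplotlib installed, sales_by_state.plot(kind="bar") would draw the chart.
```

Sorting before plotting makes the leading states stand out at a glance.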

3. Product (Segment/Category) with highest and lowest sales

With the categories sorted in ascending order of sales, we can see that Phones did really well, and so did Chairs. On a closer look, Tables have the fourth highest sales. But what happens when we colour code the chart according to profit?

Despite having the fourth largest sales, Tables are actually the least profitable; in fact, they are causing great losses. One possibility is that the shipping cost is eating up all the profit. This needs to be investigated and fixed.
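The same comparison of sales against profit can be sketched in pandas. The figures below are made up purely to mirror the pattern described, and the `Sub-Category` column name is an assumption.

```python
import pandas as pd

# Made-up figures for illustration only.
df = pd.DataFrame({
    "Sub-Category": ["Phones", "Phones", "Chairs", "Chairs", "Tables",
                     "Tables", "Binders", "Storage"],
    "Sales":  [900.0, 600.0, 800.0, 500.0, 700.0, 400.0, 300.0, 1200.0],
    "Profit": [200.0, 100.0, 90.0, 60.0, -150.0, -80.0, 80.0, 70.0],
})

# Total sales and profit per sub-category, sorted by sales (ascending).
summary = df.groupby("Sub-Category")[["Sales", "Profit"]].sum().sort_values("Sales")
print(summary)

# Sub-categories that sell but lose money overall.
loss_makers = summary[summary["Profit"] < 0]
print(loss_makers)
```

Filtering on negative profit surfaces loss-making items directly, without relying on reading colours off a chart.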

4. Forecasting future sales according to shipping date

Tableau uses exponential smoothing to generate a forecast based on the data in the field. The default view shows the 95% prediction interval as a light blue band. This can be changed to 99% or a custom value too. As our data is seasonal in nature, we see a dip in sales in January and also in October in the forecast for the year 2017.
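To give a feel for the idea, here is a minimal sketch of simple exponential smoothing in plain Python. It is the simplest variant, with no trend or seasonal component, and it is not Tableau's actual implementation. The function name, the `alpha` value and the sample numbers are all illustrative assumptions.

```python
def simple_exponential_smoothing(values, alpha):
    """Return the smoothed level after each observation.

    Each new level is a weighted average of the latest observation
    (weight alpha) and the previous level (weight 1 - alpha).
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = values[0]            # start from the first observation
    smoothed = [level]
    for x in values[1:]:
        level = alpha * x + (1 - alpha) * level
        smoothed.append(level)
    return smoothed


# Made-up monthly sales figures for illustration only.
monthly_sales = [100.0, 120.0, 90.0, 140.0]
smoothed = simple_exponential_smoothing(monthly_sales, alpha=0.5)

# In this simple variant, the forecast for the next period is the last level.
forecast = smoothed[-1]
print(smoothed, forecast)
```

A larger `alpha` reacts faster to recent changes, while a smaller one gives a smoother line. Capturing the January and October dips would need a seasonal variant of the method.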

Compiling the Sales over the years in a single line graph provides more insight into predictions (increase or decrease) for the next year according to the seasonal trend.

Conclusion

We’ve visualised and analysed various use cases in the superstore dataset. We got some insightful results about the Profit and Sales that can be used to improve future policies. We also found a trend over the year so preparations in stores and warehouses for the next year can be made accordingly. We can now confidently pick up more datasets to define use cases for, and visualise them in Tableau.
