Analyzing Superstore Sales Data with Power BI: A Comprehensive Case Study

Shubham Bhosale
4 min readJun 9, 2024

--

Introduction

In today’s competitive retail environment, data-driven insights are crucial for managing operations and driving sales. In this case study, we explore how a superstore can leverage Power BI to gain valuable insights from sales data. This project entails data extraction, cleaning, transformation, modeling, and visualization to assist a new store manager in understanding and optimizing store performance.

Problem Statement:

Data Extraction, Cleaning, Loading and Transformation

1. Desk representatives at the stores are not tech savvy hence they directly share the data in the single excel file. As a Power BI Developer, read the data directly from the excel file.

2. The data coming from the source is in raw form in the flat file; hence clean and prepare (transform) the dataset for efficient use. Delete the empty columns and rows, change the fields to appropriate data types and split the fields and rename the columns appropriately.

3. Standardize the values in the column Ship mode.

4. Split the address column to City, State, Country and Pincode.

Step 1: Data Extraction and Preparation

Reading Data from Excel
First, we import the sales data from an Excel file into Power BI Desktop:
1. Open Power BI Desktop.
2. Click on `Home` > `Get Data` > `Excel` and select your dataset.

Cleaning the Data

In Power Query Editor, we perform the following steps:
1. Remove any empty columns and rows to ensure clean data.
2. Change data types of columns appropriately (e.g., dates, text, numbers).
3. Rename columns for clarity, making them more descriptive.

Standardizing Ship Mode Values

We standardize values in the Ship Mode column:
1. Select the Ship Mode column.
2. Use `Replace Values` to change abbreviations like “FC” to “First Class”.

Splitting the Address Column

If the address is in a single field, we split it into City, State, Country, and Pincode:
1. Select the address column.
2. Use `Split Column` feature to divide it into the required parts.

Step 2: Data Modeling

Creating the STAR Schema

We optimize the dataset by creating a STAR schema with a central Fact table (`Orders`) and three Dimension tables (`Order Details`, `Customer`, `Product`):

  • Orders (Fact Table): Contains Order ID, Order Date, Ship Date, Customer ID, Product ID, Quantity, Discount, Buy Price, Price Per Each, and Sales (new calculated column).
  • Order Details (Dimension): Includes Ship Mode, Postal Code, Region.
  • Customer (Dimension): Contains Customer ID, Customer Name, Segment.
  • Product (Dimension): Includes Product ID, Category, Sub-Category, Product Name.
STAR Schema

Removing Duplicates and Empty Rows. We ensure each dimension table has unique entries without empty rows.

Step 3: Data Analysis

Calculating Sales (Order Value)
We add a new column for Sales in the `Orders` table using DAX:

Sales = Orders[Quantity] * Orders[Price Per Each] * (1 - Orders[Discount])

Sales from Discounted Products
We create a measure to calculate sales from discounted products:

Discounted Sales = SUMX(FILTER(Orders, Orders[Discount] > 0), Orders[Sales])

Categorizing Cart Value

Since supermarkets sell bulk items, store managers want to know each order’s cart value. So we will create a column “Cart Value” that categorizes the order value/sales as Low, medium, high or very high using power query with the help of Conditional Column.

A pie chart visualizes the distribution of cart values.

Calculating Low Cart Value Sales

Low cart sales = CALCULATE(SUM(Orders[Sales]), FILTER(ALL(Orders[Cart Value]), Orders[Cart Value] = "Low"))

Lets track the total sales coming from the low cart category and discount more than or equal to 50% to find out the contribution and cause.

Low cart sales above 50% discount = CALCULATE(SUM(Orders[Sales]), FILTER(ALL(Orders[Cart Value]), Orders[Cart Value] = "Low"), ALL(Orders[Discount]), Orders[Discount] >= 0.50)

Calculating Delivery Days

We add a column to calculate the number of days taken for delivery. Lets look at the number of days it takes to deliver for each shipment type (refer ship mode) so that delivery issues can be looked at on priority. Use the table view.

Delivery Days = DATEDIFF(Orders[Order Date], Orders[Ship Date], DAY)

Superstore Sales Data Report

Conclusion

This Power BI project demonstrates how to transform raw sales data into actionable insights through careful data preparation, modeling, and visualization. By following these steps, the superstore can make informed decisions, optimize operations, and drive sales growth. This case study serves as a practical guide for anyone looking to leverage Power BI for retail data analysis.

--

--

Shubham Bhosale

Data Analyst | Skilled in SQL, Python, Power BI & Excel | Passionate About Turning Data into Insights