Exploratory Data Analysis of a Global Superstore Dataset Using PowerBI

Oluwagbenga Ajetomobi
3 min readFeb 11, 2024

--

Introduction:

As part of a comprehensive data analysis project, I embarked on the exploration of a Global Superstore dataset from the year 2016. The dataset consisted of three key tables: order (51,290 records), people (24 records), and returns (1,079 records). Leveraging the capabilities of Power BI, I initiated the Extract, Transform, Load (ETL) process to uncover valuable insights from this complex dataset.

ETL Process:

The first step involved importing the three tables into Power BI, setting the stage for the subsequent transformations. Notably, I encountered a challenge with the column names “column1" and “column2" in the people and returns tables. To enhance clarity and ensure meaningful analysis, I used the “use first row as headers” option under the transform pane.

Addressing Data Quality Issues:

A critical aspect of the ETL process was addressing null values in the postal code field within the orders table. Recognizing the importance of accurate geographical information, I filled these null values by extracting data from the city, state, and country fields. This meticulous approach aimed to enhance the dataset’s completeness and reliability.

Creating Custom Columns for Temporal Analysis:

To facilitate temporal analysis, I introduced custom columns using Power BI’s “add column” feature. The first custom column, “delivery days”, was created by calculating the difference between the ship date and order date(=({ship date} — {order date}). Subsequently, I adjusted the data type to a whole number for consistency and ease of analysis. Additionally, a “year” custom column was introduced, capturing the year from the order date and providing a temporal dimension to the dataset(=(Date.Year([Order Date]))).

Handling Return Information:

Understanding the impact of returns on sales is crucial for business insights. I introduced a conditional column named “returned orders” based on the returned column. If the returned column equaled ‘Yes,’ the output was set to 1; otherwise, it was set to 0. The data type of this column was then changed from text to a whole number, aligning it with analytical requirements.

Loading Data for Visualization:

Having completed the transformations, I closed and applied the changes to load the refined dataset into Power BI for visualization. This crucial step paved the way for insightful and meaningful visual representations of the data.

Visualization Techniques:

To convey the findings effectively, I adopted a variety of visualization techniques:

  1. Slicer with Year: I created a slicer to allow dynamic filtering by year, enabling users to explore data trends over different periods.

2. Cards for Key Metrics: Utilizing cards, I presented key metrics such as total sales, total quantity, and average delivery days, providing a quick overview of the dataset’s essential characteristics.

3. Returned Orders Card: Employing the count function, I created a card specifically dedicated to displaying the total number of returned orders, shedding light on the impact of product returns on overall sales.

4. Regional Sales Map: Visualization of sales by region through a map facilitated a spatial understanding of sales distribution, aiding in strategic decision-making.

5. Profitable and Loss Products Bar Chart: To identify the top-performing and underperforming products, I visualized the top 6 profit and loss products using a bar chart, offering a clear comparison.

6. Sales Segmentation Pie Chart: The percentage of sales by segments was visually represented using a pie chart, providing insights into product category contributions to overall sales.

7. Market Distribution Donut Chart: A donut chart effectively communicated the percentage of sales by market, allowing for a comprehensive understanding of market dynamics.

8. Profit by Customer Bar Chart: To highlight the top 10 profitable customers, I utilized a bar chart, enabling a focused analysis of customer contributions to overall profitability.

The Visualization

Conclusion:

In conclusion, this data analysis project provided a comprehensive exploration of the Global Superstore 2016 dataset. Through meticulous ETL processes, handling data quality issues, and leveraging effective visualization techniques, the project aimed to uncover valuable insights for informed decision-making. The adoption of Power BI facilitated a seamless journey from raw data to actionable visualizations, emphasizing the importance of a robust data analysis approach in extracting meaningful business intelligence.

--

--