Understanding Fact and Dimension Tables in Data Warehousing: A Real-World Example

Ankush Singh
3 min readMay 30, 2023

--

Data warehousing, as a critical component of Business Intelligence (BI), is a specialized system used for data analysis and reporting. Among the many concepts you will encounter in data warehousing, two stand out — Fact tables and Dimension tables. These tables form the backbone of the star and snowflake schemas, the two most common structures used in data warehouses.

In this blog post, we’ll walk you through what Fact and Dimension tables are, their functions, and how they interact with each other. We’ll use a real-world example to ensure the concepts are as clear as possible.

What is a Dimension Table?

In a data warehouse, a Dimension table is a structure that categorizes facts and measures in order to enable users to answer business questions. Dimensions are descriptive and define the characteristics of a business object. They provide context to facts — as they hold the fields which are descriptive, qualitative and textual.

Let’s take a simple example of an e-commerce business. In this case, some dimensions could be Customers, Products, and Time.

  • The Customer dimension may have attributes like CustomerID, Name, Email, and Address.
  • The Product dimension may have ProductID, Name, Category, and Price.
  • The Time dimension may have Date, Month, Quarter, and Year.

What is a Fact Table?

Contrary to the descriptive nature of Dimension tables, a Fact table is a primary table in a dimensional model. A Fact table contains the quantifiable data for analysis — the numerical measures (often additive) of the business processes. The Fact table also has foreign keys which refer to candidate keys in the Dimension tables.

Going back to our e-commerce example, the Fact table could be Sales, and could include:

  • Quantity_sold (a measure)
  • Total_sales (a measure)
  • ProductID (a foreign key related to the Product dimension)
  • CustomerID (a foreign key related to the Customer dimension)
  • Date (a foreign key related to the Time dimension)

In this scenario, each sale would be recorded in the Fact table as a separate row, providing the opportunity for detailed and complex analysis.

Relationship between Fact and Dimension Tables

The relationship between Fact and Dimension tables is defined by a schema. In the star schema, the most common approach, a single Fact table sits in the middle, and is related to numerous Dimension tables that branch out like a star. The Fact table connects the dots between different Dimension tables, and this connection is what enables complex data analysis.

Imagine a business question from our e-commerce scenario: “What is the total sales for the ‘Electronics’ category in the first quarter of 2023 by customer?”

Here’s how the data warehouse using Fact and Dimension tables can help answer it:

  1. The Fact table ‘Sales’ keeps records of every sale made, including the quantity sold, total sales value, and the keys related to time, product, and customer.
  2. The Product Dimension table provides information about the product categories.
  3. The Time Dimension table allows you to filter the sales made in the first quarter of 2023.
  4. The Customer Dimension provides information about individual customers.

By combining information from the Sales Fact table and related entries in the Dimension tables, the data warehouse can efficiently answer the question.

Conclusion

Dimension and Fact tables are critical elements of a data warehouse, serving as the structure for data analysis and decision-making. While Dimension tables provide the descriptive context, Fact tables store the measurable transactions. Together, they empower businesses to answer complex questions, track changes over time, and make data-driven decisions.

Understanding these concepts is the first step to effectively using data warehousing to leverage your business data. As more businesses rely on data analysis for decision making, a strong foundation in these principles can provide a significant edge in the business world.

--

--

Ankush Singh

Data Engineer turning raw data into gold. Python, SQL and Spark enthusiast. Expert in ETL and data pipelines. Making data work for you. Freelancer & Consultant