Kimball Star Schemas in Data Warehousing: Part 2

Fact Tables Defined through Business Process Understanding

7 min readJul 21, 2024

In this second part of my mini-series covering Kimball modelling, the aim is to provide a guide on how data engineering teams can effectively collaborate with business stakeholders to deliver insights through the creation of fact tables in the data warehouse. Drawing from previous experience leading data teams, this guide explores best practices and methodologies required to ensure a seamless process.

If you missed the first part on dimensions, you can find it below:

Kimball Star Schemas in Data Warehousing: Part 1

Dimension Tables and Business Domains

medium.com

Understanding the Importance of Fact Tables

Fact tables are the glue that holds your data warehouse presentation layer together. They store quantitative data for analysis and are typically designed to follow a defined business process, making them essential for business intelligence and reporting. A well-designed fact table allows businesses to perform complex queries and generate insightful reports with minimal effort. Thus, the process of mapping out business processes to form these tables is crucial.

Types of Fact Tables

Before looking at the process of creating fact tables, it’s important to understand the different types of fact tables and their use cases:

Transactional Fact Tables: These are the most common type, capturing each transaction or event at the atomic level. They provide the most granular view of a business process, such as individual sales transactions.
Periodic Snapshot Fact Tables: These tables take a snapshot of a business process at defined intervals (e.g., daily, monthly). They’re useful for analysing the state of a process over time, such as monthly inventory levels.
Accumulating Snapshot Fact Tables: Used for well-defined processes with a standard number of steps, these tables track the lifecycle of a process. The grain is at the item level (e.g., an order), and columns identify key milestone dates. In the example of an accumulating snapshot fact table based around order steps, you would expect the number of records in this fact table to be equal to the number of orders in the business, which should be equal to the number of orders in the associated dimension table.
Factless Fact Tables: These tables don’t contain numerical facts but record the occurrence of events, often used to track many-to-many relationships between dimensions. These tables can also be used to track the duration of relationships, with effective to and from dates.

The Process of Building a Fact Table

Let’s walk through the steps of building a fact table, as illustrated in the following diagram:

Engaging with Business Stakeholders

The first step in mapping out business processes is to engage with the relevant business stakeholders. Don’t be fooled into thinking that you can see a transaction table in a source system and you can simply leverage that as a fact table. Business stakeholders have a deep understanding of the business operations and the key performance indicators (KPIs) that drive the organisation. They will be aware of any business rules that might be applied, or any additional levels of detail that might not immediately be apparent.

Identify Key Stakeholders: Determine who the key stakeholders are. This could include department heads, business analysts, and other influential personnel.
Conduct Interviews: Organise interviews or workshops to gather detailed information about the business processes. The goal is to understand the workflows, data requirements, and available sources of information.
Define Objectives: Clearly define the objectives of the project. What are the key questions the business wants to answer, and how will this work be leveraged to drive revenue, reduce cost, or improve culture? What KPIs are critical for success? While a fact table shouldn’t be built to cater to a specific report, it is critical to understand the KPIs that can be tracked along a business process, in order to bring visibility to opportunities for business improvement.

Documenting Business Processes

Once the initial engagement is complete, the next step is to document the business processes in detail. This involves creating process maps that visually represent the flow of activities and data within the organisation.

Process Mapping Tools: Leverage tools such as Mermaid for diagramming. Both Anthropic and OpenAI LLM offerings have the ability to natively build Mermaid charts, so the process maps can be written out (or talked through) in natural language, and then generated automatically.
Define Process Flows: Map out the end-to-end process flows, including all relevant steps, decision points, and data inputs/outputs.
Identify Data Sources: Document the data sources associated with each process. This includes databases, external systems, spreadsheets, and any other repositories of information.

Identifying the Appropriate Fact Table Type

Based on the documented business processes, determine which type of fact table is most appropriate for each process. Consider the nature of the data, the frequency of updates, and the analytical needs of the business. It is perfectly acceptable to have more than one fact table for any given business process, especially if the related metrics sit at different levels of grain.

Translating Business Processes into Data Models

With a detailed understanding of the business processes, the next step is to translate these processes into data models that will form the basis for the fact tables.

Identify Entities and Attributes: Determine the key entities involved in each process and their respective attributes. For example, in a sales process, entities could include Customers, Orders, and Products. Each of these entities will need a Dimension Table created in advance of being integrated into the Fact Table.
Define Relationships: Establish the relationships between different entities. This is crucial for understanding how data flows across the company and how, if necessary, it can be aggregated.
Create Logical Data Models: Use entity-relationship diagrams (ERDs) to create logical data models. These models should depict the entities, attributes, and relationships identified in the previous steps. Again, you can leverage Mermaid or similar tools for this process.

Design and build the Fact Table

Designing the fact table is a critical step that involves converting the logical data models into a physical schema that can be implemented in the data warehouse.

Identify Facts and Measures: Determine the facts (quantitative data) that need to be stored in the fact table. Measures are typically numerical values such as sales revenue, quantity sold, or transaction counts. Don’t be tempted to mix the grain of these metrics. You shouldn’t have both a transaction amount and year to date amount on the same record.
Define Dimensions: Identify the dimensions that will provide context to the facts. Dimensions are descriptive attributes related to facts, such as Time, Customer, Product, and Location.
Develop ELT Pipelines: Develop ELT pipelines to extract data from source systems, load it into the warehouse, and then transform it into the desired format as your fact table.
Data Validation: Implement data validation checks to ensure data accuracy and integrity. This includes checking for data completeness, consistency, and correctness.

Post Deployment Processes

While the creation of a fact table marks a significant milestone, it’s important to understand that the real work of deriving business value has only just begun. The role of the data team extends far beyond the initial implementation, encompassing ongoing measurement, analysis, and improvement recommendations.

Establish a Baseline

Upon releasing the fact table, establish baseline metrics for the business processes it represents.
This baseline will serve as a reference point for all future improvements and allow for quantifiable measurement of the impact of data-driven decisions.

Continuous Monitoring and Analysis

Regularly analyse the data in the fact table to identify trends, anomalies, and potential areas for improvement.
Use statistical techniques and data visualisation tools to gain deeper insights into the business processes.

Identify Improvement Opportunities

Based on the analysis, identify specific areas where business processes can be optimised.
Look for bottlenecks, inefficiencies, or untapped potential that the data reveals.

Collaborate with Business Stakeholders

Present findings and improvement recommendations to business stakeholders.
Work closely with them to understand the practical implications of the data insights and how they can be translated into actionable strategies.

Implement and Measure Changes

Assist in implementing the agreed-upon improvements.
Measure the impact of these changes using the fact table data, comparing against the established baseline.

Iterative Improvement Cycle

Treat this as an ongoing, iterative process. As one set of improvements is implemented and measured, begin looking for the next opportunity.
Continuously refine and expand the fact table as needed to support evolving business needs and newly identified areas for analysis.

Quantify and Communicate Value

Regularly quantify the business value generated through these data-driven improvements.
Communicate these successes to stakeholders, demonstrating the ongoing value that the data team brings to the organisation.

By embracing this continuous improvement mindset, the data team transforms from being simply a technical resource to a strategic partner in driving business value. The fact table becomes not just a static repository of information, but a dynamic tool for ongoing business optimisation and innovation.

This approach ensures that the investment in creating the fact table yields returns far beyond its initial implementation, continually adding value to the company through data-driven decision making and process improvement.

Conclusion

Mapping out business processes to form the basis for fact tables in a data warehouse is a critical task that requires close collaboration between data engineering teams and business stakeholders. By understanding the different types of fact tables, following a structured approach, utilising the right tools, and adopting an iterative methodology, data engineers can ensure that the data warehouse meets the business’s analytical needs and drives informed decision-making.

Remember, the key to success lies in thorough documentation, continuous feedback, and a deep understanding of both the technical and business aspects of the project. With these elements in place, your data warehouse will serve as a robust foundation for your company’s data-driven future.

Part 3 is available below:

Kimball Star Schemas in Data Warehousing: Part 3

Mastering Bridge Tables for Many-to-many Relationships