Kimball Star Schemas in Data Warehousing: Part 1

Dimension Tables and Business Domains

Andy Sawyer
11 min readJul 14, 2024

Introduction

The Data Warehouse Toolkit is the go-to text on building the presentation layer of your data warehouse. While the last update was all the way back in 2013, it still remains a cornerstone design pattern. It is renowned for its ability to streamline reporting and analytics processes. This article is the first in a three-part series exploring the fundamental components of the Kimball Star Schema. The focus in this instalment will be on Dimension Tables, the critical structures that provide context and enable multifaceted analysis of business data.

The purpose of this article is to delve deep into the concept of Dimension Tables, their significance in the Star Schema model, and their relationship to business domains. I’ll explore how these elements work together to create a unified and intuitive view of a business’s data landscape.

Dimension Tables surrounding a Fact Table

Understanding Dimensions and Business Domains

What is a Dimension Table?

A Dimension Table in a Star Schema is a fundamental structure that allows you to categorise facts and measures, allowing users to analyse data from various perspectives. However, they can also be used in isolation to provide a ‘single view of’ the domain they cover. These tables contain descriptive attributes that provide context to the numerical data stored in fact tables. For instance, in an airline company’s data warehouse, dimension tables might include ‘Customer’, ‘Flight’, ‘Time’, and ‘Location’, each offering different angles from which to examine operational data.

Dimension Tables typically consist of a primary key and multiple descriptive columns. For example, a ‘Customer’ dimension might include attributes such as CustomerID (primary key), Name, Address, FrequentFlyerStatus, and JoinDate. These attributes allow for slicing and dicing of data, enabling rich, multidimensional analysis. They can also contain metrics, where appropriate, to provide a more holistic view of the ‘thing’ you’re looking at in a single table.

What is a Business Domain?

A Business Domain represents a distinct area of business activity, encompassing its unique set of rules, processes, and data. It serves as a logical grouping of related business entities and operations. In the context of an airline company, business domains might include Customer Management, Flight Operations, Revenue Management, and Crew Scheduling.

Each business domain encapsulates the relevant data and business logic required to support its specific operations. This domain-centric approach helps in organising and managing the complexity of large-scale business operations and their corresponding data structures.

The Relationship Between Star Schema Dimensions and Business Domains

The relationship between Star Schema Dimensions and Business Domains is critical in creating an effective and intuitive data warehouse design. Dimensions are typically designed to closely align with and represent business domains within the data warehouse structure. This alignment serves several crucial purposes:

  1. Business Reality Reflection: By mapping dimensions to business domains, the data warehouse structure mirrors the real-world design of the business. This makes the data model more intuitive for business users.
  2. Simplified Analysis: When dimensions correspond to familiar business concepts, users can more easily navigate and analyse data, even without deep technical knowledge of the underlying data structure.
  3. Conformed Dimensions: Properly aligned dimensions facilitate the creation of conformed dimensions — standardised dimensional structures that can be used across multiple points of analysis. This promotes consistency in reporting and analysis across different business processes.
  4. Scalability and Flexibility: As business domains evolve, corresponding dimensions can be updated or expanded, allowing the data warehouse to grow in tandem with the business.
  5. Enhanced Data Governance: The clear relationship between dimensions and business domains supports better data governance by making it easier to assign ownership and manage data quality within specific business areas.

By thoughtfully designing dimension tables that reflect business domains, companies can create a data warehouse that not only stores data efficiently but also presents it in a manner that resonates with how the business actually operates. This alignment is crucial to unlocking the full potential of a Star Schema-based data warehouse, enabling more effective decision-making and richer business insights.

Business-Centric Approach to Building Dimension Tables

In an Agile environment, the process of building dimension tables should be iterative, collaborative, and aligned with business needs. This approach ensures that the resulting data warehouse structure is not only technically sound but also provides real value to the company. Here’s a detailed look at the process:

  1. Identify Key Business Domains:
  • Engage with business stakeholders to understand critical areas of operation.
  • Identify domains that are central to decision-making processes. In our airline example, the Customer domain is vital, so should likely be one of the first to be built.

2. Determine the Attributes:

  • For each domain, list out all potential attributes that describe the domain.
  • Consider both current needs and potential future requirements.
  • For the Customer domain, attributes might include: CustomerID, Name, Address, Email, PhoneNumber, FrequentFlyerStatus, JoinDate, LastFlightDate, TotalFlightsCount, DollarsSpentLastYear, PreferredAirport, etc.

3. Standardise Definitions:

  • Work with cross-functional teams to create standard definitions for each attribute.
  • Resolve any discrepancies in how different departments define or use certain terms.
  • For example, ensure “FrequentFlyerStatus” has a consistent definition across marketing, customer service, and operations departments.

4. Create Conformed Dimensions:

  • Develop dimensions that can be used consistently across multiple teams for different insight purposes.
  • Ensure the grain (level of detail) of the dimension is appropriate for all its potential uses. You should plan to start at the most detailed level possible. It is easy to roll things up later, but it is difficult to get more granular once something is in production.
  • For the Customer dimension, the grain might be one row per customer, with historical changes tracked using slowly changing dimension techniques.

5. Document and Communicate:

  • Create documentation for each dimension, including attribute definitions, data types, and business rules.
  • Use tools like Confluence, or other collaboration platforms to make this information easily accessible to the business.
  • Conduct training sessions or workshops to ensure all stakeholders understand how to use and interpret the dimensions.

6. Iterative Refinement:

  • Use Agile sprints to incrementally develop and refine dimensions.
  • Start with a minimum viable product (MVP) dimension and add complexity in subsequent sprints.
  • Regularly seek feedback from business users and augment the dimension structure as needed.

7. Data Governance Integration:

  • Establish data stewardship roles for each dimension.
  • Implement processes for ongoing data quality checks and updates to dimension attributes.

8. Performance Optimisation:

  • On an ongoing basis, review query patterns and optimise dimension structures for commonly used attributes.
  • Consider creating derived attributes or pre-aggregated values on the dimension if they significantly improve query performance. Kimball is not opposed to storing historical metrics on a dimension table if it assists with providing insights to the business.

9. Business Value Validation:

  • Regularly assess the usage and value of dimension attributes. While at first it might seem appropriate to add everything you possibly could to a dimension, 100 attributes of little value clutter the space and make it harder for the business to see the 15 attributes that really help drive understanding.
  • Be prepared to deprecate attributes that are no longer providing value or to add new ones as business needs evolve.

Sprint Planning and Execution

In an Agile framework, the development of a conformed dimension typically spans multiple sprints. While this is not always the case, and some may be very simple to implement, don’t underestimate the effort involved. Not only the technical development work, but the business and stakeholder engagement to align on standard names and definitions:

  • Sprint 1: Focus on identifying the lowest grain and core business definitions. Collaborate with key stakeholders to prioritise the most critical attributes, and get the documentation in place.
  • Sprint 2–3: Develop the initial dimension structure and populate it with data. Begin integration with fact tables.
  • Sprint 4–5: Refine the dimension based on feedback, add additional attributes, and implement slowly changing dimension logic if required.
  • Subsequent Sprints: Continuous improvement, performance optimisation, and addition of new attributes as business needs evolve.

Engagement with the Business

Consistent engagement with business stakeholders is crucial throughout this process:

  • Hold regular review sessions to ensure the dimension structure aligns with business needs.
  • Use visual tools like Mermaid diagrams to communicate the data model and relationships.
  • Create PoC dashboards or sample reports to demonstrate the value of the dimension in real-world scenarios.
  • Establish a feedback loop for ongoing refinement and prioritisation of dimension enhancements.

Considerations in Dimension Table Design

When building dimension tables, several considerations can significantly enhance the value and performance of your data warehouse. I’ll aim to cover some of these below:

Integration of Additional Data Sources

While initial dimension attributes often come from primary operational systems, integrating data from additional sources can greatly enrich your dimensions and provide deeper insights.

Third-Party Data Integration:

  • Identify valuable external data sources that can augment your dimensions.
  • Examples: demographic data for customer dimensions, market data for product dimensions.
  • Plan the integration process carefully, considering data quality, update frequency, and mapping to existing attributes.

Cross-Functional Collaboration:

  • Work with various departments to identify useful data sources they might have.
  • Ensure legal and compliance teams are involved when integrating external data.

Data Quality Assurance:

  • Implement robust data cleansing and validation processes for integrated data.
  • Set up monitoring to ensure ongoing data quality from all sources.

Handling Slowly Changing Dimensions (SCDs)

A Slowly Changing Dimension is a dimension that not just captures the current position of the attributes for the business domain, but tracks the historical changes as well. Taking a customer dimension in an airline business as an example, this would allow you to see not only what their current Frequent Flyer Status is, but also what their status was at the time of any historical flight booking.

While a Type 1 will be the easiest to implement, it does come with significant drawbacks. I would always recommend starting with a Type 2 dimension if the business is unsure of what it wants. The simplest reason for this is historical accuracy.

Say you work in a fruit and veg business, and you have a product — a tomato — that has a product grouping of ‘Vegetable’ in January. In February, a report goes to the executives showing monthly sales for the year by product group. All sales of tomatoes will sit in the grouping of Vegetable. In March, someone decides that a tomato should be more technically defined as a fruit, and so the product group for tomato is updated. In a Type 1 dimension, this will result in all historical transactions for tomatoes now showing under the product group of ‘Fruit’. The March report goes to the executive team, and one of the executives expresses concern that while the totals for January are the same as the prior report, the details have changed. They’ve lost trust in the data.

Source: Kimball Group

Implementing Hierarchy Structures

Hierarchies are fairly common within dimensions. A date dimension is the easiest to consider. Days roll up into Months, which roll up into Quarters and then Years. This is an example of a well defined hierarchy. Each level of the hierarchy can be separated into its own attribute in the dimension table.

More difficult to model are the ragged hierarchies. An example might be an org chart. The CEO might have three reports. One is their Executive Assistant who has no reports of their own. The other two might be function leads with hierarchies that run 5–6 levels deep. What’s worse, these levels may change over time without too much notice. For these, we can use recursive hierarchies, showing the primary key of the parent on each record.

  • Flattened Hierarchies: Include multiple levels of a hierarchy as separate attributes for easy drilling.
  • Recursive Hierarchies: Implement for organisational structures or product categories with variable depths.

Conformed Dimensions Across the Enterprise

While it can be challenging, and may involve a lot of business discussion, ensuring dimensions are conformed across different subject areas will make life far easier in the long-term:

  • Standardise attribute names and definitions across the organisation.
  • Implement a robust master data management (MDM) strategy.
  • Use surrogate keys to manage dimension members across different source systems.

Managing Detail Levels in Event-Driven Sources

Event-driven data sources can provide highly granular information, but managing this level of detail requires careful consideration, specifically when you have been asked to track history in a Type 2+ style dimension:

Appropriate Reporting Interval:

  • Analyse business requirements to determine the right level of granularity.
  • For most business scenarios, daily updates are sufficient and manageable.

Handling High-Frequency Updates:

  • For sources with frequent updates (e.g., 50+ updates within 10 minutes), consider implementing a staging area.
  • Use the staging area to aggregate updates before applying them to the dimension table.

Adding Metrics to Dimensions

While dimensions primarily contain descriptive attributes, including certain metrics can enhance analytical capabilities:

Static Metrics:

  • Include relatively stable metrics that provide context.
  • Example: Standard sales price in a product dimension.

Historical Aggregate Metrics:

  • Add pre-calculated aggregates to speed up common queries.
  • Limit these to metrics that span only two dimensions: the dimension itself and time.
  • Example: Total lifetime value in a customer dimension.

Considerations for Adding Metrics:

  • Ensure added metrics don’t change frequently to avoid excessive updates.
  • Document the calculation methods and update frequency for these metrics.
  • Be cautious about adding too many metrics to maintain dimension table simplicity.
  • Ensure any metrics used align to the same business rules as the fact table they can also be calculated in. Don’t create two versions of the truth!

By carefully considering these aspects of dimension table design, you can create a more robust, performant, and insightful data warehouse that better serves your company’s analytical needs.

Conclusion: Mastering Dimension Tables in Star Schema Design

In this first part of our two-part series on Kimball Star Schemas in Data Warehousing, I’ve looked at Dimension Tables, exploring their role in creating a business-aligned data warehouse.

I’ve covered several key aspects of dimension table design and implementation:

  1. Understanding Dimensions and Business Domains: I looked at the fundamental concepts of dimension tables and their relationship to business domains, emphasising how this alignment creates an intuitive and effective data model.
  2. Business-Centric Approach: I outlined a step-by-step process for building dimension tables in an Agile environment, highlighting the importance of stakeholder engagement and iterative development along the journey.
  3. Advanced Considerations: I examined several advanced topics in dimension table design, including integrating additional data sources, managing event-driven data, adding metrics to dimensions, and handling slowly changing dimensions.

Throughout the article, I’ve hopefully emphasised the importance of aligning dimension table design with business needs and processes. This alignment is crucial for creating a data warehouse that not only stores data efficiently but also presents it in a manner that resonates with how the business operates.

The process of designing and implementing dimension tables is both an art and a science. It requires a deep understanding of the business, technical expertise, and the ability to balance various factors such as performance, usability, and maintainability. By following the principles and practices outlined in this article, you can create dimension tables that serve as a strong foundation for your data warehouse.

Looking Ahead: Fact Tables, Bridge Tables, and Bringing It All Together

While dimension tables provide the context for your data, they are only one part of the Star Schema equation. In the second part of this series, I’ll turn my attention to Fact Tables — the complementary component that works alongside dimension tables to create a powerful analytical framework.

In the upcoming article, I’ll explore:

  • The nature and purpose of fact tables in a Star Schema
  • Different types of fact tables and when to use each
  • Best practices for designing and implementing fact tables
  • The interplay between fact and dimension tables

Lastly, I’ll cover Bridge Tables. By the end of the mini-series, you’ll have a solid understanding of the different tables types, allowing you to design and implement effective Star Schema data warehouses that drive analytics and informed decision-making in your business.

Part 2 is available below, where we’ll continue our journey through the Kimball Star Schema and look at adding fact tables into your data warehousing projects.

--

--

Andy Sawyer

Bringing software engineering best practices and a product driven mindset to the world of data. Find me at https://www.linkedin.com/in/andrewdsawyer/