Kimball Star Schemas in Data Warehousing: Part 3

Mastering Bridge Tables for Many-to-many Relationships

Andy Sawyer
7 min readJul 29, 2024

Introduction

Welcome to the final instalment of my mini-series on the Kimball Star Schema! In the previous articles, I looked at the foundational elements of this presentation layer modelling approach. I discussed dimension tables, which provide rich context to the business processes, and fact tables, which capture the measurable events within those processes. Now, it’s time to tackle a more advanced concept that often puzzles even data professionals: bridge tables.

Bridge tables might not be as ubiquitous as dimension or fact tables, but they play a crucial role in handling complex relationships within your data models. In this article, I’ll attempt to demystify bridge tables, explore when and why to use them, and provide practical examples to solidify your understanding. So, whether you’re a data modeller looking to refine your skills or a data engineer seeking to optimise your warehouse design, buckle up — we’re about to bridge the gap (pun intended) in your Kimball Star Schema knowledge!

What Are Bridge Tables?

Imagine you’re building a bridge in the real world. Its purpose? To connect two points that would otherwise be difficult or impossible to join directly. That’s essentially what bridge tables do in the world of data modelling — they create connections where direct links are impractical or inefficient.

In more technical terms, bridge tables are intermediary tables that facilitate many-to-many relationships between dimension tables and fact tables. They come into play when a single metric can be associated with multiple records in a dimension table, a scenario that can quickly become unwieldy without proper handling.

The Need for Bridge Tables

You might be wondering, “Why can’t we just use dimension or fact tables to handle these relationships?” Great question! Let’s break it down:

  1. Maintaining Star Schema Simplicity: The star schema is enjoyed by analysts due to its simplicity and query efficiency. Direct many-to-many relationships can complicate this structure, making queries more complex and slower.
  2. Data Integrity: Without bridge tables, you might be tempted to duplicate data across multiple rows, leading to potential inconsistencies and update anomalies.
  3. Flexibility: Bridge tables allow for more dynamic relationships, easily accommodating changes over time without requiring major restructuring of your core fact and dimension tables.
  4. Performance: While it might seem counter-intuitive, properly implemented bridge tables can actually improve query performance by allowing for more efficient joins and filtering.
  5. Avoiding Fact Table Bloat: Without bridge tables, you might be forced to create extremely granular fact tables that are indeed more granular than the natural level at which they occur, leading to massive table sizes and reduced query performance.

Real-World Scenario: Banking Transactions

Let’s make this concrete with a real-world example from the banking industry that hopefully most people will be familiar with. Imagine you’re designing a data warehouse for a bank that needs to track transactions across customer accounts. Here’s the twist: some accounts have multiple owners. How do we model this efficiently?

The Scenario:

  • Customers: Each customer can have multiple bank accounts.
  • Accounts: Each account can be owned by multiple customers.
  • Transactions: Each transaction is linked to a specific account.

Without a bridge table, you might be tempted to create a model that looks like this:

Modelling without a bridge table

But this approach has several problems:

  1. It doesn’t accurately represent joint accounts. You’d need to duplicate the transaction for each account owner, leading to data inconsistencies and inflated transaction counts. This means additional calculations within your fact to divide any underlying transaction amount by the number of customers before inserting it into the fact.
  2. If account ownership changes, you’d need to update historical transactions, violating the immutability principle of fact tables.
  3. Queries to find all transactions for a customer become more complex and less efficient.

Enter the bridge table solution:

Implementing the Bridge Table

Introducing a bridge table for additional flexibility

This structure solves our problems:

  • It accurately represents joint accounts without duplicating transaction data.
  • Historical accuracy is maintained even if account ownership changes.
  • Queries can easily join through the bridge table to connect customers to their transactions. You can multiply the transaction_amount by the pct_ownership to ensure any totals are maintained, or look at the original transaction_amount directly against each customer.

When to Use Bridge Tables: A Decision-Making Framework

While bridge tables are powerful, they’re not always necessary. Here’s a decision-making process to help you determine when to use them:

Is a bridge table needed?

Let’s break down this decision process:

  1. Identify the Relationship: First, determine if you’re dealing with a many-to-many relationship. If not, a standard star schema approach will suffice.
  2. Consider Fact Table Adjustments: Can you solve the problem by adjusting the grain of your fact table? For example, in our banking scenario, if we only cared about primary account holders, we could include CustomerID directly in the fact table, and document this business assumption.
  3. Evaluate Dimension Table Changes: If fact table adjustments don’t work, can you modify your dimension tables to eliminate the many-to-many relationship? In some cases, you might be able to create a combined dimension that represents the relationship. For example, if there are a maximum of two account holders, you could pivot the names of the account holders into two pre-defined fields in the account dimension.
  4. Implement a Bridge Table: If neither fact nor dimension table adjustments are feasible or desirable, it’s time to implement a bridge table.

Best Practices for Bridge Tables

Now that you know when to use bridge tables, let’s discuss some best practices to ensure they’re implemented effectively:

  1. Keep It Simple: Use bridge tables only when necessary. If you can solve the problem by adjusting your fact or dimension tables without compromising data integrity or query performance, that’s often the preferable approach.
  2. Consider Time Dimensions: Often, relationships in bridge tables change over time. Including start and end dates (or effective and expiration dates) can help maintain historical accuracy.
  3. Maintain Data Integrity: Ensure that your bridge table is updated consistently with changes in the associated dimension tables. In our banking example, if a customer is removed from an account, make sure to update the EndDate in the Account_Customer_Bridge table.
  4. Optimise for Performance: Bridge tables can introduce additional joins in your queries. Ensure that you have appropriate indexes in place and consider materialising common join paths if query performance becomes an issue.
  5. Add Relative Weighting: To ensure you don’t explode out the metrics, provide relative weightings on your bridge table that can be used to multiply out the fact table metrics. In the banking example, it may be that there are two customers on an account. The first has 60% ownership and the second has 40%. So this should be shown on the bridge table. The metrics on the facts can be multiplied by the relative weightings to ensure the total values don’t change. I’ve used a pct_ownership column in the ERD above.
  6. Document Thoroughly: Bridge tables can add complexity to your data model. Ensure that you document their purpose, structure, and usage clearly for other team members and future maintainers.
  7. Handle Null Relationships: In some cases, you might have entities that don’t participate in the many-to-many relationship. Decide how you’ll handle these cases and document your approach.
  8. Consider Attribute Bridges: Sometimes, you might need to bridge to a group of attributes rather than a full dimension. In these cases, consider creating an attribute bridge, which links to a subset of dimension attributes rather than a full dimension table.

Common Pitfalls and How to Avoid Them

Even with best practices in mind, there are some common mistakes that people make when implementing bridge tables. Here are a few to watch out for:

  1. Overuse: Don’t use bridge tables for every complex relationship. Sometimes, denormalization or other techniques might be more appropriate.
  2. Ignoring Performance: Bridge tables can impact query performance if not implemented correctly. Always test your queries and optimise as necessary.
  3. Neglecting Updates: Bridge tables need to be maintained just like any other table in your data warehouse. Ensure you have processes in place to keep them up-to-date.
  4. Misunderstanding Cardinality: Make sure you truly understand the relationship you’re modelling. A misunderstood relationship can lead to an inefficient or incorrect model.
  5. Forgetting About History: If your bridge represents a relationship that changes over time, make sure you’re capturing that history appropriately.

Conclusion

Bridge tables are a powerful tool in the data modeller’s toolkit. They allow us to represent complex, many-to-many relationships while maintaining the simplicity and efficiency of the star schema. By understanding when and how to use bridge tables, you can create more flexible, accurate, and performant data models.

Remember, the key to successful data modelling isn’t just knowing the techniques, but understanding when to apply them. Bridge tables aren’t always the answer, but when used appropriately, they can significantly enhance your data warehouse’s capabilities.

As we conclude this mini-series on the Kimball Star Schema, I hope you feel more confident in your ability to design effective data models. From dimension tables to fact tables, and now bridge tables, you have the foundational knowledge to tackle complex data warehousing challenges.

--

--

Andy Sawyer

Bringing software engineering best practices and a product driven mindset to the world of data. Find me at https://www.linkedin.com/in/andrewdsawyer/