Unleashing Business Insights with Power BI: The Keys to Successful Data Modeling

Aaron Parry
Learning Data
13 min read · Feb 15, 2024


(Part III)

Time to dig into Part III in our six-part series designed to help beginners master the essentials of Power BI. Fasten your seatbelts because data modeling is easily the most crucial aspect of Power BI usability and performance.

As a quick recap, remember that in Part I we outlined why organizations need a business intelligence solution — it allows them to create and share data-driven actionable insights. And in Part II we covered the basics of getting started with Power BI and some of the key Get Data and ETL processes with Power Query.

Part III dives into the crucial aspects of data modeling and covers how creating a proper data model can significantly influence the performance, usability, and flexibility of your Power BI reports and dashboards. Understanding the following concepts and components will not only help enhance your ability to create accurate calculations, impactful visuals, and insightful analysis but ultimately help you create optimized, flexible, and scalable data models.

Here’s what we’ll cover:

  • Normalization
  • Purpose of tables
  • Primary & foreign keys and the importance of relationships
  • Filter flow
  • Common schema types
  • Calendar tables
  • Semantic models

Without further ado, let’s dig in!

Normalization — The Backbone of Efficient Data Modeling

Normalization is a database design technique that reduces data redundancy and enhances the integrity of data. The primary goal of normalization is to structure your data in a way where each piece of information is stored only once, eliminating inconsistencies and redundancy.

In Power BI, a normalized data model can improve report performance and make your semantic model (this is the new term for datasets in Power BI, more on this later!) easier to maintain and update.

Models that aren’t normalized contain redundant, duplicate data. In the image above, all of the product-specific fields could be stored in a separate table where each product_id is stored as a unique record. This may not seem like a critical step now, but minor inefficiencies can become major problems at scale!

Here’s what this table could look like in a normalized state, separated into multiple, purpose-built tables (lookup and fact).

The process of normalization involves reorganizing data from a single table into multiple, purpose-built tables. Far too often, people who are just getting started with Power BI try to combine all their data into one large master table because they think it makes reporting easier. But that couldn’t be farther from the truth. Normalizing your model and then creating relationships to connect tables using their primary and foreign keys (more on this soon) is the optimized, recommended approach.
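
In practice, normalization is usually handled upstream in the source or in Power Query, but just to make the idea concrete, here’s a rough DAX sketch that derives a product lookup table from a flat, denormalized sales table (the 'Flat Sales' table and its column names are hypothetical):

    -- Calculated table: one unique row per product, pulled out of the flat table
    Product Lookup =
    DISTINCT (
        SELECTCOLUMNS (
            'Flat Sales',
            "product_id",   'Flat Sales'[product_id],
            "product_name", 'Flat Sales'[product_name],
            "category",     'Flat Sales'[category]
        )
    )

The flat table would then only need to keep product_id as a foreign key, which is exactly the lookup/fact split we’re describing.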

I think I’m getting a little ahead of myself. Before we get into creating relationships, let’s talk a bit more about the purpose of tables within a data model.

Purpose of Tables — Organizing Data for Insight

In a Power BI data model, you can think of each table as a building block, where each block is a separate, but critical, component of the whole building.

Each table should have a clear purpose, whether it’s storing transactions, customer details, product information, time dimensions, slowly changing dimensions, geographic details, etc. Properly structured, purpose-built tables ensure that your data is accurately represented, easy to filter, and easily accessible for analysis with DAX calculations.

In Power BI there are a couple of key table types to remember:

  • Lookup or Dimension tables: These tables contain descriptive attributes, or dimensions (e.g. product names, categories, etc.), and help to categorize and filter fact data
  • Fact Tables: These tables store quantitative data or metrics (e.g., sales amounts, quantities sold, etc.) and are often the focus of the analysis

Here’s what a properly constructed model would look like with Lookup and Fact tables:

So now that we’ve got an understanding of the types and purpose of tables, how do we connect the tables with those lines and arrows like we see in the picture? I’m glad you asked because that’s where primary keys, foreign keys, and relationships come into play.

Primary Keys, Foreign Keys, and Relationships — Linking Data Together

Primary and foreign keys are really the glue that holds the tables in your model together. A primary key is a unique identifier for each record (row) in a table, ensuring that each record is distinct. A foreign key, on the other hand, is a field (or collection of fields) in one table that uniquely identifies a row of another table. Saying that a bit differently, the foreign key effectively points to the primary key of another table, creating a link between them.

Here’s an example data model that contains Lookup (dimension) tables with primary keys (1 side) connected to Fact (data) tables (* side) that have foreign keys. The tables are connected via their primary-to-foreign key relationships.

Relationships between tables define how data in one table relates to data in another. The one side of a relationship is the primary key and the many side is the foreign key. This is what’s known as relationship cardinality, the term used to describe the nature of the relationship and how many records in one table correspond to the records in another.

Saying this a bit differently, cardinality refers to the uniqueness of values in a column. Here are the common relationship types in Power BI and their cardinality:

  • One-to-one (1 : 1) — Here, each record in one table relates to a single record in another table. The cardinality of this relationship is one-to-one. This relationship type isn’t very common and ultimately means that the tables could be combined into a single table.
  • One-to-many (1 : *) — Here a single record in one table (lookup) relates to multiple records in another table (fact). The cardinality of this relationship is one-to-many.

Note: One-to-many relationships are the most common relationship type in Power BI models. Ideally, all relationships in the data model should follow a one-to-many cardinality: one instance of each primary key value, and many instances of each foreign key value.

  • Many-to-many (* : *) — This relationship has records in one table that relate to multiple records in another table, and vice versa. Many-to-many relationships are not common in Power BI and are typically used to solve specific data modeling challenges. The cardinality of this relationship in the image below is many-to-many.

Relationships between tables help Power BI developers create meaningful DAX calculations that are capable of being sliced and diced by different dimensions. Along with defining cardinality, relationships also determine how data is filtered between tables — let’s dig into filter flow.
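
As a small illustration (the measure and column names are assumptions based on the example model), a measure defined on the fact table can be sliced by any attribute of a related lookup table, and RELATED can pull a dimension attribute down to the fact-table row level:

    -- Measure on the Sales fact table; slicing it by 'Product Lookup'[CategoryName]
    -- works because of the one-to-many relationship between the ProductKey columns
    Total Sales = SUM ( Sales[SalesAmount] )

    -- Calculated column on the Sales table; RELATED follows the many-to-one
    -- relationship to fetch the matching product's category for each sales row
    Product Category = RELATED ( 'Product Lookup'[CategoryName] )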

Filter Flow — Directing your Analysis

The direction of filter flow in a Power BI data model is important because it ultimately determines how filters applied to one table affect the data displayed from another table. This filtering process is referred to as filter context in Power BI. Properly understanding and setting up the direction of filter flow is crucial for accurate DAX measure creation, data analysis, and reporting.

In general, filters should flow from the one side of a relationship to the many side because this aligns with how data naturally aggregates. In Power BI, though, a relationship’s cross-filter direction can be set to single (the one side filters the many side) or both (bidirectional).

In the above model, we have a single direction filter that flows “downstream” from the lookup table to each fact table. Because of this, the only way to filter both Sales and Returns data is to use the shared Territory lookup table. When we apply a filter from the Territory Lookup table, that filter context is passed down to the fact tables, from the one-side to the many-side, and correctly filters the Sales and Returns tables.
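
As a quick sketch (the measure and column names are assumptions), two simple measures make this easy to test on a report page; both respond to a slicer built from the Territory Lookup table because the filter flows downstream to each fact table:

    Total Order Qty  = SUM ( Sales[OrderQuantity] )
    Total Return Qty = SUM ( Returns[ReturnQuantity] )

A slicer built from a column on the Sales table, by contrast, would not filter Total Return Qty, which is exactly the “upstream” limitation covered next.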

It’s important to note that filter context cannot flow “upstream”, or against the direction of the arrow. If you’re familiar with electrical or hydraulic circuits, think of this filter flow as a diode or one-way check valve where electricity, fluid, or filters in a data model can only travel in one direction.

But that one-way behavior only applies to single-direction filters. You can also create bidirectional filters by updating the relationship’s cross-filter direction from single to both, which allows the filter context to flow in either direction.

In this example, filters applied from the Sales table can be passed “upstream” to the Territory Lookup table, then down to the Returns table. If you pause for a moment and think about the implications here, you may realize that it’s possible to see correct Order and Return quantities using the Territory key from either the Sales table or the Territory Lookup.

In the example above, I’m highlighting how you could enable bidirectional filtering and what it makes possible, but there are some big concerns with this approach, especially when the model is more complex than our simple three-table example. One of the most common issues with bidirectional filtering is ambiguity, which is introduced into a model when there are multiple filter paths between tables. In essence, the model doesn’t know exactly how to filter the tables because multiple options exist.

In this example, filter context from the Product table can pass down to Returns and up to Territory Lookup, which would be filtered based on the Territory Key passed from the Returns table.

But, with an active relationship between Product and Sales as well, filter context could pass through either the Sales or Returns table to reach the Territory Lookup table, which could yield conflicting filter context and introduce ambiguity.

The whole point here is that you should be aware of what’s possible when building a data model and also understand the implications. Your ultimate goal should be to design your models with one-way filters and one-to-many cardinality unless more complex relationships are absolutely necessary.
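
If one specific calculation truly needs filters to travel upstream, a common alternative is to enable bidirectional filtering only inside that measure with CROSSFILTER, leaving the relationship itself single direction. Here’s a minimal sketch using the example table names (which are assumptions):

    Return Qty (by Sales Territory) =
    CALCULATE (
        SUM ( Returns[ReturnQuantity] ),
        -- Treat the Sales-to-Territory relationship as bidirectional for this
        -- calculation only; the model keeps its one-way filter everywhere else
        CROSSFILTER ( Sales[TerritoryKey], 'Territory Lookup'[TerritoryKey], BOTH )
    )

This keeps any ambiguity contained to a single measure instead of baking it into every filter path in the model.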

Now that we’ve got a handle on relationships and filter flow, the next component of data model design is the schema type.

Common Schema Types — Structuring your Model

Choosing the right schema for your data model is important because of the negative impact an improperly structured model can have on performance and data integrity. In Power BI the two most common types of schemas are the star schema and snowflake schema.

A star schema consists of a central fact table that’s surrounded by dimension tables. It’s simple, efficient, and honestly the ideal model structure for most Power BI models.

A snowflake schema is a slightly more complex structure where dimension tables are normalized into multiple related tables that create a “chain” of relationships between tables. Snowflake schemas ultimately reduce redundancy but may complicate the model.

Aim to build a star schema when possible, but understand that adding a snowflake structure, like the Product / Categories / Subcategories “chained” lookup tables in the AdventureWorks data model (pictured below), is a solid way to normalize a larger lookup table that would otherwise contain duplicate, redundant data and be less optimized.

The AdventureWorks model is generally a star schema but has a snowflake component to normalize the product details, categories, and subcategories. It’s a good example of how the two schema types can be combined efficiently.
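
One handy detail: DAX can still traverse that “chain” of snowflaked lookups. For example, a calculated column on the product lookup table can pull the category name through the subcategory table with RELATED, flattening the snowflake back toward a star where that’s convenient (the table and column names below are assumptions in the spirit of the AdventureWorks model):

    -- Calculated column on the product lookup table; RELATED follows the
    -- many-to-one chain from products to subcategories to categories
    Category Name = RELATED ( 'Product Categories Lookup'[CategoryName] )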

Calendar Table — The Key to Time Intelligence

We’ve covered all the basics of normalization, table setup, relationships, and schema design. One of the last key components of a Power BI data model is a dedicated calendar table.

Having a dedicated calendar table in Power BI is crucial for several reasons, especially when it comes to performing time intelligence operations and ensuring your reports and dashboards can handle time-based analysis accurately and flexibly.

  1. Uniform date handling — Calendar tables ensure that all date-related analysis in your Power BI reports use a consistent set of dates and that all dates within the range are represented. This is particularly important when your fact table contains missing dates because you need a contiguous set of dates for time intelligence functions in Power BI.
  2. Improved time intelligence functions — Power BI provides time intelligence functions that enable you to perform complex calculations over time, such as calculating year-to-date totals or comparing sales figures from the previous month. These functions work most effectively with a continuous and complete set of dates, which a dedicated calendar table provides. Without a calendar table, these calculations can become more complex and less reliable.
  3. Consistent time comparisons — A calendar table also allows you to create more dynamic and flexible visualizations. You can easily compare performance across different time periods, visualize trends over custom fiscal periods, highlight seasonal variations, etc. Using a consistent date dimension ensures that visualizations remain accurate and meaningful, especially when new data is added or updated.
  4. Custom date dimensions — Calendar tables allow you to define and use custom date dimensions that are not automatically available in your data. This can be columns like fiscal years, quarters, working days, holidays, and other periods that do not follow a standard calendar.
  5. Performance optimization — By centralizing date-related calculations in one table, you reduce the complexity and computational load associated with performing these calculations across multiple tables. This is especially true for large datasets with many date fields or complex models where efficiency becomes critical.
  6. Simplifies relationships & filtering — Having a single calendar table simplifies the relationships in your data model. Instead of having multiple date fields across different tables (each potentially requiring its own relationship), you can have all date-related fields in your fact tables relate back to a single calendar table. This setup streamlines the filter context and makes it easier to manage how date filters apply across your entire model.

There are likely more reasons I could cite, but I think you get the point. Using a dedicated calendar table is key to effective time-based analysis in Power BI and provides the foundation for accurate, consistent, and insightful exploration and also improves the integrity and performance of your model.
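
Calendar tables are often built in Power Query or the data source, but a quick way to sketch one is a DAX calculated table. The date range and extra columns below are just placeholders; after creating the table you’d mark it as a date table and relate it to the date keys in your fact tables. The Sales YTD measure shows the kind of time intelligence it unlocks:

    Calendar =
    ADDCOLUMNS (
        CALENDAR ( DATE ( 2020, 1, 1 ), DATE ( 2024, 12, 31 ) ),  -- or CALENDARAUTO ()
        "Year", YEAR ( [Date] ),
        "Month Number", MONTH ( [Date] ),
        "Month", FORMAT ( [Date], "MMM" ),
        "Quarter", "Q" & QUARTER ( [Date] )
    )

    -- Year-to-date total; relies on the contiguous dates in the Calendar table
    Sales YTD = TOTALYTD ( SUM ( Sales[SalesAmount] ), 'Calendar'[Date] )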

Let’s start to wrap things up by covering semantic models!

Semantic Models — Datasets Updated

Semantic models in Power BI are really just datasets rebranded. However, semantic models provide some broader capabilities and access to new tools. The update was made because of two key factors:

  1. The name dataset is a very generic term and not descriptive enough to really encapsulate what’s happening in Power BI. This is especially true now that Power BI is one of the components in the Microsoft Fabric ecosystem.
  2. The term semantic model better reflects the functionality of Analysis Services, the engine that Power BI reports are built on, and better captures the logical layer in Power BI that contains transformations, calculations, and table relationships.

The semantic model view in Power BI provides a comprehensive overview of your data model including tables, relationships & cardinality, calculation groups, measures, translations, roles, and more. This view helps developers better understand the overall structure of the model and access new tools.

At the end of the day, the term semantic model is an updated way of describing a source of data that’s ready for reporting and visualization within Power BI. There are some other bells and whistles that come with this update, but it’s really all about more accurately describing a dataset in Power BI.

Final Thoughts

Okay, that was a lot, and I hope you’re able to see how vitally important building a proper data model is to the overall efficiency, usability, and scalability of a Power BI report.

With a bad model setup, your report won’t go far. But if you follow the best practices outlined in this article and make sure your model is normalized, uses purpose-built tables with proper primary and foreign keys, follows one-to-many relationships, includes a calendar table, and has a sound schema design, you’ll be well on your way to creating efficient, scalable Power BI reports.

Stay tuned for the next article in the series where we’ll dig into calculations with DAX!

Until then, learn on 🤘

Interested in learning more Power BI from Aaron and the other Maven experts?

Check out Part I & Part II of his Comprehensive Power BI Guide here on Medium, and check out our course catalog!

Our Power BI & Python Immersives are open for enrollment!

Key Deadlines:

  • Super Early Bird — Save 25%: Friday, March 29th
  • Early Bird — Save 10%: Friday, April 12th
  • Admissions Close: Friday, April 19th

Here’s what you can expect from each immersive…

✔️ Guided portfolio projects

✔️ Access to your own private learning community

✔️ Weekly, 1-hour live sessions with recordings

✔️ Live support from expert instructors

✔️ Unlimited Maven access for an entire year

…and all of this is built into a program that’s designed to fit into your busy life, at a more affordable price.

Space is limited; secure your seat today!

-Team Maven
