Data Modeling in Power BI: What Is It, and Why Is It Useful?

Published in

The Startup

4 min readJan 28, 2021

--

For almost every transaction, there are usually four questions that can be asked about its occurrence: who, what, when, and where? For most roles in business analytics, being able to easily answer these key questions leads to better decision-making, planning, and root-cause analysis. In my role as a Demand Planner, answering these questions is fundamental to understanding a forecast. Questions like: “who is ordering this product?”, “when is inventory expected to reach the Distribution Center?”, and “which accounts are in the forecast?” are all important questions to answer quickly and with conviction. In the following scenario, I will show how importing and modeling Excel transactional data in Power BI using a STAR data schema can assist in your data analysis.

Fact Tables: The Numerical Data

These tables contain transactional data in a business setting, and contain both the numeric data and foreign keys that are contained as unique keys in the dimension tables. For instance, a fact table may contain sales amounts, with a column specifying product key, transaction date, customer ID, and customer zip code. In a fact table, there will likely be duplicate values — in fact, this would indicate a healthy business, as one would expect repeat customers, product orders, etc. In reality, these fact tables are usually viewed when aggregated by their dimensions (i.e. total sales by time, sales by customer, sales by region, etc.) and less often examined at the line-level of detail. An example fact table is shown below (example data is from a Sample Superstore dataset provided by Tableau. Link to source data is here ).

As we can see, this data isn’t particularly useful in silo. Perhaps we would want to merge details about our customer, region, product etc. to get a better understanding of our data. Merging, however, can add storage space and time. Linking this fact table to dimension tables makes for an easier, more repeatable solution that allows data to be summarized in a variety of ways

Dimension Tables: The “Who? What? When? and Where?” Tables to Every Transaction

Dimension tables are descriptive tables that give more detail about your data. For instance, let’s consider our sales dataset, that contains four columns: a product key, a sales date, a customer ID, and a zip code. Dimension tables would help answer more questions about the information contained in the fact table (i.e. product category/sub-category of the product ID, customer name/segment of the customer ID, state of the zip code, etc.). Unlike fact tables, these dimension tables contain an unduplicated primary key that is joined to the fact table. Below I have posted an example dimension table (Product Table) which contains descriptive information about each product ID.

Similar to the transaction table, this product table on its own isn’t particularly useful. By joining the transactional data with the product table by product ID, it will be easier to segment sales by different categories/sub-categories.

Can’t I just merge data together or perform a VLOOKUP?

While it is easy to want to pull in attributes to a transactional table via a merge, it is less repeatable and more expensive in terms of space and time. Merging data not only adds columns, but also requires formulas/joins each time more dimensions are pulled in. By creating dimension tables and linking them to fact tables, it is easier to create dynamic charts from transactional data without having to merge in new information each time. Less time spent digging through different data sources means more time finding valuable insights.

A Simple Model: The STAR Schema in Power BI

Below I have drawn in the relationships between the dimension tables and the transactional data. Note the one-to-many cardinality between the dimension and fact tables.

After importing the transactional data and dimension tables (product, customer, and region), I created an unduplicated date table within Power BI. Power BI has excellent functionality in creating date tables; I used the DAX query CALENDAR() to set the parameters for this table. Once these were imported, I went into the “Modeling” tab of Power BI, and created relationships between the dimension tables and fact tables. This created a “star” schema, with the central data being the fact table, surrounded by the dimension tables. Any measures in the fact table (Transaction table) can then be broken out by these dimensions.

The End Product: Repeatable and Effective All-in-One Dashboards

By creating a Star Schema in Power BI, it is easy to segment out sales, profitability, and other measures by important dimensions. This allows for easy “slice-and-dicing” of your fact table by different dimensions

As seen in the dashboard above, this STAR data model helped create an end product with far easier-to-understand presentation of the data than the original fact table. When dimension tables are linked to their relevant fact tables, it becomes far easier to answer the “who”, ”what”, “when”, and “where” questions that frequently occur in a business analytics setting. For instance, by joining these dimension tables to their fact tables, this Power BI model segments sales by customer, product category, date, and region — all in one dashboard!

PS: If you enjoyed this article, or have any feedback, I would love to hear it. Feel free to add me on Linkedin or shoot me a message:

https://www.linkedin.com/in/peter-carley-24230b9b/