Guide to Data Modeling with Power BI and its Importance in analyzing the data

Power BI — Guide to Data Modeling

Data Modeling is one of the key strengths of Power BI to model and shape data

SUMIT SHARMA
Geek Culture

--

Power BI — Guide to Data Modeling

When we have huge data coming from multiple sources, knowledge of Data Modeling in Power BI becomes very important to model and shape that data to make it easier for self-service BI. And it becomes important to learn Data Modeling when we need to create reports from different fields of different tables. We can say, Data Modeling is one of the key strengths of Power BI because it does not necessarily demands all the data to be in a single table, rather we can have data in different tables and we can define a relationship between those tables with a common column. Let’s discuss more about Data Modeling in Power BI and its importance.

Photo by Marvin Meyer on Unsplash

Power BI is a data model based BI reporting tool, not all the tools are data model based. Some examples of non-data model based BI tool is SQL Server Reporting Service (SSRS) and traditional excel.

What is Data Modeling in Power BI ?

Data Modeling is one of the feature in Power BI, which is used to connect multiple data sources using a relationship. A relationship actually defines how the data sources are connected with each other or it defines a logical connection between different data sources.

The stages involded in a Power BI Project

A sample example is shown below where data is coming from multiple data sources in multiple tables —

Example of data from different sources into multiple tables

Data Modeling in easy language —
When we get data from multiple tables, we connect these tables together and create a relationship between these tables is called Data Modeling.

Usage — Data modeling feature allow businesses to define the new metric and to perform custom calculation on those metrics.

Data Modeling — Important pillar among 7 Pillars for Mastering Power BI

Data Modeling is an important skill for Power BI Developers and Analysts to master.

Power BI Data Modeling tips are often overlooked.

In my another article (link provided below) where we discussed about 7 Pillars for Mastering Power BI, we defined Data Modeling as one of the important point for mastering Power BI. Also, we cannot move to Calculation phase if we do not know Data Modeling. Article link is provided below —

Power BI Data Modeling —

So where does data come from?
Usually from database, where we have lots of tables, and we should never import all the tables directly, rather we should reduce the tables by joining them using common columns, where the set of tables are reduced to less number of tables shown below -

Example where tables are reduced by joining them

For joining the tables, we have some rules which we should follow —
Never ever join all the tables into one big table. While joining the tables, we should always think about dimensional model.

Dimension Tables —

A dimesion is a table that describes a business entity, the things in our model or the things in our company. They are the look up tables.

Entities can include products, people, places and concepts including time itself.

For example — The Who, The When, The Where, The What.

Example of Dimension and Fact tables in Data Modeling

Usually we use them in the slicers and on the axis the visuals. And the main goal of these tables is to group and filter the values that are stored on the fact tables.

Fact Tables —

The fact tables are the events, whenever we have something that happens along the time. Fact table store observations or events, and can be sales orders, stock balances, exchange rates, temperatures etc.

For exmaple — a sale, a purchase or a transaction, any kind of historical transaction is stored here on the fact.

It contains the key of the dimensions, a date, and some numbers.

Usually the fact tables are very big that have a lot of rows, while the dimension on the other side, have more columns, less rows.
For Example — usually we don’t have millions of products or millions of customers but we have millions of sales.

So these are the main difference, the look up tables are the dimensions and the data tables are the facts.

Snowflake Schema —

Whenever we have dimensions connected to each other it is a snowflake schema. This is not the optimal form, the optimal structure is when we talk about dimensional model in Power BI.

So what we should have done, we should have joined these three tables together and build only one table which is the dimension product with the Category and SubCategory inside the same table as shown below —

Example where Dimensions are connected with each other — Dimension Product

So when we have dimesions connected to each other we should join them together, but there is a limit.

What we should not do ?
We should not take all these three tables here and put them inside the fact table.
Don’t do this.
We should join dimensions together into one single table but should never put the columns from the dimension inside the fact tables.

So there is a better way to solve this — joining these tables together and building what we call the Star Schema.

Star Schema —

Star Schema requires modelers to classify their model tables as either dimension or fact. The Star schema have the dimsnsions connected to the fact tables. Note — there is no dimension connect to another dimension.

Star Schema is one of the way to simplify a data model

Star Schema

These dimensions have everything like The Who, The When, The What, The Where. So the dimension tables have the descriptions for the sales.

Star Schema — Dimentions connected to the Fact Tables

So here is another example in Power BI with three dimensions and two fact tables. The fact tables can also be put below the dimension tables, by this we always know that the relationship go from the top to the bottom, from the dimensions to the fact. So we always try to build a dimensional model in a star schema like this. Always Always. there is no exception.

Relationship from Top to Botton

Tip — If you are stuck in a measure, if you cannot write it properly. Don’t forget to check your model first.
Problem might be on your model and not on the DAX measure. You must have always dimensions and facts.
And the dimensions connected to the facts through relationships.
There is an important requirement here which is all dimensions must have a unique column which is the primary key of the table.

Things to Remember —

  1. Power BI Data Modeling is the process of creating a relationship between common columns of multiple tables.
  2. We should join dimensions together into one single table but should never put the columns from the dimension inside the fact tables.
  3. Avoid many-to-many relationships as far as possible.
  4. Always have a date table in your data model.
  5. Always aim for 1:many relationships for each dimention tables to fact table.

--

--

SUMIT SHARMA
Geek Culture

Software Development Engineer, Stock Market Analyst, Fitness Coach, Video Editor, Freelancer