Data Modeling — STAR schema, Fact, and Dimension Tables

Soham Shinde
5 min readDec 3, 2022

--

Hello Readers, This blog is related to data modeling basics. We will be looking at some interesting data modeling techniques and how to build relationships between different tables. This blog will help while building data models and ETL processes.

We will be discussing the following topics:

  1. Important terms in the Data model
  2. What is modeling in Data?
  3. What is Star Schema?
  4. Why Data Modeling is important?

Let's start by discussing are important terms:

1. Important terms

1.1. Primary key: This is the column name that appears only once in the table row. The values in this column cannot be repeated because these are unique in their own context.

Examples: Roll number of students, Employee name in companies, Product ID or Product name, Product Type, and Type of furniture. These all attributes will only appear once in the row and will be unique. Now let's see what is the meaning of foreign key.

1.2. Foreign key: Foreign key is defined as an attribute that is in the table but linked to the primary key in another table. If we have a products table and orders table. In the products table, we will have unique products and the orders table can have orders received for the whole month. Suppose a relationship exists between the PRODUCT_ID (Primary key) column in the products table and PRODUCT_ID (Foreign key) column in the orders table, then the PRODUCT_ID in the orders table is a foreign key. This is because the orders table can have MULTIPLE products. The same product is ordered by multiple times, thus we can have multiple rows with the same product_id in the orders table.

1.3. Fact table: The fact table is the main table that has the primary key in it. This table is used in the STAR schema technique. The fact table has a primary key and the fact table is related to a number of dimension tables. STAR technique is the technique that has a fact table in the middle and there can be multiple dimension tables around it. Please see the corresponding diagram of the star technique.

1.4. Dimension table: In the above STAR technique we can see that DimReseller, DimDate, DimEmployee, and all others starting with Dim are dimension tables. These tables are linked with the fact table with a foreign key. The foreign appears in the fact table as the primary key as well. In the STAR schema, there are many dimension tables and only one fact table.

1.5. One to One Relationship: This relationship means that every record in one table is associated with exactly one record in another table. Examples of one-to-one relationships can be: Capital city — Country, Employee — Manager, Email — User account

1.6. One-to-Many Relationship: In this type of relationship we have records from one table related to many records in the other entity or table. This is the most common type that we use in the data modeling field. In this relationship, there can be more foreign keys in the dimension table but only one primary key in the fact table. One-To-Many relationship is used in the STAR schema most of the time.

2. What is modeling in data?

Data modeling is the technique to note the information in simple terms to have better communication between the teams. This is one of the most used and most applied concepts that we have to perform before doing an analysis. In data modeling, we define the relationship using a primary and foreign key.

We can think of data modeling as a complex model of information. For example, maps are the complex form of landscapes drawn on paper to understand in a simple form. Blueprints are simple drawings of architectural models. Similarly, the data model is a simple but understandable form of information.

3. What is STAR schema?

What is STAR schema in data modeling? This is the most important and useful technique when it comes to getting data and tables ready for analysis. There are lots of modeling tools that can be used, such as Power BI, Excel, and Power Query. But I use most of the times Power BI to make the relationships and connect tables. The advantage of the techniques is that we have an overview of the connection in one overview when we look at the data model. The disadvantage of this schema is that it can be only used in structured data. Let's see in detail what does STAR schema is:

STAR schema is a data modeling technique that enables to have a map of information or data in a very effective manner. Suppose, we want to create a relationship between different tables, we use this schema. The main advantage of this schema is to have one entity in the center and an’N’ number of entities in the background. These entities surrounding the center table are called the dimension table and the center table is called the fact table. Fact tables are the main tables having the primary keys.

It is important to have the primary key in the fact table. The primary key cannot be deleted from the fact table, because it has some relationship with the foreign key in the dimension table. I have already explained the STAR diagram but let's see how the fact table and dimension table look in the diagram form.

Figure: STAR Schema Diagram

https://medium.com/@nimanthaF/data-modelling-techniques-star-schema-f1077a1cced7

4. Why Data Modeling is important?

These models are important in terms of communication between the teams. In any company or project, there can be different types of professionals working such as data administrators, data analysts, business analysts, managers, developers, deployment teams, and many more. There has to be an understanding of each team in order to communicate the data terms and project outcomes in a simple form. In this case, data models make it easy to communicate vital information in a simple form. There are tools available in today's era which allow building data models without programming.

To wrap up we look into the terms related to data modeling, what is data modeling, STAR schema, and the importance of data modeling. In simple terms data model is similar to a map of information that tells users about the data's relationship with each other.

Thank you for reading,

Soham Sanjay Shinde

IMPORTANT NOTE:
Hope you like this story. Please follow me for more content. Please be aware that these are my thoughts and anyone copying this content will be reported and blocked by me. Also, will be taken appropriate legal action for using this story.

--

--

Soham Shinde

Applying Machine Learning in Supply Chain. Senior Data Analyst @ DHL Express. MS in Supply Chain Optimization