Fact & Dimension Tables: What are they and how can we use them

Datawithzon
4 min readOct 28, 2023

--

Data can get very messy. Especially when everything is consolidated into a single dataset. That is why I try my best to categorize my data into two different categories, Fact and Dimension tables.

Fact and dimension tables are used by Data Engineers whenever they are formulating schema for a database, but they can be useful information to a data analyst because:

  1. It allows you to have a clearer picture of your data. This will result in understanding whether you have the necessary data points to provide the necessary insights that you desire
  2. With visualization tools such as Power BI and Tableau allowing us to build relationships within the software itself, it has provided us the luxury of having different kinds of data stored in different datasets, while still being able to show the same information as compared to having a single dataset

These are just some of the many benefits that you have when categorizing your datasets, but let me dive a little deeper into what Fact and dimension tables are.

What are Fact Tables

Sales Table

Data in fact tables are usually data points that give you the key insights/business questions that you are looking for. They usually contain the primary keys of Dimension tables, and also the key data points that you require to derive your insights. It is also possible to have multiple data points to derive different key insights.

Some examples of fact tables are:

  1. Sales Transaction data (key data point) of an e-commerce company
  2. User login data (key data point) of a hosting server platform
  3. Button Interaction Data (key data point) of a website

What are Dimension Tables

Product & Customer Attribute Table

Data in Dimension tables are usually data points that describe the attributes of an “object”. They usually have a primary key in each row, so that these attributes can be retrieved whenever necessary (in the case above, product_id and customer_id is the primary key of their respective tables).

Some examples of a dimension table are:

  1. Attributes of a product (object) sold by the company
  2. Attributes of customers (object) buying from the company
  3. Attributes of employees (object) working for the company

How can we use Fact and dimension Tables together?

Let’s imagine your boss has tasked you to identify the number of office products that have been sold in 2023, and he has provided you ONLY with the sales dataset below. Would you be able to give him an answer?

Sales Table

You’ll be able to give him the total number of products sold in 2023, but not the number of office products sold in 2023. This is because you are unable to identify within the table, which items are categorized as office.

Now, if he had provided you with an additional products table found below

Products Table

we can give him the insight that he is looking for because now we have the ability to map each of the items in the sales data to a category. Using the product_id from both the sales and products table, we will then be able to get a dataset like this.

We can then confidently answer our boss that the total number of office products sold is 13.

The example I’ve shown above is a good example of how fact and dimension tables work together to derive key insights that can be suited to specific needs.

Fact tables can give us an overview insight, but it can be challenging to segment it based on specific needs. By mapping primary keys from the dimension tables to the fact tables, you will then have the ability to derive overview, and specific insights whenever required. The best of both worlds.

To summarize my point above, we can look at data like a TV series. The fact table is like the “main character” of the series, where most of the attention (insights) will be centered around him. The dimension table is like the “supporting characters” of the series, where it supports the attention (attributes) given to the main character

I would love to hear your thoughts on the Facts and Dimension tables. Do you think it is worth splitting them up into separate tables, or would it be better off be in a single dataset?

If you do enjoy my content, be sure to follow me on Instagram as well, where I do content similar to this.

--

--