Dimensional Data Model

Developing a dimensional model to represent sales transactions in a grocery store.

Taranjit Kaur
Women in Technology
4 min read · Jul 18, 2023


Let’s familiarize ourselves with some terminology before we dive into the instructions.

What is a Dimensional Data Model?

Dimensional data modeling is a data modeling technique used in the design of data warehouses. It was pioneered by Ralph Kimball and is built around fact tables and dimension tables. Its primary objective is to make data retrieval efficient: data is stored in a simple structure that is easy to query. The dimensional model is also the preferred data model for many OLAP systems.

To grasp dimensional modeling, you need to be familiar with a few key elements: fact tables, dimension tables, star schemas, and primary and foreign keys.

Facts are the quantifiable data elements that capture business metrics. In a sales data warehouse, for instance, the facts could include sales revenue, units sold, and profit margin. In a dimensional data model, the fact table is the central table that holds these measures. It is surrounded by dimension tables that describe the attributes associated with the measures, and it is related to each dimension table through a foreign key.

Dimensions are the descriptive data elements used to classify or categorize the data. In a sales data warehouse, dimensions may include product, customer, time, and location, among others. Each dimension table describes one dimension of a fact and is connected to the fact table through a foreign key. A dimension table can have relationships with one or more other tables.

A primary key in a dimension table is a column that uniquely identifies a specific row of data. Foreign keys refer to these primary keys in order to establish relationships between fact and dimension tables through joining operations.

A foreign key, as the name suggests, is a key that refers to another table. Its purpose is to facilitate the joining of data from two tables, commonly a fact table and a dimension table.

A star schema is a database structure in which a central fact table is connected to its dimension tables through foreign key to primary key relationships. Drawn as a diagram, with the fact table in the middle and the dimension tables around it, the structure resembles a star.
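The key relationships described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a production design: the table names (dim_product, fact_sales), the column names, and the sample rows are all assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical tables: one dimension table and one fact table.
conn.executescript("""
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,  -- primary key: identifies one product row
    product_name TEXT NOT NULL,
    category     TEXT NOT NULL
);
CREATE TABLE fact_sales (
    product_key  INTEGER NOT NULL REFERENCES dim_product (product_key),  -- foreign key
    units_sold   INTEGER NOT NULL,
    sales_amount REAL    NOT NULL
);
""")

conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?, ?)",
    [(1, "Milk 1L", "Dairy"), (2, "Butter", "Dairy"), (3, "Bread", "Bakery")],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [(1, 2, 3.0), (2, 1, 4.5), (3, 1, 2.5)],
)

# A fact row that points at a product missing from the dimension is rejected.
try:
    conn.execute("INSERT INTO fact_sales VALUES (99, 1, 1.0)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

# Joining on the key lets a measure be grouped by a dimension attribute.
rows = conn.execute("""
    SELECT p.category, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(rows)
```

The fact table carries only the key and the measures; the descriptive attribute (category) lives in the dimension table and is reached through the join.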

Design the Dimensional Model

To illustrate the creation of our dimensional model, let’s consider a retail grocery store scenario. Our grocery store offers selected products through promotional activities. In our retail case study, the management wants to better understand customer purchases, which are recorded by the point-of-sale (POS) system at the cashier desk. Consequently, the business process being modeled is the retail sales transactions captured by the POS system. This data lets business users analyze product sales in detail by store, date, promotional condition, and transaction.

Having outlined our business case study, we will now walk through the process of building the dimensional model.

Step 1: Select the Business Process

What specific area of business operations are you focused on?

In our retail case study, the management aims to gain a deeper insight into customer transactions recorded by the Point of Sale (POS) system.

Step 2: Declare the Grain

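Declaring the grain means stating exactly what a single row of the fact table represents. Dimensional models should capture the most detailed, atomic information produced by the business process.

In our case study, the finest level of detail is an individual product within a POS transaction, so each fact table row represents one product on one transaction.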
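To make the grain concrete, here is a small Python sketch that turns one POS transaction into fact rows, one per product. The receipt layout, the SalesFactRow class, and the to_fact_rows helper are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SalesFactRow:
    """One row at the declared grain: one product on one POS transaction."""
    transaction_id: str
    product: str
    quantity: int


# A made-up receipt: a single transaction containing three products.
receipt = {
    "transaction_id": "T1001",
    "items": [("Milk 1L", 2), ("Bread", 1), ("Eggs 12pk", 1)],
}


def to_fact_rows(receipt):
    """Split a receipt into one fact row per product line."""
    return [
        SalesFactRow(receipt["transaction_id"], product, quantity)
        for product, quantity in receipt["items"]
    ]


fact_rows = to_fact_rows(receipt)
for row in fact_rows:
    print(row)
```

One transaction with three products yields three fact rows. A coarser grain, such as one row per transaction, would lose the product-level detail.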

Step 3: Identify the Dimensions

Once the grain of the fact table has been determined, selecting the dimensions becomes a straightforward process.

In this case, the product and transaction are readily identifiable. Additional dimensions can be linked to the POS measurements, including the sale date, store location, promotional campaign associated with the product, cashier responsible for the sale, and possibly the payment method utilized.

Step 4: Identify the Facts

Which facts will appear in the fact table?

The POS system gathers various pieces of information, including the sales quantity, the regular price per unit, the discounted price per unit, the final net paid price per unit, the total discount amount, and the total sales amount.

The retail sales star schema, an example from [Kimball 02]. (Diagram borrowed from researchgate.net)
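The dimensions from Step 3 and the facts from Step 4 can be put together in a rough sketch using Python's sqlite3 module. Everything here is an assumption made for illustration: the table and column names, the single attribute per dimension, storing prices as integer cents, keeping the POS transaction number as a plain column on the fact table, and treating the per-unit discount as the amount taken off the regular price. The schema in the diagram may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when on

# Hypothetical dimensions, each trimmed to one descriptive attribute.
dimensions = {
    "date": "full_date",
    "product": "product_name",
    "store": "store_name",
    "promotion": "promotion_name",
    "cashier": "cashier_name",
    "payment_method": "method_name",
}
for name, attribute in dimensions.items():
    conn.execute(
        f"CREATE TABLE dim_{name} ({name}_key INTEGER PRIMARY KEY, {attribute} TEXT NOT NULL)"
    )

# One foreign key column per dimension, followed by the measures.
fk_columns = ",\n    ".join(
    f"{name}_key INTEGER NOT NULL REFERENCES dim_{name} ({name}_key)"
    for name in dimensions
)
conn.execute(f"""
CREATE TABLE fact_retail_sales (
    {fk_columns},
    pos_transaction_number    TEXT    NOT NULL,
    sales_quantity            INTEGER NOT NULL,
    regular_unit_price_cents  INTEGER NOT NULL,
    discount_unit_price_cents INTEGER NOT NULL,
    net_unit_price_cents      INTEGER NOT NULL,
    extended_discount_cents   INTEGER NOT NULL,
    extended_sales_cents      INTEGER NOT NULL
)
""")


def sales_measures(quantity, regular_cents, discount_cents):
    """Derive the per-unit net price and the two totals for one fact row."""
    net_cents = regular_cents - discount_cents
    return (quantity, regular_cents, discount_cents, net_cents,
            quantity * discount_cents, quantity * net_cents)


# Made-up sample data: one row per dimension, each with key 1.
sample = {
    "date": "2023-07-18", "product": "Milk 1L", "store": "Downtown",
    "promotion": "Weekend deal", "cashier": "Cashier 7", "payment_method": "Card",
}
for name, value in sample.items():
    conn.execute(f"INSERT INTO dim_{name} VALUES (1, ?)", (value,))

conn.execute(
    "INSERT INTO fact_retail_sales VALUES (1, 1, 1, 1, 1, 1, 'T1001', ?, ?, ?, ?, ?, ?)",
    sales_measures(3, 250, 50),
)

row = conn.execute("""
    SELECT p.product_name, s.store_name, f.sales_quantity, f.extended_sales_cents
    FROM fact_retail_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_store   AS s ON s.store_key   = f.store_key
""").fetchone()
print(row)
```

Integer cents are used here only to avoid floating-point rounding in the example; a real warehouse would typically use a decimal column type.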

In this article, you were introduced to the fundamentals of designing a dimensional model. Irrespective of the industry, we highly recommend following the four-step process when approaching a dimensional model design. It is crucial to explicitly define the grain, that is, the level of detail of the dimensional schema. Loading the fact table with atomic data gives you significant flexibility, because the data can then be summarized in various ways to meet your analytical needs.
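The flexibility of atomic data can be shown with a short Python sketch: the same product-level rows are summarized along different dimensions without reloading anything. The sample rows and the roll_up helper are hypothetical and exist only for this illustration.

```python
from collections import defaultdict

# Made-up atomic fact rows, one per product per transaction, already joined
# to their dimension attributes for brevity.
atomic_rows = [
    {"date": "2023-07-17", "store": "Downtown", "product": "Milk 1L", "sales_cents": 400},
    {"date": "2023-07-17", "store": "Downtown", "product": "Bread", "sales_cents": 250},
    {"date": "2023-07-17", "store": "Uptown", "product": "Milk 1L", "sales_cents": 200},
    {"date": "2023-07-18", "store": "Uptown", "product": "Bread", "sales_cents": 500},
]


def roll_up(rows, *dimensions):
    """Sum sales_cents for every combination of the requested dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row["sales_cents"]
    return dict(totals)


by_store = roll_up(atomic_rows, "store")
by_date = roll_up(atomic_rows, "date")
by_store_and_product = roll_up(atomic_rows, "store", "product")

print(by_store)
print(by_date)
print(by_store_and_product)
```

Had the fact table been loaded at a coarser grain, for example daily totals per store, the product-level roll-up would not be recoverable.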

Reference:

The Data Warehouse Toolkit, 3rd edition

You might be interested in this series, in which I introduce several important concepts that new data engineers should be aware of. The other topics I have covered so far:

Distinctions Between CTEs, Subqueries, and Temporary Tables.

Replication Lag

Replication

Sharding and Partitioning

Partitioning Data

Optimizing data

Enhanced Query Performance

Indexing

Scalability

Slowly Changing Dimension

Thanks for the read. Do clap👏 and follow me if you find it useful😊.

“Keep learning and keep sharing knowledge.”
