Data Modeling: Streamlining the Data Flow

5 min readAug 6, 2023

Previous blog : Data Preprocessing : The Foundation of Effective Data Analysis

Data Modeling:

Data modeling is the process of creating a visual representation of the data and its relationships in a structured manner. It involves defining the structure, constraints, and relationships of the data elements to ensure efficient storage, retrieval, and analysis of the data.

Why Data Modeling is Necessary?

Data modeling is necessary for several reasons:

Organizing Data: It helps organize complex data into a structured and logical format, making it easier to understand and manage.
Ensuring Data Integrity: Data modeling establishes relationships and constraints, ensuring data integrity and consistency.
Optimizing Performance: A well-designed data model improves data retrieval and query performance.
Facilitating Communication: Data models serve as a common visual representation that facilitates communication between stakeholders.
Supports Decision Making: Data models provide insights into data relationships, enabling better decision-making and analysis.

Schema in Data Modeling:

In data modeling, a schema is a logical and structural blueprint that defines the organization of a database. It represents the arrangement of tables, attributes, and the relationships between data elements. A well-designed schema ensures data consistency, integrity, and efficient storage and retrieval of information. Different types of schemas exist, each serving specific purposes based on the organization’s requirements and the nature of data.

Types of Schemas:

Star Schema: It features a central fact table connected to multiple dimension tables. Simplifies querying and enables fast data aggregation, suitable for OLAP systems.
Snowflake Schema: An extension of the star schema with normalized dimension tables, reducing data redundancy but may lead to more complex and slower queries.
Fact Constellation: Involves multiple fact tables connected to various dimension tables, suitable for capturing diverse business metrics and relationships.

Dimension and Fact Tables:

Dimension Tables (Dim): Dimension tables in data modeling store descriptive attributes that provide context to the data in the fact table. They act as a reference for the fact table’s numerical measures. Dimension tables contain primary keys and descriptive attributes like names, categories, and dates.
Fact Tables: Fact tables contain quantitative measures, metrics, or numerical data that represent the core business processes. These measures are associated with foreign keys that link to the primary keys in dimension tables, creating relationships between the two.

Data Modeling Steps in our project:

In our project, we utilized Python to create Dimension and Fact tables for two categories: “Supplier” and “Retailer.”

For the “Supplier” category, we generated the following tables:

1. Fact_Sup_Order: Contains order details.

2. Dim_Supplier: Stores supplier details, with the primary key being the supplier code.

3. Dim_sup_Products: Holds product details, with the primary key being the product code.

For the “Retailer” category, we created the following tables:

1. Fact_order1: Contains order details.

2. Dim_retailer: Stores retailer details, with the primary key being the retailer code.

3. Dim_Products: Holds product details, with the primary key being the product code.

4. Dim_tax: Stores tax details, with the primary key being the product code.

5. Date: Contains date details.

Dimension table for Retailer Product details

With two fact tables, our schema can be identified as a fact constellation schema.

Relationships:

In Power BI, we established relationships between the “Fact_Order1” table for Retailer and several dimension tables using foreign keys, as follows:

1. “Fact_Order1” table connected to the “Dim_product” dimension table through the “Product Code” column, forming a one-to-many relationship.

2. “Fact_Order1” table connected to the “Dim_retailer” dimension table through the “Retailer Code” column, also forming a one-to-many relationship.

3. “Fact_Order1” table connected to the “Dim_tax” dimension table through the “Product Code” column, creating another one-to-many relationship.

4. “Fact_Order1” table linked to the “Date” dimension table through the “Invoice Date” column, forming a one-to-many relationship.

Date = CALENDAR(MIN(fact_order1[ Invoice Date]),MAX(fact_order1[ Invoice Date]))

In the Supplier category, we established relationships between the “Fact_Sup_Order” table and two dimension tables using foreign keys:

1. “Fact_Sup_Order” table connected to the “Dim_Supplier” dimension table through the “Supplier Code” column, forming a one-to-many relationship.

2. “Fact_Sup_Order” table connected to the “Dim_Sup_Product” dimension table through the “Product Code” column, also forming a one-to-many relationship.

The “Fact_Sup_Order” table linked to the “Fact_Order1” table through the “Product Code” column, establishing a many-to-many relationship, indicating that multiple products were bought from suppliers and sold to retailers.

Cross filter direction:

Cross filter direction is set to “Single” for all relationships between fact and dimension tables, ensuring that data flows only from the dimension table to the fact table.

For relationships between fact tables (e.g., “Fact_Order1” to another fact table), we set the cross filter direction to “Both.” This bidirectional relationship allows us to analyze data between two fact tables effectively. This means that filtering data in one fact table can impact the data displayed in the other fact table, and vice versa.

Next blog : Dashboards : Visualizing the data