Data Modeling Fundamentals

Seckin Dinc
8 min readMay 8, 2023
Photo by Gabriel Sollmann on Unsplash

In the world of data product development, data modeling is a crucial step that helps organizations turn raw data into valuable insights. Data modeling is the process of creating a conceptual representation of data that enables developers to design and build a database structure that can efficiently store and retrieve data. The importance of data modeling lies in its ability to help organizations understand the relationships between different data elements, which in turn helps them make better decisions, improve operations, and achieve business objectives. Without proper data modeling, data products can become convoluted and difficult to use, leading to incorrect insights and lost opportunities. In this article, we will take a technical deep dive into data modeling and explore its importance in the context of data product development.

Entity, Attribute, and Relationship

Photo by charlesdeluvio on Unsplash

Together, entities, attributes, and relationships form the foundation of data modeling, which is creating a conceptual, logical, or physical representation of data for a particular purpose or context.

Entity

In data modeling, an entity is a real-world object or concept that can be uniquely identified and described, such as a person, place, thing, event, or concept. An entity is typically represented by a table in a relational database, and each instance of the entity is represented by a row in the table.

Attribute

An attribute is a characteristic or property of an entity, such as a name, age, address, or color. Attributes are represented by columns in the table that represents the entity.

Relationship

A relationship is a connection or association between two or more entities. Relationships describe how entities interact or relate to each other, and are represented by lines or connectors between the entities in a data model. Relationships can be one-to-one, one-to-many, or many-to-many.

In a one-to-one relationship, each instance of an entity is associated with exactly one instance of another entity, and vice versa. For example, in a database of employees and their offices, each employee may be associated with one office, and each office may be associated with one employee.

In a one-to-many relationship, an instance of an entity is associated with one or more instances of another entity, but each instance of the other entity is associated with only one instance of the first entity. For example, in a database of customers and their orders, each customer may have multiple orders, but each order is associated with only one customer.

In a many-to-many relationship, each instance of an entity can be associated with one or more instances of another entity, and vice versa. For example, in a database of students and courses, each student may be enrolled in multiple courses, and each course may have multiple students enrolled in it.

Types of Data Modeling

Photo by Kelly Sikkema on Unsplash

Conceptual Data Model

The conceptual data model is a high-level representation of the data entities and their relationships. This type of data model is focused on business concepts and is independent of any specific database technology. Conceptual data models are usually created by business analysts or data architects and serve as a starting point for more detailed models. The main purpose of a conceptual data model is to provide a clear understanding of the scope of the data, the relationships between entities, and the business rules that apply to the data.

Image by the author

Logical Data Model

The logical data model is a more detailed representation of the data than the conceptual model. It describes the relationships between data entities and is often used to map the data requirements to a specific database technology. Logical data models are created by data modelers or database designers and are used to design the database schema. The main purpose of a logical data model is to define the structure of the data and ensure that the database is normalized and efficient.

It is important to understand that logical data models are independent of any specific application and implementation of the data. In the same regard, the logical data model is not limited to any database tool or vendor.

The logical data model only changes if any change occurs in the business data requirements. It doesn’t change according to the application changes as it is independent of any application.

Image by the author

Physical Data Model

The physical data model is a low-level representation of the data that defines how the data will be stored in a specific database technology. It includes details such as table structures, data types, and constraints. Physical data models are created by database administrators or developers and are used to implement the database schema. The main purpose of a physical data model is to ensure that the database is designed to be efficient and performant.

The physical data model is the mapping of a subset of the logical data model which is validated and suitable by the selected architecture and technology. That said, the physical data model should be compatible with the business rules, definitions, and relationships.

Image by the author

Relational vs Dimensional Data Modeling

Relational data modeling

In a relational data model, data is organized into tables, with each table representing a specific entity or object. Each table consists of rows (also called records or tuples) and columns (also called fields or attributes). The columns represent specific pieces of information about the entity, such as a customer’s name, address, and phone number.

Tables in a relational data model are related to each other through common fields, which are called keys. A key is a field that uniquely identifies each row in a table. By using keys, data in one table can be linked to data in another table, allowing for complex queries and data analysis.

Dimensional data modeling

The dimensional data model is a way of structuring data to support analytical processing and reporting. It is designed to provide quick and efficient access to data for analysis, rather than for transaction processing.

In a dimensional data model, data is organized into two types of tables: fact tables and dimension tables. Fact tables contain numerical data, called measures, that represent business metrics such as sales or revenue. Dimension tables contain descriptive data, called dimensions, that provide context to the measures in the fact table, such as time, location, or product.

Differences between the two models

Here are the key differences between the two;

  1. Purpose: Relational modeling is designed to represent transactional data and focuses on relationships between entities. Dimensional modeling, on the other hand, is designed to represent analytical data and focuses on measuring business metrics across different dimensions.
  2. Structure: Relational models are based on the normalization process, which involves breaking down data into smaller, more manageable units to minimize redundancy and improve data integrity. Dimensional models, on the other hand, are based on the concept of a star schema or snowflake schema, which involve denormalizing data to optimize for query performance.
  3. Data Types: Relational models are designed to represent structured data, such as numbers and text. Dimensional models, on the other hand, are designed to represent semi-structured and unstructured data, such as time-series data, text, and images.
  4. Query Optimization: Relational models are optimized for transactional processing, where the focus is on quickly accessing individual records. Dimensional models, on the other hand, are optimized for analytical processing, where the focus is on aggregating data across dimensions to analyze business metrics.

Data Modeling Notations

Photo by Fatih Erkan Akay on Unsplash

Data modeling notations are graphical representations used to create data models. Data models are used to represent the structure of data within a system or organization, and data modeling notations are used to help design and communicate those structures.

There are several data modeling notations, including;

  1. Entity-Relationship (ER) Notation: ER notation is used to create data models that represent entities (objects) and the relationships between them. ER diagrams consist of entities, attributes, and relationships, and are commonly used in database design. ER diagrams are the most known and used notations in the data domain.
  2. Unified Modeling Language (UML): UML is a general-purpose modeling language that can be used to model a wide range of systems, including software systems and business processes. UML diagrams can represent classes, objects, use cases, activities, and more.
  3. Object-Oriented Data Modeling (OODM): OODM is a modeling technique that is used to represent data in an object-oriented way. OODM diagrams consist of objects, classes, attributes, and methods, and are commonly used in software design.
  4. Data Flow Diagrams (DFD): DFDs are used to model the flow of data within a system or process. DFDs consist of processes, data stores, and data flows, and are commonly used in system analysis and design.
  5. Business Process Modeling Notation (BPMN): BPMN is a notation used to model business processes. BPMN diagrams consist of activities, gateways, events, and flows, and are commonly used in business process management.

Each notation has its strengths and weaknesses and is suitable for different types of data modeling tasks. It is important to select the appropriate notation based on the specific needs and requirements of the project.

Conclusion

Understanding the fundamentals of data modeling is critical for anyone working with data in today’s business environment. Data modeling is the process of structuring data to support business requirements and is essential for creating accurate, efficient, and reliable data systems.

By learning the basics of data modeling, including relational and dimensional models, individuals can better design, develop, and maintain data systems that meet the needs of their organization. With the rapid growth of data in today’s world, the importance of strong data modeling skills will only continue to increase.

--

--

Seckin Dinc

Building successful data teams to develop great data products