Database Engineering Part 9: Data Modelling

Augustine Umeagudosi
11 min read · Jan 20, 2024


Photo by Al Elmes on Unsplash

Imagine you’re planning a party, and you want to keep everything organized to make it enjoyable for everyone. Data modelling, in simple terms, is like creating a detailed plan or checklist for your party.

  • Guest List (Entities): You start by making a list of guests you want to invite. Each guest is like a separate entity, with details like name, age, and contact information.
  • Relationships (Connections): Next, you figure out how the guests are connected. Who is friends with whom? Are there family members attending? This is similar to establishing relationships between different pieces of information.
  • Party Activities (Attributes): You plan the activities at the party — music, games, and food. These are like attributes of your party, defining different aspects of the event.
  • Avoiding Duplication (Redundancy): You want to ensure you’re not duplicating efforts or resources. For instance, you wouldn’t want to buy double the amount of food just because you recorded the guest list in two different places. Data modelling helps you avoid unnecessary duplication.
  • Easy Access (Efficiency): You organize the plan so that you can easily access information. For instance, you might group guests by age or preferences, just as data modelling organizes information in a way that makes sense for efficient use.
  • Visualizing the Plan (Diagrams): To make it clear for everyone involved, you might draw a diagram or create a visual representation of your party plan. Similarly, data modelling often involves creating visual representations, like charts or diagrams, to understand and communicate the structure of the information.

In summary, data modelling is like planning a party: you list guests (entities), establish connections between them (relationships), define the various aspects of the party (attributes), avoid unnecessary duplication, and organize everything for easy access. At its core, it is about creating a thoughtful plan so that your data (or, in this case, your party) runs smoothly and everyone has a good time.

Principles and Techniques of Data Modelling

A. Entity-Relationship Diagrams (ERDs)

Entity-relationship diagrams (ERDs) are visual representations used in data modelling to illustrate the relationships between entities within a system or database. The primary purpose of ERDs is to provide a clear and concise overview of how different entities interact, helping stakeholders understand the structure and connections within a database.

Components of ERDs

  • Entities: Entities serve as representations of objects or concepts within the system undergoing data modelling. In an Entity-Relationship Diagram (ERD), these entities are commonly visualized as rectangles. For instance, in a library database, entities might be “Book,” “Author,” and “Publisher.”
  • Relationships: Relationships articulate the connections and associations between various entities, symbolized by lines within an ERD. Taking the library database as an example, a relationship could be established between “Book” and “Author” to illustrate the correlation between an author and the books they have written.
  • Attributes: Attributes outline the specific properties or features of entities, providing supplementary details. Typically illustrated within ovals connected to entities in an ERD, attributes offer insights into the characteristics of each entity. For the “Book” entity in the library database, attributes could encompass details such as “Title,” “ISBN,” and “Publication Date.”

Example of an ERD:

Consider a library database that tracks books, authors, and their publishers.

1. Entities:

a. Book:

  • ISBN (Primary Key)
  • Title
  • PublisherID (Foreign Key)
  • Genre
  • Publication Year

b. Author:

  • AuthorID (Primary Key)
  • Name
  • Date of Birth
  • Nationality

c. Publisher:

  • PublisherID (Primary Key)
  • Publisher Name
  • Location

2. Relationships

  • Each “Book” is authored by one or more “Authors,” establishing a Many-to-Many relationship. This is resolved by introducing an intermediary entity called “BookAuthor” that connects “Books” and “Authors.”
  • Each “Author” can be associated with multiple “Books,” and each “Book” can have multiple “Authors.”
  • Each “Book” is published by one “Publisher,” forming a One-to-Many relationship.
  • Each “Publisher” can have multiple published “Books.”

3. Attributes

a. Book Entity:

  • ISBN (Primary Key)
  • Title
  • Genre
  • Publication Year
  • PublisherID (Foreign Key)

b. Author Entity:

  • AuthorID (Primary Key)
  • Name
  • Date of Birth
  • Nationality

c. Publisher Entity:

  • PublisherID (Primary Key)
  • Publisher Name
  • Location

d. BookAuthor Entity:

  • AuthorID (Foreign Key)
  • ISBN (Foreign Key)

4. Visual Representation: the ERD depicts “Book,” “Author,” and “Publisher” as rectangles, with the intermediary “BookAuthor” connecting “Book” and “Author,” and a one-to-many line running from “Publisher” to “Book.”
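
To make the example concrete, the sketch below shows one way this ERD could be realized as actual tables, using Python’s built-in sqlite3 module. The column types and NOT NULL choices are assumptions, since the ERD itself does not specify them.

```python
import sqlite3

# A minimal sketch of the library ERD as SQLite tables; types and
# NOT NULL choices are assumptions not specified by the ERD.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Publisher (
    PublisherID   INTEGER PRIMARY KEY,
    PublisherName TEXT NOT NULL,
    Location      TEXT
);

CREATE TABLE Author (
    AuthorID    INTEGER PRIMARY KEY,
    Name        TEXT NOT NULL,
    DateOfBirth TEXT,
    Nationality TEXT
);

CREATE TABLE Book (
    ISBN            TEXT PRIMARY KEY,
    Title           TEXT NOT NULL,
    Genre           TEXT,
    PublicationYear INTEGER,
    PublisherID     INTEGER REFERENCES Publisher(PublisherID)
);

-- BookAuthor resolves the many-to-many between Book and Author.
CREATE TABLE BookAuthor (
    ISBN     TEXT    REFERENCES Book(ISBN),
    AuthorID INTEGER REFERENCES Author(AuthorID),
    PRIMARY KEY (ISBN, AuthorID)
);
""")
```

The composite primary key on BookAuthor ensures that each author is linked to a given book at most once, which is exactly what the resolved many-to-many relationship requires.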

B. Normalization

Normalization is the process of organizing and structuring data in a database to eliminate redundancy and dependency, ensuring data integrity and minimizing the likelihood of data anomalies. The primary purpose of normalization is to design a database schema that reduces data redundancy, making it more efficient, less prone to errors, and adaptable to changes in business requirements.

Normal Forms

  • 1NF (First Normal Form): Guarantees that every column in a table holds only atomic (indivisible) values, with no repeating groups or arrays.
  • 2NF (Second Normal Form): Building on 1NF, it eliminates partial dependencies: every non-key attribute must depend on the whole primary key, not just part of it.
  • 3NF (Third Normal Form): Extending the normalization process, 3NF eliminates transitive dependencies: every non-key attribute must depend directly on the primary key, not on another non-key attribute. (The sketch below illustrates 2NF and 3NF violations.)
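
To see the 2NF and 3NF rules in action, here is a small sketch that reuses the library example: one table with partial dependencies and one with a transitive dependency. The repair in both cases is the decomposition already shown in the ERD section.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Violates 2NF: the key is (ISBN, AuthorID), but Title depends only on
-- ISBN and AuthorName only on AuthorID (partial dependencies), so both
-- values are repeated for every book/author pairing.
CREATE TABLE BookAuthorFlat (
    ISBN       TEXT,
    AuthorID   INTEGER,
    Title      TEXT,
    AuthorName TEXT,
    PRIMARY KEY (ISBN, AuthorID)
);

-- Violates 3NF: PublisherName depends on PublisherID, a non-key
-- attribute, so it depends on the key (ISBN) only transitively.
CREATE TABLE BookDenorm (
    ISBN          TEXT PRIMARY KEY,
    Title         TEXT,
    PublisherID   INTEGER,
    PublisherName TEXT
);
""")
# The fix: decompose into Book, Author, BookAuthor, and Publisher,
# exactly as in the library ERD above.
```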

How to Apply Normalization in Database Design

  • Identify Entities and Relationships: Determine the entities the database must capture and how they relate to one another. For instance, in a library database, you might have entities like “Books,” “Authors,” and “Publishers.”
  • Define Primary Keys: Clearly define primary keys for each entity, representing a unique identifier for each record.
  • Analyze Dependencies: Examine the dependencies between attributes. Ensure that each attribute is functionally dependent on the primary key.
  • Apply Normalization Rules: Apply the rules of normalization (1NF, 2NF, 3NF) to restructure the database schema, breaking down tables and creating new ones to eliminate redundancy and dependency.
  • Introduce Additional Tables: Introduce additional tables to represent relationships between entities, avoiding data duplication.

Benefits and Challenges of Normalization

Benefits:

  • Data Integrity: Reduces data redundancy, minimizing the risk of inconsistencies and errors.
  • Efficient Storage: Optimizes storage space by eliminating duplicate data.
  • Scalability: Makes the database more adaptable to changes and future requirements.
  • Query Performance: Enhances query performance by simplifying data retrieval.

Challenges:

  • Complexity: Highly normalized databases can become complex, making them challenging to understand and manage.
  • Performance Impact: In certain scenarios, excessive normalization leads to more join operations, which can slow queries (illustrated in the sketch below).
  • Application Overhead: Introducing and managing relationships between multiple tables can add overhead to the application code.
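
To illustrate the performance trade-off, assuming the normalized library schema sketched earlier (stored here in a hypothetical library.db file), answering an everyday question such as “list each book with its authors and publisher” takes a three-way join. A denormalized reporting table would avoid the joins at the cost of duplicated data.

```python
import sqlite3

conn = sqlite3.connect("library.db")  # hypothetical file holding the library schema

# Three joins are needed on the normalized schema to pair each book
# with its authors (via BookAuthor) and its publisher.
rows = conn.execute("""
    SELECT b.Title, a.Name AS Author, p.PublisherName
    FROM Book AS b
    JOIN BookAuthor AS ba ON ba.ISBN       = b.ISBN
    JOIN Author     AS a  ON a.AuthorID    = ba.AuthorID
    JOIN Publisher  AS p  ON p.PublisherID = b.PublisherID
    ORDER BY b.Title
""").fetchall()

for title, author, publisher in rows:
    print(title, author, publisher)
```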

C. Schema Design

Schema design refers to the process of defining the structure and organization of a database, outlining how data is stored, accessed, and related. The primary purpose of schema design is to create a blueprint that facilitates effective data management, ensures data integrity, and aligns with the specific requirements of an organization.

Elements of Schema Design

  • Tables: Tables are the fundamental entities in a schema, representing distinct categories of information (e.g., “Books,” “Authors,” “Publishers”). Each table contains rows (records) and columns (attributes) that define the properties of the data contained in it.
  • Relationships: Relationships define connections between tables, indicating how data in one table is related to data in another. Common relationship types include One-to-One, One-to-Many, and Many-to-Many, shaping the interaction between tables.
  • Constraints: Constraints enforce rules on data within tables, ensuring data accuracy and integrity. Examples include primary keys (uniquely identifying each record), foreign keys (linking records across tables), and check constraints (specifying allowable values).
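
The sketch below shows all three constraint types in SQLite syntax; the table names are illustrative, and note that SQLite enforces foreign keys only when the foreign_keys pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
CREATE TABLE Publisher (
    PublisherID   INTEGER PRIMARY KEY,  -- primary key: unique record identity
    PublisherName TEXT NOT NULL
);
CREATE TABLE Book (
    ISBN            TEXT PRIMARY KEY,
    Title           TEXT NOT NULL,
    PublicationYear INTEGER CHECK (PublicationYear >= 1450),   -- check: allowable values
    PublisherID     INTEGER REFERENCES Publisher(PublisherID)  -- foreign key: cross-table link
);
""")

# Constraint violations are rejected at write time:
try:
    conn.execute("INSERT INTO Book VALUES ('978-0', 'Too Early', 1200, NULL)")
except sqlite3.IntegrityError as err:
    print(err)  # e.g., a CHECK constraint failure
```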

Role of Schema in Database Organization

  • Schema provides a logical and structured way to organize data within a database.
  • It serves as a container for tables and other database objects, defining their relationships and constraints.
  • Schema facilitates the separation of concerns, allowing different aspects of data (e.g., author information and book details) to be organized independently.

Considerations for Effective Schema Design

  1. Understand Business Requirements:
  • Collaborate with stakeholders to understand the specific data needs and objectives of the organization.
  • Ensure the schema aligns with the business processes and goals.

2. Normalization:

  • Apply normalization principles to reduce redundancy and dependency, enhancing data integrity.
  • Organize tables to conform to normal forms (e.g., 1NF, 2NF, 3NF) based on the characteristics of the data.

3. Performance Optimization:

  • Consider the performance implications of schema design, aiming for a balance between normalization and query efficiency.
  • Indexing and partitioning can be employed to enhance data retrieval speed (see the indexing sketch after this list).

4. Flexibility and Scalability:

  • Design the schema to accommodate changes in data requirements and support future scalability.
  • Anticipate potential growth and alterations in business processes.

5. Documentation:

  • Maintain thorough documentation for the schema, including table structures, relationships, and constraints.
  • Documentation aids in understanding, troubleshooting, and evolving the database over time.
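
Picking up the indexing point from consideration 3, here is a minimal sketch, again assuming the library schema in a hypothetical library.db file. EXPLAIN QUERY PLAN reports whether SQLite actually uses the new index.

```python
import sqlite3

conn = sqlite3.connect("library.db")  # hypothetical file holding the library schema

# An index on the foreign-key column speeds up the Publisher -> Book lookup.
conn.execute("CREATE INDEX IF NOT EXISTS idx_book_publisher ON Book (PublisherID)")

# Inspect the query plan to confirm the index is used.
for row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT Title FROM Book WHERE PublisherID = ?", (42,)
):
    print(row)
```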

Importance of Understanding Business Requirements

In data modelling, business requirements are the distinct needs, objectives, and constraints an organization sets around its data. These requirements form the basis for creating a robust data model: understanding them ensures that the resulting model harmonizes with the organization’s objectives and delivers a customized solution that fulfils its information needs.

  1. Collaboration with Stakeholders:
  • Involvement of Business Users: Actively involve business users in the data modelling process to gather insights into how data is utilized in daily operations. Business users bring domain knowledge, ensuring that the data model accurately represents the complexities of the business.
  • Communication Strategies: Employ effective communication strategies to bridge the gap between technical and non-technical stakeholders. Use plain language, visual aids, and interactive sessions to facilitate a shared understanding of data modelling goals and outcomes.

2. Alignment with Organizational Objectives:

  • Ensure that the data model aligns with the broader objectives and strategic goals of the organization.
  • Consider how data supports decision-making, business processes, and overall organizational efficiency.

Role of Business Requirements in Guiding Data Modelling Decisions

  1. Defining Entities and Attributes: Business requirements guide the identification of entities and attributes, ensuring that the data model reflects the key elements relevant to the organization.
  2. Determining Relationships: Understanding business processes helps define relationships between entities, mirroring real-world connections and dependencies.
  3. Setting Constraints: Business requirements influence the establishment of constraints, such as unique identifiers and rules governing data integrity.
  4. Optimizing for User Needs: By incorporating business requirements, the data model is optimized to cater to the specific needs of end-users, making it a practical and user-friendly tool.
  5. Iterative Refinement: Business requirements provide a basis for iterative refinement, allowing the data model to evolve with changing organizational needs and technology advancements.

A comprehensive understanding of business requirements in data modelling ensures that the ensuing data model is not just technically proficient but also intimately synchronized with the distinct needs and goals of the organization. Successful collaboration with stakeholders, adherence to organizational objectives, and the pivotal role of business requirements in shaping data modelling decisions collectively contribute to the efficacy of the data modelling process.

Applying ERDs, Normalization, and Schema Design to Solve a Real-world Problem

Scenario: Imagine a healthcare system that aims to optimize patient information and appointment management.

Application of ERDs:

  1. Entities: Patient, Doctor, Appointment
  2. Relationships: A Patient can have multiple Appointments, and each Appointment is associated with one Doctor.
  3. Attributes: Patient details (ID, Name, DOB), Doctor details (ID, Name, Specialization), Appointment details (ID, Date, Time).

Application of Normalization:

a. 1NF: Ensures atomic values, avoiding repeating groups (e.g., storing multiple appointments in a single field).

b. 2NF: Eliminates partial dependencies, ensuring attributes depend on the primary key (e.g., Patient details depend on PatientID).

c. 3NF: Removes transitive dependencies, ensuring each attribute directly relies on the primary key (e.g., Doctor’s Specialization depends on DoctorID).

Application of Schema Design:

  • Introduces tables like Patients, Doctors, and Appointments.
  • Enforces constraints like primary keys and foreign keys to maintain data integrity.
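
A minimal sketch of that schema in SQLite, with column types assumed from the attribute lists above:

```python
import sqlite3

# Healthcare schema sketch; types are assumptions based on the listed attributes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Patients (
    PatientID INTEGER PRIMARY KEY,
    Name      TEXT NOT NULL,
    DOB       TEXT
);
CREATE TABLE Doctors (
    DoctorID       INTEGER PRIMARY KEY,
    Name           TEXT NOT NULL,
    Specialization TEXT
);
CREATE TABLE Appointments (
    AppointmentID INTEGER PRIMARY KEY,
    PatientID     INTEGER NOT NULL REFERENCES Patients(PatientID),
    DoctorID      INTEGER NOT NULL REFERENCES Doctors(DoctorID),
    Date          TEXT,
    Time          TEXT
);
""")
```
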
Adapting Models Based on Evolving Business Requirements

Assume the healthcare system expands its services and introduces new features.

  1. Evolution of ERDs:
  • New entities like “Medical Tests” and “Prescriptions” are introduced.
  • Relationships are established, such as Patients undergoing Medical Tests and Doctors prescribing Medications.

2. Adjustments in Normalization:

  • New tables, such as “MedicalTests” and “Prescriptions,” are created to hold the new data.
  • Attributes are organized to conform to normalization principles.

3. Schema Flexibility:

  • The schema is modified to incorporate the changes without disturbing existing data (see the migration sketch after this list).
  • Considerations for scalability and performance are addressed during these schema adjustments.

4. Continuous Collaboration:

  • Stakeholders, including healthcare professionals and administrators, provide ongoing input to refine the data model.
  • Regular communication and feedback loops ensure the data model remains aligned with evolving business requirements.
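
As a hedged sketch of such an in-place evolution, continuing the healthcare example (table and column names are illustrative, and clinic.db is a hypothetical file holding the earlier schema):

```python
import sqlite3

conn = sqlite3.connect("clinic.db")  # hypothetical file with the earlier healthcare schema

# The new tables only reference the existing ones, so current Patients,
# Doctors, and Appointments rows are untouched by the migration.
conn.executescript("""
CREATE TABLE IF NOT EXISTS MedicalTests (
    TestID    INTEGER PRIMARY KEY,
    PatientID INTEGER NOT NULL REFERENCES Patients(PatientID),
    TestName  TEXT NOT NULL,
    TestDate  TEXT
);
CREATE TABLE IF NOT EXISTS Prescriptions (
    PrescriptionID INTEGER PRIMARY KEY,
    DoctorID       INTEGER NOT NULL REFERENCES Doctors(DoctorID),
    PatientID      INTEGER NOT NULL REFERENCES Patients(PatientID),
    Medication     TEXT NOT NULL,
    IssuedOn       TEXT
);
""")
```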

By integrating principles such as ERDs, normalization, and schema design with real-world business requirements, organizations can develop robust and adaptable data models. The case study above illustrates how these principles are applied to solve practical problems and how models evolve in response to changing business needs, showcasing the dynamic nature of effective database design.

Iterative Nature of Data Modelling

The iterative nature of data modelling acknowledges that the initial design is seldom final. It emphasizes continuous improvement, responsiveness to feedback, and the ability to adapt to emerging challenges. By embracing this iterative approach, organizations can maintain data models that not only meet current requirements but also evolve in harmony with the dynamic nature of the business environment.

  1. Continuous Refinement: Data modelling is an iterative process that involves continuous refinement throughout its lifecycle.
  • Feedback Loop: Regularly seek feedback from stakeholders, including end-users and domain experts, to refine and enhance the data model.
  • Adaptation to Insights: As understanding deepens and new insights emerge, the data model evolves to better represent the dynamics of the organization.

2. Flexibility to Accommodate Changes: An effective data model is designed with the flexibility to accommodate changes in business requirements.

  • Scalability: Anticipate growth and changes in data volume, ensuring the model can scale without significant redesign.
  • New Features: The model should be adaptable to incorporate new features, entities, or relationships as the business evolves.

3. Balancing Consistency and Adaptability: Striking a balance between consistency and adaptability is crucial for a sustainable data model.

  • Consistency: Ensure consistent application of principles like normalization and schema design for data integrity and reliability.
  • Adaptability: Allow for adjustments that cater to changing business needs without compromising the overall structure and coherence.

Challenges and Considerations

In navigating the challenges of data modelling, organizations can adopt strategic approaches to enhance precision, accommodate evolving needs, and leverage emerging trends in technology.

Common Challenges in Data Modelling:

  1. Complexity of Business Processes: Data modelling becomes challenging when dealing with complex and elaborate business processes. These complexities may lead to difficulty in accurately representing relationships and dependencies.
  2. Data Quality Issues: Ensuring data quality and consistency poses a persistent challenge. Inaccurate or inconsistent data can result in flawed models and unreliable insights.
  3. Balancing Normalization and Performance: Striking a balance between normalization for data integrity and denormalization for performance can be challenging. Overemphasis on one aspect may compromise the other, affecting query performance.

Strategies for Overcoming Challenges:

  1. Thorough Requirements Analysis: Conduct a comprehensive analysis of business requirements to ensure a clear understanding of data needs. Precise requirements reduce ambiguity and help design a more effective data model.
  2. Collaboration with Stakeholders: Foster ongoing collaboration with stakeholders to gather continuous feedback. Regular input helps address evolving business needs and enhances the accuracy of the data model.
  3. Use of Advanced Tools and Technologies: Leverage advanced data modelling tools and technologies to streamline the design process. Automation and advanced features can mitigate complexities and improve overall efficiency.

Data modelling is a dynamic process that requires a holistic approach, considering technical complexities, organizational needs, and the ever-evolving business landscape. By embracing an iterative and collaborative mindset, organizations can build data models that not only stand the test of time but also evolve to meet the challenges and opportunities of the future.

Successful design of a data model hinges on a harmonious interplay between principles (ERDs, normalization, schema design), techniques, and a comprehensive understanding of business requirements. The alignment of these elements ensures that the resulting data model not only meets technical standards but also fulfils the specific needs and objectives of the organization.


What’s Next?

In the next part of this series, we will explore the topic of database normalization. Our primary emphasis will be on normalization forms, encompassing 1NF, 2NF, 3NF, BCNF, and others. We will delve into the objectives of each normalization form, highlighting their significant role in reducing redundancy and enhancing the efficiency of database systems. Furthermore, we will tackle prevalent challenges linked to normalization and discuss the trade-offs inherent in striving for a finely tuned database structure.

Click here to read the previous article in this blog series.
Click here to read the next article in this blog series.
