Designing a Data Model

Modelling a business into entities and creating relationships. A data engineering guide

Abubakar Alaro
Geek Culture
4 min readFeb 25, 2022

--

Outline

  1. Introduction
  2. What is a Data Model
  3. Advantages of a data model
  4. Steps to designing a data model: An Example
  5. Conclusion

Introduction

Proper organization of files ensures quick and easy recovery. Without organizing our files, lot of time is wasted to answer questions based on the file. A common term to represent data organization in the data industry is data model.

What is a Data Model

According to Wikipedia, a data model is an abstract model that organizes elements of data and standardizes how they relate to one another and to the properties of real-world entities. Here, entities are things that we can relate with and they form an integral part of our business e.g Customers, Product, Employee, Inventory etc. These entities have their separate fields or attributes which defines the kind of data that can be stored in them. For example, entity Customer can have attributes like customer_id, first_name, last_name, age etc.

Data modeling is the process of creating data model for an information system (Database) to represent a business case by going through certain stages with the goal of producing quality, consistent, and structured data for operating business applications.

The stages mentioned above are commonly grouped into 3 stage which are;

  • Conceptual Stage
  • Logical Stage
  • Physical Stage

Conceptual Stage: I call this stage the talking stage where an engineer talks to different stakeholders that will use the database to be created and tries to understand the following;

  • data source that are currently available
  • business questions that the database will be use to answer
  • data size
  • how the database will be used
  • and any other questions that might help understand the business case better.

Logical Stage: This is when the engineer starts building out the data model in a logical fashion. Here, the engineer will model each entity, it attributes and form any relationship that might be neccessary to answer the business questions. An Entity Relationship diagram (ERD) is usually used to show the entities and how they relate with one another.

Physical Stage: Here the engineer designs the internal schema of the database, the data tables, the data columns of those tables, and the relationships between the tables.

The logical and physical stage sounds very similar but the major difference is their uses and the level of detail that each explains. The result of the logical stage can be used to explore concepts with stakeholders or business owners while the result of the physical stage is usually the database design or the infrastructure set up where the database flavor has been selected and the database size is specified.

Advantages of a data model

  • It ensures that the business data is consistent, of high quality and standard
  • It allows businesses to track back and improve
  • It can be leveraged to answer specific business questions
  • It reduce cost and allow for proper operational optimization

Steps to designing a data model: An Example

I will be using a Poultry business as my case study. To design the data model I will start by asking relevant questions to help me understand the use case better. Questions like;

  • What will the database be used for
  • What are some business questions the database will be used to answer
  • How big is the data size to anticipate
  • What are some of the available data
  • Who would use the database

Based on the answers to the questions above, I have the following entities; Customers, Products, Sales, Cages, Employee, DailyEggCount. To define the entity and its attributes, I used the set notation like this;

The next step is to build an ER diagram that will show how these entities relates. Here, I defined how relationships are formed between entities and the type of the relationship that exists between them. There are generally 3 types of relationships which include;

  • One — to — One relationship
  • One — to — Many or Many — to — One relationship
  • Many — to — Many relationship

I will not be discussing the listed relationships but I will advice you read here or here to learn more about them.

My ER diagram looked like this;

ER Diagram

The last stage of the example here is to build out the database design otherwise known as the physical stage. I used a MySQL database hosted on AWS. I wrote sql code to materialize the database as well as load data into the database. The sql code is attached below

The github gist above creates the tables (entities) into the database, forms the relationships between those entities and specifies the integrity.

Conclusion

Structuring data into a form that makes it consistent and conform with the standard is important for every business. Data modelling is one way of doing this and considered an important skill for a data engineer because it helps to understand the data and the business better and also help the business to grow and expand.

The stages of data modelling provides unique insights into the business and the understanding of the use case. Thanks for reading

--

--