The Importance of Data Modeling

Data Modeling is one of the basic skills that a Data Architect must have.

Celman Elden D. Sudaria
Published in ATCP Spark
7 min read · Nov 28, 2019


I first learned about data modeling in 2007, when I was a Data Architect at the Accenture Philippine Delivery Center (now called Accenture Advanced Technology Center in the Philippines). My first data model was for an internal application that processed all expenses incurred by employees when they went on global transfers or traveled to work with clients.

Photo from ATCP Data Studio

My team and I used the PowerDesigner data modeling tool (which is now part of SAP).

Since then, I have been involved in several projects in my career. Some required me to explicitly do data modeling; some did not. Regardless, what I realized is that data is a critical component of all enterprise and business applications. When the data structures in these applications are not designed correctly, there is a fundamental impact on how the application performs for its business users.

That is why I believe that data modeling is a very important skill. Especially in the age of Big Data and Artificial Intelligence (A.I.), a basic and practical understanding of data models is critical for any ‘data practitioner’.

What is Data Modeling?

There are several definitions for Data Modeling. The one I like is the description by Steve Hoberman. According to him, “data modeling is the process of learning about the data, and the data model is the end result of the data modeling process.” He also described data modeling as “feeling the data”.

So, what is a data model?

Before we define what a data model is, let’s start with understanding models and why we use them.

What is a model?

Most people like to use models to simplify or communicate complex ideas, concepts or things.

In The Change Book, Mikael Krogerus and Roman Tschappeler write that “we tend to perceive things first in images, then in words. We remember pictures better than text and are more likely to recognize patterns in images than in sentences.”

Examples of models include the double-helix model of our DNA, a schematic of how chemicals get transported in our atmosphere, and maps of storm paths.

In line with the concept above, a data model is a simplification or abstraction of a complex idea, concept or thing. It is a “drawing” of boxes and lines that shows the important entities and how they relate to one another.

According to Hoberman (2009), “A data model is a ‘wayfinding tool’ for both business and IT professionals, which uses a set of symbols and text to precisely explain a subset of real information to improve communication within the organization and thereby lead to a more flexible and stable application environment.”

The Data Modeling Frameworks

Here are three examples of data models and their basic construct.

The first one is the Entity-Relationship (ER) Data Model. Its basic elements are entities, attributes and relationships. These elements are illustrated below.

The Entity-Relationship (ER) model is one of the basic data models used in designing databases. It is most often used in designing databases for Online Transaction Processing (OLTP) applications or software. The most common form of the ER data model is in “3rd normal form” (read about normalization).
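To make entities, attributes and relationships concrete, here is a minimal sketch using Python's built-in sqlite3 module. The employee and expense_claim tables, their columns and the sample rows are hypothetical and chosen only for illustration; they are not taken from any real application. Each table is an entity, each column is an attribute, and the foreign key is the relationship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Two hypothetical entities. Each fact is stored once: the employee's name
# lives only in employee, and claims point to it by key.
conn.executescript("""
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    full_name   TEXT NOT NULL
);
CREATE TABLE expense_claim (
    claim_id    INTEGER PRIMARY KEY,
    employee_id INTEGER NOT NULL REFERENCES employee(employee_id),
    amount      REAL NOT NULL,
    claim_date  TEXT NOT NULL
);
""")

conn.execute("INSERT INTO employee VALUES (1, 'Ana Cruz')")
conn.executemany(
    "INSERT INTO expense_claim VALUES (?, ?, ?, ?)",
    [(10, 1, 120.0, "2019-11-01"), (11, 1, 80.0, "2019-11-15")],
)

# Follow the relationship from claims back to the employee.
er_rows = conn.execute("""
    SELECT e.full_name, SUM(c.amount)
    FROM employee e
    JOIN expense_claim c ON c.employee_id = e.employee_id
    GROUP BY e.employee_id, e.full_name
""").fetchall()
print(er_rows)

# The relationship is defined in the schema, so a claim for an
# unknown employee is rejected by the database itself.
try:
    conn.execute("INSERT INTO expense_claim VALUES (12, 99, 50.0, '2019-11-20')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)
```

Because the relationship is part of the schema, the database protects it on every write, which is one reason this style suits transaction processing.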

The second data model that is popularly used is the Dimensional Data Model. Its basic constructs are facts and dimensions. This data model is heavily used in Online Analytical Processing (OLAP) applications or in designing data warehouses. This is also popularly referred to as “star schema” due to its appearance (like a star) as shown in the illustration below.

Image Source: https://it.wikipedia.org/wiki/Schema_a_stella#/media/File:Star-schema.png

The dimensional data model allows easy “slice-and-dice” of the data. For example, with a dimensional data model, one can aggregate sales by branch, by product category and by month.
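The aggregation described above can be sketched with a tiny star schema in Python's built-in sqlite3 module. The table names (fact_sales, dim_branch, dim_product, dim_date), the columns and the sample figures are all made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One fact table in the middle, with a key into each surrounding dimension.
conn.executescript("""
CREATE TABLE dim_branch  (branch_key INTEGER PRIMARY KEY, branch_name TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
CREATE TABLE fact_sales (
    branch_key   INTEGER REFERENCES dim_branch(branch_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    date_key     INTEGER REFERENCES dim_date(date_key),
    sales_amount REAL
);
INSERT INTO dim_branch  VALUES (1, 'Manila'), (2, 'Cebu');
INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture');
INSERT INTO dim_date    VALUES (20191101, '2019-11-01', '2019-11'),
                               (20191201, '2019-12-01', '2019-12');
INSERT INTO fact_sales VALUES
    (1, 1, 20191101, 1000.0),
    (1, 1, 20191101,  500.0),
    (1, 2, 20191201,  300.0),
    (2, 1, 20191201,  700.0);
""")

# "Slice and dice": total sales by branch, product category and month.
sales_by_slice = conn.execute("""
    SELECT b.branch_name, p.category, d.month, SUM(f.sales_amount)
    FROM fact_sales f
    JOIN dim_branch  b ON b.branch_key  = f.branch_key
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_date    d ON d.date_key    = f.date_key
    GROUP BY b.branch_name, p.category, d.month
    ORDER BY b.branch_name, p.category, d.month
""").fetchall()

for row in sales_by_slice:
    print(row)
```

Changing the slice is only a matter of changing the GROUP BY columns. Dropping d.month, for example, gives totals by branch and category across all months.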

The last data model we want to discuss, at least in this article, is the Data Vault Data Model. Its basic constructs are hubs, links and satellites. Its creator (Dan Linstedt) described it as a “hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema”.

Its design can adapt to change because the relationships between entities are stored as rows in the link and satellite tables, which makes it more agile and flexible than an ER data model, where relationships are defined in the schema.

Image Source: https://www.wikiwand.com/en/Data_vault_modeling

The data vault data model enables storage and integration of historical data from different source systems with tracing in place to track source system information. You can see from the image above that the hub, link and satellite all keep the [Record_Source] and [Load_Date] attributes.
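A rough sketch of the hub, link and satellite constructs, again using Python's built-in sqlite3 module. The table names, columns and rows are hypothetical, and the plain integer surrogate keys are a simplification assumed for this example. The point is only to show that every table carries load_date and record_source, that a relationship is a row in a link table, and that a satellite keeps history instead of overwriting it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hubs hold business keys only.
CREATE TABLE hub_customer (
    customer_sk   INTEGER PRIMARY KEY,
    customer_id   TEXT NOT NULL UNIQUE,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE hub_order (
    order_sk      INTEGER PRIMARY KEY,
    order_id      TEXT NOT NULL UNIQUE,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
-- A link row records a relationship between hubs.
CREATE TABLE link_customer_order (
    link_sk       INTEGER PRIMARY KEY,
    customer_sk   INTEGER NOT NULL REFERENCES hub_customer(customer_sk),
    order_sk      INTEGER NOT NULL REFERENCES hub_order(order_sk),
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
-- A satellite holds descriptive attributes, one row per load.
CREATE TABLE sat_customer (
    customer_sk   INTEGER NOT NULL REFERENCES hub_customer(customer_sk),
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL,
    city          TEXT,
    PRIMARY KEY (customer_sk, load_date)
);
INSERT INTO hub_customer VALUES (1, 'C-100', '2019-01-01', 'CRM');
INSERT INTO hub_order    VALUES (1, 'O-500', '2019-02-01', 'ORDERS');
INSERT INTO link_customer_order VALUES (1, 1, 1, '2019-02-01', 'ORDERS');
INSERT INTO sat_customer VALUES (1, '2019-01-01', 'CRM',     'Manila');
INSERT INTO sat_customer VALUES (1, '2019-06-01', 'BILLING', 'Cebu');
""")

# Full history of the customer's attributes, with the source of each version.
history = conn.execute("""
    SELECT load_date, record_source, city
    FROM sat_customer
    WHERE customer_sk = 1
    ORDER BY load_date
""").fetchall()

# Current view: the satellite row with the latest load_date per customer.
current = conn.execute("""
    SELECT h.customer_id, s.city, s.record_source
    FROM hub_customer h
    JOIN sat_customer s ON s.customer_sk = h.customer_sk
    WHERE s.load_date = (
        SELECT MAX(load_date) FROM sat_customer WHERE customer_sk = h.customer_sk
    )
""").fetchall()

print(history)
print(current)
```

Under this layout, a new source system or a new relationship is added by creating new satellite or link tables and inserting rows, without altering the existing hubs.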

Data Modeling is a timeless skill

During the early 2000s, when Big Data, Agile and NoSQL DBs were considered new, some data practitioners thought that data modeling was not needed. The usual reasoning was that data is ingested “as-is” under the Big Data and NoSQL approach, and many people believed data modeling was too tedious and time-consuming. So, for some people, data modeling seemed to be an unnecessary task.

This is far from the truth. Data modeling is critical for Big Data, Analytics, or A.I. projects to be successful. Once the data is ingested into a data lake or a data hub, you still need to organize, understand and make sense of the data, and be able to communicate it to others. And there is no better way to do that than data modeling. That is why most data lake or data hub architectures I see have a metadata capture and “Serving layer”: the stage or layer where they define the schema (think data model) specific to a use case of how the data will be consumed. That’s why they call it schema-on-read.

As you can see, you still have to use data modeling to “define the schema” on read… and knowing what data model to use is key.
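Here is a small sketch of what schema-on-read can look like in plain Python. The raw records, the Sale class and the read_sales function are hypothetical names invented for this example. The raw data is kept exactly as it arrived, and the schema for one use case is applied only when the data is read.

```python
import json
from dataclasses import dataclass

# Raw events ingested "as-is": fields and types vary between records.
raw_lines = [
    '{"branch": "Manila", "amount": "1000.50", "ts": "2019-11-01", "extra": "x"}',
    '{"branch": "Cebu", "amount": 700, "ts": "2019-12-01"}',
    '{"branch": "Davao", "ts": "2019-12-02"}',
]


@dataclass
class Sale:
    """The schema for one use case: monthly sales per branch."""
    branch: str
    amount: float
    month: str


def read_sales(lines):
    """Apply the Sale schema while reading; the raw lines are never modified."""
    for line in lines:
        record = json.loads(line)
        if "amount" not in record:
            continue  # this record does not fit the schema for this use case
        yield Sale(
            branch=record["branch"],
            amount=float(record["amount"]),  # normalize text and numbers to float
            month=record["ts"][:7],          # derive YYYY-MM from the timestamp
        )


sales = list(read_sales(raw_lines))
for sale in sales:
    print(sale)
```

Even in this tiny case, someone had to decide which fields matter, what types they have and which records are out of scope. That decision is data modeling, just done at read time.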

To me, the most appealing attributes of data modeling are that it is technology-agnostic, dynamic and timeless.

You can design the data model without any dependency on a specific application. Decoupling the model this way means it is not tied to a single application and can be reused by others.

I hope this blog helps you gain a basic understanding of what data modeling is and what a data model is… and more importantly, why they matter in the age of Analytics and Artificial Intelligence (A.I.).

Disclaimer: All views expressed in this story are my own and do not represent the opinions and viewpoints of any entity or organization with which I have been, am now, or will be affiliated.

This story has been published for information and illustrative purposes only and is not intended to serve as advice of any nature whatsoever. The information contained and the references made in this story are in good faith. Neither my employer nor any of its directors, agents or employees gives any warranty of accuracy (whether expressed or implied), nor accepts any liability as a result of reliance upon the information, including (but not limited to) content, advice, statement or opinion contained in this story.

This story also contains certain information available in the public domain, created and maintained by private and public organizations. I do not control or guarantee the accuracy, relevance, timeliness or completeness of such information. This story constitutes a view as of the date of publication and is subject to change.

This story makes only a descriptive reference to trademarks that may be owned by others. The use of such trademarks herein is not an assertion of ownership of such trademarks by me or my employer nor is there any claim made to these trademarks and is not intended to represent or imply the existence of an association between me and the lawful owners of such trademarks.


Celman Elden D. Sudaria
ATCP Spark

A Data Architect with over 20 years of experience in Data Architecture, Data Management & Data Engineering. https://ph.linkedin.com/in/celmaneldendsudaria