The craft of data modelling

Ravikiran Durbha
Data And Beyond
Published in
3 min read · Aug 13, 2022

Why it’s even more relevant today than ever before.

Data modelling as a role appears to be dying, or so it has seemed since the advent of NoSQL, schema-on-read, and data lakes. Organizations seem to be investing less in data modelers and more in data scientists and data engineers. There is a premium on insight, and they can’t get it fast enough. The prevailing thought is that data modelling evolved to reduce redundancy in storage, and since storage has become very cheap, data modelling is now economically irrelevant.

Data engineering promised to bring the best practices of software engineering to data, but seems to have left behind the principles of data modelling. To be sure, the role of the data modeler has certainly changed, but the craft is still valuable to an organization, especially when it is embraced by data engineering. Steve Hoberman, author and data modeler extraordinaire, said, “Data modeling is the process of learning about the data, and the data model is the end result of the data modeling process”. It would seem, then, that we have to learn about the data before we can deliver insights from it.

It is true that data modelling may seem incongruent with agility. After all, one has to understand all the rules governing a business process before defining the entities and relationships that are part of it. However, if we consider that modelling is really an exercise in collaboration with the business, and not something IT works on in isolation and delivers to the business, agility ceases to be a factor.

There are core tenets of data modelling that are invaluable to any modern data stack for delivering high-quality insights. There may be different data modelling patterns in practice, such as 3NF, dimensional models, and Data Vault, but these principles are universal and agnostic of the underlying implementation.

  • Represent the real world — Model the data to look like the business. It is important to start on a clean slate, no matter how much we have written on it before. The real world is unique to every organization.
  • Model relationships and cardinality to represent business rules — A business transaction is the coming together of a subset of entities in the business process. Consider all possible permutations and then prune them based on the constraints (or business rules) enforced by the process. This leads to a better understanding of the business process and its limitations, especially when we find permutations that were not pruned but that the business considers illegitimate.
  • Abstract for flexibility — The model should be specific only if the business process demands it. For example, an employee can also be a customer unless the process explicitly prohibits it, so a more abstract entity in this case could be Person.
  • Specialize when roles or states are important — We specialize an entity only when the specificity of a role or state matters. Maybe it is important for the process to know whether an employee was indeed a customer; in that case we have the specialized entities “Employee” and “Customer”, as sketched in the example after this list.
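To make the abstraction and specialization tenets concrete, here is a minimal sketch in Python. All entity and attribute names are hypothetical, invented purely for illustration: Person is the abstract supertype, Employee and Customer specialize it for the roles the process cares about, and Order states the 1..* cardinality between a customer and order lines as an explicit business rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Person:
    """Abstract supertype: anyone the business interacts with."""
    person_id: int
    name: str


@dataclass
class Employee(Person):
    """Specialization of Person for the employee role."""
    employee_number: str


@dataclass
class Customer(Person):
    """Specialization of Person for the customer role. Nothing here stops
    the same person_id from also appearing as an Employee, matching a
    business that does not prohibit employees from being customers."""
    loyalty_tier: str


@dataclass
class OrderLine:
    product_code: str
    quantity: int


@dataclass
class Order:
    """A business transaction: one Customer places one or more OrderLines.
    The 1..* cardinality is a business rule the model makes explicit."""
    order_id: int
    customer: Customer
    lines: List[OrderLine]


# The same real-world person can legitimately play both roles.
alice_as_employee = Employee(person_id=1, name="Alice", employee_number="E-042")
alice_as_customer = Customer(person_id=1, name="Alice", loyalty_tier="gold")

order = Order(
    order_id=1001,
    customer=alice_as_customer,
    lines=[OrderLine(product_code="SKU-7", quantity=2)],
)
```

Had the business prohibited employees from being customers, that rule would surface here too, for example as a constraint on person_id across the two specializations rather than as silent overlap.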

Persisting data using these few holistic principles can yield many benefits:

  • Seamless interaction between business and IT with the data model as the basis.
  • Clear foundation for measuring data quality based on the business rules (see the sketch after this list).
  • Better data governance of attributes through relationships and hierarchy.
  • Pattern based automation of data movement and transformations, leading to faster delivery of insights.
  • Meaningful insights based on business context.
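To illustrate the data-quality point above: the business rules captured in the model can be restated directly as tests on the data. This is a rough sketch that reuses the hypothetical Order model from the earlier example; the function name and the specific rules are assumptions for illustration, not a prescribed implementation.

```python
from typing import List


def check_order_rules(orders: List[Order]) -> List[str]:
    """Return one message per violated business rule.

    Python type hints are not enforced at runtime, so data landed from
    upstream systems can still break the model; these checks simply
    restate the model's rules as data-quality tests.
    """
    violations = []
    for order in orders:
        if order.customer is None:
            violations.append(f"order {order.order_id}: missing customer (exactly 1 required)")
        if not order.lines:
            violations.append(f"order {order.order_id}: no order lines (1..* violated)")
        for line in order.lines:
            if line.quantity <= 0:
                violations.append(
                    f"order {order.order_id}: non-positive quantity for {line.product_code}"
                )
    return violations
```

Because every check traces back to a rule the business itself agreed to during modelling, a failing test points to a concrete conversation to have, not just a broken pipeline.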

While an organization may not invest in the specialized role of a full-time data modeler, it is important for every data practitioner to invest in the craft of data modelling to fulfill the vision of a “data-driven organization”.
