Do we really need data modeling in the world of the modern data stack?

Nikolay Golov
Published in Manychat Tech Blog
3 min read · Jun 8, 2022

I have 15 years of experience as a data engineer. In the beginning, I worked with classical OLTP databases, like MS SQL Server, PostgreSQL, and Oracle. At that time, everybody wanted a classical data warehouse, built according to the data modeling methodology of Kimball or Inmon.

Later, “big data” emerged, with MPP databases (Teradata, Vertica) and tools for boundless data (Hadoop, MongoDB). Clients started to talk about data lakes. I successfully implemented a few data platforms that tended to be normalized data lakes: data warehouses built on MPP databases, able to grow and evolve rapidly thanks to modern agile data modeling methodologies such as Data Vault or Anchor Modeling. Technically speaking, those data warehouses were not data lakes, so they avoided many of the disadvantages of data lakes, such as the risk of turning into a data swamp. But they were still extremely agile and flexible for growth.

To be honest, I’m biased. I just like data warehouses and try to build them for every task, using the best tools and methodologies available.

Currently, we have the modern data stack (you can Google the term). Simply speaking, this is a set of advanced, cloud-based, boundless, SQL-based tools for data ingestion, data transformation, data processing, and data visualization, for example Airbyte + dbt + Snowflake + Looker. These tools give you the ability to build a lakehouse, a combination of a data lake and a data warehouse.

Do we need data modeling in this case? Since I’m biased toward the benefits of data modeling, let’s assume the opposite: the data modeling step is skipped entirely when building an analytical platform. Will it work?

Based on my experience, I think all data engineering issues in building an analytical platform can be solved without data modeling, using just the capabilities of the modern data stack. There will be logical and physical duplicates of the data, but it all lives somewhere in the cloud, so who cares?

But, and there is always a “but”, what about analysts? Analytical platforms are built for finding data-driven insights, not for data engineers. Imagine a data analyst who needs to start working with an analytical platform. They need to discover the available data and understand its structure, its dependencies, and the ways of combining data from different sources into a meaningful data set.

You might think I’m describing the tasks of a data catalog. And you are absolutely right. I’ve seen a lot of projects where companies tried to understand their data by adopting some kind of data catalog at a point when nobody understood what data was available, what its quality was, and so on. I’ve also participated in a few projects where a sort of data catalog was launched at the beginning, as part of the data modeling step. Such companies never had issues with understanding their data.
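To make the analyst’s tasks a bit more concrete, here is a minimal sketch of what a catalog has to answer: which tables exist, what their columns mean, and which columns two tables share. Everything in it is hypothetical and invented for illustration (the `TableEntry` and `Catalog` classes, the table names, the column descriptions); it is not the API of any real catalog product.

```python
from dataclasses import dataclass, field


@dataclass
class TableEntry:
    """One catalog record: a table, where it came from, and what its columns mean."""
    name: str
    source: str
    description: str
    columns: dict[str, str] = field(default_factory=dict)  # column name -> meaning


class Catalog:
    """A toy in-memory catalog (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._tables: dict[str, TableEntry] = {}

    def register(self, entry: TableEntry) -> None:
        self._tables[entry.name] = entry

    def search(self, keyword: str) -> list[str]:
        """Names of tables whose name, description, or columns mention the keyword."""
        kw = keyword.lower()
        hits = []
        for table in self._tables.values():
            haystack = " ".join(
                [table.name, table.description, *table.columns, *table.columns.values()]
            )
            if kw in haystack.lower():
                hits.append(table.name)
        return sorted(hits)

    def shared_columns(self, left: str, right: str) -> list[str]:
        """Columns present in both tables: candidate join keys, not guaranteed ones."""
        common = self._tables[left].columns.keys() & self._tables[right].columns.keys()
        return sorted(common)


# Hypothetical tables from two different sources.
catalog = Catalog()
catalog.register(TableEntry(
    name="crm_users",
    source="CRM",
    description="Registered users",
    columns={"user_id": "user identifier", "email": "contact email"},
))
catalog.register(TableEntry(
    name="billing_payments",
    source="billing",
    description="Payments made by users",
    columns={"payment_id": "payment identifier", "user_id": "user identifier", "amount": "paid amount"},
))

print(catalog.search("payment"))
print(catalog.shared_columns("crm_users", "billing_payments"))
```

A shared column name only suggests a join key. Deciding that `user_id` really means the same thing in both sources is exactly the kind of knowledge that has to be written down by someone, whether in a catalog or in a data model.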

So, to sum up: data modeling techniques (dimensional modeling, Data Vault, Anchor Modeling) were originally created to solve a set of problems. Most of those problems are not problems anymore: the modern data stack solves them out of the box. But the problem of understanding your data, its structure, its quality, and its dependencies from an analytical point of view is still relevant. Data modeling techniques can help here, even though they were created for slightly different tasks.

Here are two questions to consider:

  1. Are there any benefits to data modeling techniques in the world of the modern data stack, besides the need to explore your data?
  2. If you just need to explore your data, which of the classical data modeling techniques (dimensional modeling, Data Vault, Anchor Modeling) is best? Or do we need another one, created specifically for this task?

Nikolay Golov

Head of Data Platform at ManyChat. Data engineer. Researcher of data modeling techniques (Anchor Modeling, Data Vault). Lecturer at Harbour Space University.