Is data modeling dead?

Tony Yan
5 min read · Jul 2, 2022

Recently I read a blog article by Chad Sanderson on data modeling: The Death of Data Modeling — Pt. 1 (substack.com).

In this article, Chad discussed what data modeling is and the current problems with data modeling in the modern data stack. So is data modeling dead, or do we just need a new way to do it?

Traditional data modeling is dead

The traditional data modeling process is as follows:

Image is from Data modeling — Wikipedia

The data modeling process is a typical waterfall model. Business teams and data teams work together on a business intelligence project. Business teams create the conceptual data model based on their business understanding and requirements. Conceptual data models are typically semantic models that are easy for business users to understand.

Data teams transform the conceptual data model into a physical data model. This process typically includes integrating data from business system databases, data cleansing, data transformation, data aggregation, and so on.

Then the business team uses business intelligence software to access the physical data model and get the results they expect.

We can see that the traditional data modeling process is very suitable for the age of traditional software.

With the emergence of the modern data stack, the way companies collect and use data has changed. Traditional data modeling is no longer workable in the new age because of the following changes:

More and more diverse data sources. For traditional data modeling, the typical data sources are relational databases. But today we have databases, APIs, event logs, and so on. These new sources come in different data formats, and SQL alone is not enough to process them.

Data is much messier than before. Traditional data modeling typically processes data coming from business system databases, where data quality is high and ETL tools can handle it easily. But nowadays we also have data from advertising systems, event tracking systems, and more, and its quality is sometimes very poor. Handling such data requires a lot of cleansing work during data modeling.

Data usage scenarios are changing. The typical scenario for traditional data modeling is generating reports on a business's KPIs. But more and more companies are using data to drive their daily operations, so ad-hoc data modeling and analysis on raw data is becoming popular. Traditional data modeling is too slow for such scenarios; the process needs to be more agile.

Data stacks are cloud warehouse centric. More and more companies are moving their data stacks to the cloud and building their data platforms around cloud data warehouses. The flexibility and scalability of the cloud make data democratization possible. But the target end users of traditional data modeling are often C-level executives.
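To make the cleansing point above concrete, here is a minimal Python sketch of the kind of work messy event data forces on us: trimming stray whitespace, dropping records with missing keys, and rejecting unparsable timestamps. The record shape and field names are hypothetical, just for illustration.

```python
from datetime import datetime

# Hypothetical raw records from an event tracking system.
raw_events = [
    {"user_id": " 42 ", "event": "click", "ts": "2022-07-01T10:00:00"},
    {"user_id": "",     "event": "view",  "ts": "2022-07-01T10:05:00"},  # missing user id
    {"user_id": "7",    "event": "CLICK", "ts": "not-a-date"},           # bad timestamp
]

def clean(events):
    """Keep only records with a user id and a parsable timestamp."""
    cleaned = []
    for e in events:
        user_id = e["user_id"].strip()
        if not user_id:
            continue  # drop rows without a user id
        try:
            ts = datetime.fromisoformat(e["ts"])
        except ValueError:
            continue  # drop rows with unparsable timestamps
        cleaned.append({"user_id": user_id, "event": e["event"].lower(), "ts": ts})
    return cleaned

print(len(clean(raw_events)))  # only the first record survives: 1
```

In a business system database, none of these checks would be needed; for advertising or event data, they are only the start.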

Data modeling in the modern data stack

The modern data stack has become popular in recent years, and it includes several modern data modeling products.

Modern Data Stack Ecosystem

dbt, Datameer, and QuickTable are products focused on data modeling and transformation in the modern data stack.

dbt

dbt is currently the major data modeling product in the modern data stack. Analytics engineers use dbt to manage their daily SQL work. dbt has the following major features:

SQL code version control. dbt integrates with git, so analytics engineers can version control their SQL code.

Package management. dbt lets analytics engineers package their models and share them with others, so SQL code can now be modularized.

Code compilation. With dbt, SQL code can contain Jinja templating, and the code is compiled to run on different cloud warehouses.

Documentation. dbt supports documentation, so analytics engineers can write and share documentation for their models.
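The compilation idea can be illustrated with a small Python sketch: dbt-style templated SQL where a `{{ ref('model') }}` placeholder is resolved to a fully qualified warehouse table name. This is a simplification of what dbt actually does, and the model name and schema here are hypothetical.

```python
import re

# A dbt-style model: {{ ref('...') }} points at another model.
templated_sql = "select user_id, count(*) as orders from {{ ref('stg_orders') }} group by user_id"

# Hypothetical mapping from model name to a fully qualified warehouse table.
relations = {"stg_orders": "analytics.dbt_prod.stg_orders"}

def compile_sql(sql, relations):
    """Replace each {{ ref('name') }} with its fully qualified relation."""
    def resolve(match):
        return relations[match.group(1)]
    return re.sub(r"\{\{\s*ref\('([^']+)'\)\s*\}\}", resolve, sql)

print(compile_sql(templated_sql, relations))
# select user_id, count(*) as orders from analytics.dbt_prod.stg_orders group by user_id
```

Because the templated source is resolved at compile time, the same model can target different warehouses simply by swapping the relation mapping.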

Because dbt's target users are analytics engineers, it is still an engineering tool: users need to know SQL to work with it.

Datameer

Datameer is another data modeling product in the modern data stack. dbt supports Snowflake, Databricks, Redshift, and other major data warehouses, but Datameer is a Snowflake-native data modeling and transformation product. Datameer has the following features:

Hybrid team-oriented. Unlike dbt users, Datameer users do not need to know SQL. Datameer supports both SQL and no-code data modeling, so whether you are a data analyst, a business operations expert, or an analytics engineer, you can use it easily.

Spreadsheet-like UI. Datameer has a spreadsheet-like UI, so users can work with drag-and-drop operations.

Catalog-like documentation. Datameer's documentation function is more user-friendly and complete than dbt's. Documents can be shared between users and teams.

Collaboration. Because Datameer can support hybrid teams, its collaboration capability is more powerful than dbt. Users in the same workspace can work together around data models.

Data profiling. Datameer has strong data profiling capabilities, so users can easily spot dataset problems through profiling.
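Data profiling in general boils down to computing summary statistics per column so that anomalies stand out. A minimal sketch of the idea in Python (the column values are hypothetical, and a real profiler covers many more statistics):

```python
# Hypothetical column of values pulled from a dataset.
ages = [34, None, 28, 34, None, 51]

def profile(values):
    """Compute a few basic profiling stats for one column."""
    non_null = [v for v in values if v is not None]
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "min": min(non_null),
        "max": max(non_null),
    }

print(profile(ages))
# {'rows': 6, 'nulls': 2, 'distinct': 3, 'min': 28, 'max': 51}
```

A high null count or an implausible min/max is often the first sign that a dataset needs cleansing before modeling.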

QuickTable

QuickTable is a no-code data modeling tool in the modern data stack. Its target users are non-technical data users who want to work with data easily. QuickTable has the following major features:

Spreadsheet-like UI. Similar to Datameer, QuickTable uses a spreadsheet-like user interface, but it offers more data transformation functions. Users do not need to write formulas, and their interactions are logged as steps automatically.

Multiple data sources. QuickTable supports a rich set of data sources such as object stores, databases, and data warehouses.

In-memory data modeling engine. QuickTable has an in-memory data modeling engine that supports gigabyte-scale data processing, so there is no noticeable delay while users are modeling data. All data processing happens in QuickTable's memory.

Automatic SQL generation. As non-technical users work on data in the spreadsheet, QuickTable automatically compiles the recorded action steps into SQL scripts.

Compatible with major data warehouses. The generated SQL scripts work with major data warehouses, including Snowflake and Databricks.

Collaboration. QuickTable users can work together in the same workspace and share their datasets, recipes, and documents.
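The idea behind step recording and SQL generation can be sketched roughly like this in Python. The step format, operation names, and table name are hypothetical, not QuickTable's actual API; the point is only that a log of spreadsheet actions contains enough information to emit equivalent SQL.

```python
# Hypothetical steps recorded from a user's spreadsheet interactions.
steps = [
    {"op": "filter", "column": "country", "value": "US"},
    {"op": "select", "columns": ["user_id", "amount"]},
]

def steps_to_sql(table, steps):
    """Compile recorded spreadsheet steps into a single SQL statement."""
    columns = "*"
    where = []
    for step in steps:
        if step["op"] == "select":
            columns = ", ".join(step["columns"])
        elif step["op"] == "filter":
            where.append(f"{step['column']} = '{step['value']}'")
    sql = f"select {columns} from {table}"
    if where:
        sql += " where " + " and ".join(where)
    return sql

print(steps_to_sql("orders", steps))
# select user_id, amount from orders where country = 'US'
```

Because the generated SQL is plain text, it can then be pushed down to whichever warehouse hosts the data.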

Summary

As we move to the modern data stack, data modeling is changing too. Traditional data modeling is dead, but modern data modeling is emerging, and new data modeling tools keep improving to support its requirements.
