Rethinking Surrogate Keys: Efficient Alternatives for Modern Data Models

Anup Moncy
Data Engineering
Published in
2 min readOct 9, 2023

Surrogate keys are essentially artificial keys assigned to each record in a database table for unique identification.

Unlike natural keys, which are based on the actual data (like a person’s social security number), surrogate keys have no meaning — they’re just there to ensure each row is distinct, identifiable, and linked with the rest of the data model. Their simplicity often makes them shorter and more efficient to index, speeding up searches and queries.

Data Modeller's love affair with surrogate keys and un-ending Kimbal quotations on why we need them…

…sometimes hold us back from moving on with changing times of evolved modern platforms and database architecture.

Where it starts becoming a problem.

Lack of Universality: In modern platforms where we deal with datasets more than tables, it often leads to an overhead of understanding the tables and relationships for validations and analysis before being able to communicate with external systems.

Columnar Datastore eliminates the need for a “view by” smaller table and then joins to large fact tables on indexes.

This is because the data is already organized as columns.

Overhead in Join Operations: Explore the performance impact of using surrogate keys in join operations,

also considering the need to generate and maintain them.

I had an example working in a system where pretty much every report needed a natural key, yet, we needed to join a dimension linked to a claim surrogate key to fetch them.

I had constantly questioned the need for maintaining and populating a surrogate key to split one source into multiple targets and join them back for every report.

This is especially true when modern platforms with advanced column distribution and partitioning mitigate most of the traditional Database issues.

Alternatives to Conventional Surrogate Keys:

1. Don’t normalise if there is no strong data or business need to

2. If this is needed, check if Natural keys can do the job individually or as a composite key,

3. If the composite keys get very complex, generate a hash key (ensure duplicate handling to avoid key collision)

Surrogate keys provide simplicity and stability in data modeling but are not universally standard in modern data platforms.

Other options, such as natural keys and composite keys, offer alternatives based on specific use cases and preferences.

Choosing the right key type depends on the unique requirements of each scenario, emphasizing the importance of a case-by-case approach in database design.

--

--