Advantage Data Vault 2.0

Published in Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

6 min read · Mar 22, 2021

The major difference between Data Vault 1.0 and Data Vault 2.0 is the introduction of hash keys as surrogate keys in place of sequence numbers. This is a paradigm shift in the way a data vault is loaded and implemented, and it opens the way for removing database-enforced referential integrity. Enforcing referential integrity on a data vault means that you must declare foreign keys between a satellite and its parent hub or link, and between a link and its hubs.
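To make the hash-key idea concrete, here is a minimal sketch of deriving a surrogate key directly from a business key. The function name `hash_key`, the trim-and-uppercase normalisation, the `||` delimiter and the choice of MD5 are all assumptions for illustration; a real implementation would follow whatever standard your data vault has agreed on.

```python
import hashlib


def hash_key(*business_key_parts: str, delimiter: str = "||") -> str:
    """Derive a deterministic surrogate key from one or more business key parts.

    Hypothetical helper: the normalisation rules, delimiter and digest
    algorithm are assumptions, not a prescribed standard.
    """
    # Normalise each part so the same business key always produces the same hash.
    normalised = [part.strip().upper() for part in business_key_parts]
    joined = delimiter.join(normalised)
    return hashlib.md5(joined.encode("utf-8")).hexdigest().upper()


# A hub loader and a satellite loader can each compute the key on their own,
# with no lookup against the other table.
hub_customer_hk = hash_key("cust-001")
sat_customer_hk = hash_key(" CUST-001 ")
assert hub_customer_hk == sat_customer_hk

# A link key is derived from the business keys of all participating hubs.
link_customer_account_hk = hash_key("cust-001", "acc-42")
```

Because the key is a pure function of the business key, every loader arrives at the same value independently, which is what makes it possible to drop the parent-first load order.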

This imposes sequential loading rules onto the data vault:

The hub must be updated before related satellites and links
The link must be updated before related satellites

The longest path ends at a link satellite: it must wait for its link to finish updating, and the link in turn must wait for ALL of its related hubs to be updated. The problem escalates quickly if you consider a link with ten or more hubs!
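The ordering constraint can be sketched as a dependency graph that is resolved into load "waves", where a table can only start once every table it depends on has finished. The table names and the `load_waves` helper below are hypothetical, and the sketch uses `graphlib` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical model: each table maps to the parent tables it must wait for
# when foreign keys are enforced.
depends_on = {
    "hub_customer": set(),
    "hub_account": set(),
    "hub_product": set(),
    "sat_customer": {"hub_customer"},
    "link_customer_account_product": {"hub_customer", "hub_account", "hub_product"},
    "sat_link_customer_account_product": {"link_customer_account_product"},
}


def load_waves(graph):
    """Group tables into waves; every table in a wave can load in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tables whose parents are all done
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = load_waves(depends_on)
for number, tables in enumerate(waves, start=1):
    print(f"wave {number}: {', '.join(tables)}")
```

In this toy model the link satellite lands in the third wave, behind the link and all three hubs. Every hub added to the link is one more table that has to finish before the link, and therefore its satellite, can start.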

In a data vault based on sequence keys, database-enforced referential integrity is a requirement, because the load process must look up the keys needed in a child satellite or link table from its parent tables: the hub or link for a satellite, and the hubs for a link.
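A small sketch of that lookup dependency is shown here, using an in-memory stand-in for a sequence-keyed hub. The class `SequenceKeyedHub` and the function `stage_satellite_row` are hypothetical names chosen for illustration only.

```python
from itertools import count


class SequenceKeyedHub:
    """Toy stand-in for a hub whose surrogate key is a sequence number."""

    def __init__(self):
        self._next_id = count(1)
        self._ids = {}

    def load(self, business_key):
        # Assign the next sequence number the first time a business key is seen.
        if business_key not in self._ids:
            self._ids[business_key] = next(self._next_id)
        return self._ids[business_key]

    def lookup(self, business_key):
        # Raises KeyError if the hub row has not been loaded yet.
        return self._ids[business_key]


def stage_satellite_row(hub, business_key, attributes):
    # The satellite cannot work out its parent key by itself; it must ask the hub.
    return {"hub_id": hub.lookup(business_key), **attributes}


hub = SequenceKeyedHub()

try:
    stage_satellite_row(hub, "CUST-001", {"name": "Ada"})
    loaded_too_early = False
except KeyError:
    loaded_too_early = True  # the hub has not been loaded, so the lookup fails

hub.load("CUST-001")
row = stage_satellite_row(hub, "CUST-001", {"name": "Ada"})
```

The satellite row can only be staged once the hub has assigned a sequence number, which is the load-order dependency described above.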

With hash keys, referential integrity is not enforced at the database level; instead, it is checked by a reconciliation framework that executes after the load.
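One possible shape for such a post-load check is an orphan query: find child rows whose hash key has no matching parent row. The sketch below uses an in-memory SQLite database purely so it can run anywhere; the table names, column names, sample rows and the `find_orphans` helper are assumptions for illustration, not part of any specific reconciliation framework.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No foreign keys are declared: the tables can be loaded in any order.
conn.executescript("""
    CREATE TABLE hub_customer (hub_customer_hk TEXT, customer_id TEXT);
    CREATE TABLE sat_customer (hub_customer_hk TEXT, load_date TEXT, name TEXT);
    INSERT INTO hub_customer VALUES ('HK_A', 'CUST-001');
    INSERT INTO sat_customer VALUES ('HK_A', '2021-03-22', 'Ada');
    INSERT INTO sat_customer VALUES ('HK_B', '2021-03-22', 'Grace');
""")


def find_orphans(conn, child, parent, key):
    """Return child hash keys that have no matching row in the parent table."""
    # Identifiers are interpolated into the SQL text, so only pass trusted,
    # hard-coded table and column names.
    sql = f"""
        SELECT DISTINCT c.{key}
        FROM {child} AS c
        LEFT JOIN {parent} AS p ON p.{key} = c.{key}
        WHERE p.{key} IS NULL
        ORDER BY c.{key}
    """
    return [row[0] for row in conn.execute(sql)]


orphans = find_orphans(conn, "sat_customer", "hub_customer", "hub_customer_hk")
print(orphans)
```

Here the satellite row keyed `HK_B` has no hub row, so it is reported as an orphan. The same check would be repeated for each satellite-to-link and link-to-hub relationship.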


Written by Patrick Cuba