Advantage Data Vault 2.0

The major difference between Data Vault 1.0 and Data Vault 2.0 is the introduction of hash keys as surrogate keys in place of sequence numbers. This is a paradigm shift in the way a data vault is loaded and implemented, and it opens the way to removing database-enforced referential integrity. Enforcing referential integrity on a data vault means declaring foreign keys from each satellite to its hub or link, and from each link to its hubs.
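A minimal Python sketch of the idea follows. It assumes a common convention: trim and upper-case each business key part, join composite keys with "||", and hash the result with MD5. The function name hash_key and these normalization rules are illustrative assumptions, not something the text prescribes. The point is that the key is a pure function of the business key, so a satellite or link loader can compute its parent's key on its own instead of looking it up.

```python
import hashlib


def hash_key(*business_key_parts, delimiter="||"):
    """Derive a deterministic surrogate key from one or more business key parts.

    Assumed convention (illustrative only): trim and upper-case each part,
    join the parts with a delimiter, then take the MD5 hex digest.
    """
    normalized = delimiter.join(part.strip().upper() for part in business_key_parts)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


# A hub key depends only on the business key, so any loader can compute it
# without querying the hub table first.
customer_hk = hash_key("cust-001")
product_hk = hash_key("prod-42")

# A link key is computed from the business keys of all the hubs it joins.
order_line_hk = hash_key("cust-001", "prod-42")

print(customer_hk == hash_key("  CUST-001 "))  # True: same normalized input
```

Because the same input always yields the same key, a hub, its satellites and its links can all be loaded independently and still agree on the key values.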

This imposes sequential loading rules onto the data vault:

  • The hub must be updated before related satellites and links
  • The link must be updated before related satellites

The longest path ends with a link satellite, which waits for its link to finish loading, while the link in turn waits for ALL of its related hubs. The problem escalates quickly when you consider a link with ten or more hubs!
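To see how enforced foreign keys turn into load ordering, here is a hypothetical Python sketch that groups tables into load waves, where a table may load only after every table it references has loaded. The function load_waves and all table names are made up for illustration. With foreign keys, the link satellite lands in the third wave; with no enforced dependencies, every table fits into a single wave.

```python
def load_waves(dependencies):
    """Group tables into load waves.

    dependencies maps a table to the set of tables it references via foreign keys.
    Each wave holds the tables whose parents were all loaded in earlier waves,
    so the tables within one wave can be loaded in parallel.
    """
    tables = set(dependencies)
    for parents in dependencies.values():
        tables |= parents
    # Work on copies so the caller's sets are not modified.
    remaining = {t: set(dependencies.get(t, ())) for t in tables}
    waves = []
    while remaining:
        ready = sorted(t for t, parents in remaining.items() if not parents)
        if not ready:
            raise ValueError("cyclic dependency between tables")
        waves.append(ready)
        for t in ready:
            del remaining[t]
        for parents in remaining.values():
            parents.difference_update(ready)
    return waves


# Hypothetical model: three hubs, one link joining them, one hub satellite
# and one link satellite, with foreign keys enforced.
with_fks = {
    "sat_customer": {"hub_customer"},
    "link_sale": {"hub_customer", "hub_product", "hub_store"},
    "sat_link_sale": {"link_sale"},
}
fk_waves = load_waves(with_fks)
for number, wave in enumerate(fk_waves, start=1):
    print(number, wave)

# With hash keys and no enforced foreign keys, no table waits on another.
all_tables = [t for wave in fk_waves for t in wave]
hash_waves = load_waves({t: set() for t in all_tables})
print(len(hash_waves))  # 1
```

Every extra hub on the link only widens the first wave here, but in a real load the link cannot start until the slowest of those hubs has finished, which is why wide links stretch the critical path.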

Data Vault 1.0 implementation

With a data vault based on sequence keys, referential integrity enforced at the database level is a requirement, because the load process must look up the keys for a child table in its parents: a satellite gets its key from its hub or link, and a link gets its keys from its hubs.
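A toy illustration of that lookup dependency, assuming a hypothetical in-memory Hub class that mints sequence keys on insert and a hypothetical build_satellite_row helper. The satellite loader cannot know the surrogate key until the hub has assigned it, so it has to look the key up and fails if the hub row is not there yet.

```python
import itertools


class Hub:
    """Toy in-memory hub that assigns sequence surrogate keys on insert (hypothetical)."""

    def __init__(self):
        self._sequence = itertools.count(1)
        self.key_by_business_key = {}

    def load(self, business_key):
        # Only the hub can mint a sequence key; a known business key keeps its key.
        if business_key not in self.key_by_business_key:
            self.key_by_business_key[business_key] = next(self._sequence)
        return self.key_by_business_key[business_key]


def build_satellite_row(hub, business_key, attributes):
    """Build a satellite row whose key has to be looked up in the parent hub."""
    hub_key = hub.key_by_business_key.get(business_key)
    if hub_key is None:
        raise LookupError(f"no hub key for {business_key!r} yet; load the hub first")
    return {"hub_sk": hub_key, **attributes}


hub_customer = Hub()

# Loading the satellite first fails: the key it needs does not exist yet.
try:
    build_satellite_row(hub_customer, "cust-001", {"name": "Ada"})
except LookupError as err:
    print("blocked:", err)

# After the hub load, the lookup succeeds.
hub_customer.load("cust-001")
row = build_satellite_row(hub_customer, "cust-001", {"name": "Ada"})
print(row)  # {'hub_sk': 1, 'name': 'Ada'}
```

The sequence value depends on the order in which the hub saw its business keys, so no other process can derive it; the child load has no choice but to wait for the parent and query it.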

With hash keys, referential integrity is not enforced at the database level; instead it is checked by a reconciliation framework that runs after the load.
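A minimal sketch of one such post-load check, assuming tables are represented as lists of row dicts and each check as a (child table, child key column, parent table, parent key column) tuple. The reconcile function, the table names and the short placeholder key values (standing in for real hash values) are all hypothetical; a real framework would more likely run the equivalent anti-join in SQL against the loaded tables.

```python
def reconcile(tables, checks):
    """Run orphan checks once all loads have finished.

    tables: table name -> list of row dicts (a stand-in for real tables).
    checks: (child table, child key column, parent table, parent key column) tuples.
    Returns {check description: sorted list of child keys with no parent row}.
    """
    failures = {}
    for child, child_col, parent, parent_col in checks:
        parent_keys = {row[parent_col] for row in tables[parent]}
        child_keys = {row[child_col] for row in tables[child]}
        orphans = sorted(child_keys - parent_keys)
        if orphans:
            failures[f"{child}.{child_col} -> {parent}.{parent_col}"] = orphans
    return failures


# Hypothetical tables after a parallel load; "a1", "b2", "c3" stand in for hash keys.
tables = {
    "hub_customer": [{"customer_hk": "a1"}, {"customer_hk": "b2"}],
    "sat_customer": [
        {"customer_hk": "a1", "name": "Ada"},
        {"customer_hk": "c3", "name": "Grace"},  # no matching hub row
    ],
    "link_sale": [{"sale_hk": "s1", "customer_hk": "b2"}],
}
checks = [
    ("sat_customer", "customer_hk", "hub_customer", "customer_hk"),
    ("link_sale", "customer_hk", "hub_customer", "customer_hk"),
]
result = reconcile(tables, checks)
print(result)  # {'sat_customer.customer_hk -> hub_customer.customer_hk': ['c3']}
```

Because the check runs after all tables have loaded, the loads themselves can proceed in parallel, and integrity problems are reported rather than blocking the load.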
