Data Modeling with 3NF vs. Data Vault 2.0 — A Simple Comparison

Chinmayee Lakkad
Hashmap, an NTT DATA Company
Nov 8, 2021

Data Vault has evolved from its first edition, ‘1.0’, a data modeling specification, into the more elaborate ‘2.0’. Data Vault 2.0 goes above and beyond a modeling specification to also cover enterprise data architecture and a standard methodology (people, processes, and platforms) for creating and sustaining that architecture.

The official governing body, the Data Vault Alliance, provides a comprehensive view of what it is:

Data Vault 2.0 is a complete system of Business Intelligence that stands on foundational pillars of modeling specification, architecture pattern, and a methodology for agile delivery. The methodology also speaks of the implementation rules, best practices, and standards.

The overall objective is to enable the creation and expansion of an auditable, scalable, traceable, consistent Enterprise Data Warehouse that is agile enough to support frequent business changes without causing major cascading shifts.

This article primarily focuses on Data Vault 2.0 as a data modeling technique and provides a simple comparison to traditional relational (3NF) modeling.

Comparison

To begin with, all modeling techniques represent the real world in terms of entities and their relationships.

The *difference* is in the way the data about these entities and their relationships is organized, split, and stored to achieve practical goals of consistency, accuracy, speed, scalability, etc. Each technique makes its organizational decisions based on how it prioritizes these qualities.

Let’s take a look at DV2 and 3NF modeling side-by-side to understand this.

3NF is the foundational specification that essentially translates the ER diagram representation (entities, attributes, relationships) of the business into relational objects (tables or views).

Entities make up the bulk of the relational objects in the model, each with clear identifiers (business keys) for every instance of that entity. The relationships between entities are embedded into them as foreign keys (identifiers of the participating entities) when they are one-to-one or one-to-many. Many-to-many relationships get their own separate tables, with each participating entity contributing its foreign key. In this way, relationship information is co-located with business entity information, making the model a very tight-knit family.
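To make this concrete, here is a minimal 3NF sketch of a hypothetical customer/product domain (the table and column names are illustrative, not from the article), using SQLite so it runs as-is. The entities get their own tables keyed by business keys, and the many-to-many "customer orders product" relationship gets a junction table carrying both foreign keys:

```python
import sqlite3

# Hypothetical 3NF model: two entity tables plus one junction table
# for the many-to-many relationship between them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id TEXT PRIMARY KEY,   -- business key
    name        TEXT NOT NULL
);
CREATE TABLE product (
    product_id  TEXT PRIMARY KEY,   -- business key
    description TEXT NOT NULL
);
-- Many-to-many relationship: its own table, composed of the
-- participating entities' foreign keys.
CREATE TABLE order_line (
    customer_id TEXT NOT NULL REFERENCES customer(customer_id),
    product_id  TEXT NOT NULL REFERENCES product(product_id),
    quantity    INTEGER NOT NULL,
    PRIMARY KEY (customer_id, product_id)
);
""")
conn.execute("INSERT INTO customer VALUES ('C001', 'Acme Corp')")
conn.execute("INSERT INTO product VALUES ('P100', 'Widget')")
conn.execute("INSERT INTO order_line VALUES ('C001', 'P100', 3)")
rows = conn.execute(
    "SELECT c.name, p.description, o.quantity "
    "FROM order_line o "
    "JOIN customer c USING (customer_id) "
    "JOIN product p USING (product_id)"
).fetchall()
print(rows)  # [('Acme Corp', 'Widget', 3)]
```

Note how reassembling the full picture requires joining across the tightly coupled tables: the relationship lives right next to the entities it connects.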

On the other hand, the DV2 modeling technique applies the concept of entities and relationships differently, always representing entities and their relationships as separate relational objects right off the bat. *Each* kid gets his or her own room!

In DV2 terminology, entities become HUBs, each with the clear identifying business keys of its instances, stored in their own individual tables/views.

Their relationships become LINKs, which are treated as many-to-many by default and so are stored as another set of tables/views. Their data is not embedded within the HUBs.

SATELLITEs

The attributes of entities or relationships are captured within SATELLITEs, which are responsible for holding this descriptive data only. A HUB or LINK can have one or many SATELLITEs, depending on how its descriptive data needs to be maintained over its lifetime.

Because of this clear organization, HUBs and LINKs are responsible only for storing identity and traceability information about themselves, leaving the details to be managed by their SATELLITEs.
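The same hypothetical customer/product domain from earlier, remodeled in DV2 style, might look like the sketch below (hash key values, timestamps, and column names are illustrative assumptions; real DV2 implementations add more metadata). The hub carries identity only, the link carries the relationship only, and the satellite accumulates descriptive history:

```python
import sqlite3

# Hypothetical DV2 remodel of the customer/product domain.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- HUB: identity only -- business key, its hash, and load metadata.
CREATE TABLE hub_customer (
    customer_hk TEXT PRIMARY KEY,   -- hash of the business key
    customer_id TEXT NOT NULL,      -- business key
    load_ts     TEXT NOT NULL,
    record_src  TEXT NOT NULL
);
CREATE TABLE hub_product (
    product_hk  TEXT PRIMARY KEY,
    product_id  TEXT NOT NULL,
    load_ts     TEXT NOT NULL,
    record_src  TEXT NOT NULL
);
-- LINK: the relationship as its own table, many-to-many by default.
CREATE TABLE link_order (
    order_hk    TEXT PRIMARY KEY,   -- hash over both parent keys
    customer_hk TEXT NOT NULL REFERENCES hub_customer(customer_hk),
    product_hk  TEXT NOT NULL REFERENCES hub_product(product_hk),
    load_ts     TEXT NOT NULL,
    record_src  TEXT NOT NULL
);
-- SATELLITE: descriptive attributes only, historized by load timestamp.
CREATE TABLE sat_customer_details (
    customer_hk TEXT NOT NULL REFERENCES hub_customer(customer_hk),
    load_ts     TEXT NOT NULL,
    name        TEXT,
    PRIMARY KEY (customer_hk, load_ts)
);
""")
# The hub row never changes; the satellite accumulates history.
conn.execute("INSERT INTO hub_customer VALUES ('hk1', 'C001', '2021-01-01', 'CRM')")
conn.execute("INSERT INTO sat_customer_details VALUES ('hk1', '2021-01-01', 'Acme')")
conn.execute("INSERT INTO sat_customer_details VALUES ('hk1', '2021-06-01', 'Acme Corp')")
rows = conn.execute(
    "SELECT load_ts, name FROM sat_customer_details ORDER BY load_ts"
).fetchall()
print(rows)  # [('2021-01-01', 'Acme'), ('2021-06-01', 'Acme Corp')]
```

A name change does not overwrite anything; it simply lands as a new satellite row, which is what makes the full history available downstream.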

HUBs and LINKs are still connected, by way of hashed keys, but have a clear division of duties (as explained above) in what each captures, and hence can evolve independently.

In contrast, in the 3NF method, the information is spread across relational objects that are tightly coupled with each other and have a very conformed structure (constraints applied). Overall, this helps meet the main objectives of keeping data consistent, accurate, and current. For this reason, 3NF is very well suited to capturing transactional information where only the latest copy of the data needs to be maintained at any given time.

On the downside, however, 3NF also makes the entire system brittle in the face of frequent shifts in the way business is done.

DV2, on the other hand, slightly loosens the coupling between the relational objects and thereby absorbs changes more easily by localizing them to the specific areas (HUB, LINK, or SATELLITE) that need updates. Moreover, expanding such a distributed model to include new entities, or changing existing relationships between entities, becomes a task of addition (rather than disruption): new HUBs/LINKs/SATELLITEs are constructed to model the change and then hooked into the existing network.
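This addition-over-disruption property can be sketched with two hypothetical business changes against an existing customer hub (table and column names are again illustrative). Neither change issues an `ALTER TABLE` or touches an existing object; both are pure `CREATE` statements:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Existing model: a single customer hub.
conn.execute("""CREATE TABLE hub_customer (
    customer_hk TEXT PRIMARY KEY, customer_id TEXT NOT NULL)""")

# Business change 1: a new source starts sending loyalty attributes.
# -> add a NEW satellite; hub_customer itself is untouched.
conn.execute("""CREATE TABLE sat_customer_loyalty (
    customer_hk TEXT NOT NULL REFERENCES hub_customer(customer_hk),
    load_ts TEXT NOT NULL, tier TEXT,
    PRIMARY KEY (customer_hk, load_ts))""")

# Business change 2: customers become related to stores.
# -> add a NEW hub and a NEW link; again, no existing table changes.
conn.execute("""CREATE TABLE hub_store (
    store_hk TEXT PRIMARY KEY, store_id TEXT NOT NULL)""")
conn.execute("""CREATE TABLE link_customer_store (
    cust_store_hk TEXT PRIMARY KEY,
    customer_hk TEXT NOT NULL REFERENCES hub_customer(customer_hk),
    store_hk TEXT NOT NULL REFERENCES hub_store(store_hk))""")

tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)
# ['hub_customer', 'hub_store', 'link_customer_store', 'sat_customer_loyalty']
```

In a 3NF model, the same two changes would typically mean altering the customer table or restructuring its relationships, with the ripple effects that implies for loading code and downstream consumers.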

The design goals for DV2 prioritize agility, scalability, traceability, and consistency, which this organization of the data meets quite well.
Besides, DV2 was purpose-built for modeling warehouses that store large amounts of historical information (not just the current state), which empowers mining and analysis downstream.

So, what is the takeaway then?

Looking at the comparison, it might seem that 3NF is a very old technique with little relevance today, but that would be a mistake: it still serves as a fundamental way of capturing data that allows for clear identity and consistency at the most granular level. Indeed, the DV2 objects themselves are normalized at an individual level to meet these same needs.


Let’s Do Data and Cloud Together!

At Hashmap, an NTT DATA Company, we work with our clients to build better, together. We are partnering with companies across a diverse range of industries to solve the toughest data challenges — we can help you shorten time to value!

We offer a range of enablement workshops and assessment services, data modernization and migration services, and consulting service packages for designing and building new data products as part of our service offerings. We would be glad to work through your specific requirements. Connect with us here.

Chinmayee Lakkad is a Regional Technical Expert and Cloud/Data Engineer at Hashmap, an NTT DATA Company, and provides Data & Cloud solutions and expertise across industries with a group of innovative technologists and domain experts accelerating high-value business outcomes for our customers.
