The Link between Effectivity Satellites and Driver Keys.

Patrick Cuba
The Modern Scientist
9 min readSep 24, 2024

Within the Data Vault community there is a misconception of what an effectivity satellite table is, how to model it, where it should be used and how it compares to the other common patterns like record tracking satellite and status tracking satellite tables.

  • To start with, an effectivity satellite is not an SQL view over your satellite table with a virtual end-date; recall that data vault 2.0 is strictly insert-only. However, a physicalised end-date does exist in an effectivity satellite and we never update the end-date that was inserted.
  • An effectivity satellite table is intrinsically related to the driver key concept; you cannot have an effectivity satellite table without defining what that driver key (or keys) is. The driver key is the concept of defining what is the principal business object we are tracking the relationship changes against. A driver key concept essentially tracks what other business objects that driver business object is related to at that point in time. You can have driver keys without an effectivity satellite because the application source has provided a business date of that relationship change and you simply load that attribute into a link-satellite table.
  • Effectivity is never ever built into a link table (adding start and end dates to a link table); this destroys the intended scalability of the link table because you have essentially made the link table single purpose and no longer the shortest and most efficient path between business objects. A single link table can support as many link-satellite tables and (by extension) effectivity satellite tables as you need, and you might want to model the latter for a variety of reasons: tracking the driving relationship differently to how the source application tracks that relationship. Of course, that would make this implementation of an effectivity satellite table a business vault artefact.

Yes, effectivity satellite tables can only ever exist off a link table (you will never see an effectivity satellite table whose parent artefact is a hub table), it is only there for tracking relationship changes and nothing else. If descriptive attributes about the relationship changes, this will not change the relationship and those descriptive attributes will load to the link-satellite table.

Being able to track what is the current relationship in a link table solves one very important problem area about link tables: how do you know which is the current relationship if the relationship returns to a previous state? Especially if,

  • The source application does not supply this change
  • Nothing else changes about the relationship’s state (the attributes)

Ok with that aside, let’s dig into how effective the effectivity satellite table is!

By the way, the original post for this concept I posted in 2021 titled “Data Vault Mysteries… Effectivity Satellite & Driver Key”. We’re revisiting this concept because through customer and partner interactions I see the need to clarify a few concepts and to perhaps better explain this modelling pattern and where it should be used.

Table Structure

Effectivity satellite tables do not have any descriptive attributes; these should be stored in your link or hub satellite tables as you have designed them through satellite table splitting. Effectivity satellites are only about the relationship and nothing else, hence the table structure below is not expandable (unless you’re looking to add some more metadata columns like task id, service account user-id, Jira id…).

What you’ll notice in the table structure is that there is nothing in there indicating what the driver key is! Yes, we do not need to! Our link table does not have anything denoting what the driver key is either! What this promotes is the fact that our link table is intended to be scalable and designating a driver key is like adding start and end dates to the link table itself, it makes it single purpose and unscalable.

For this article, this is the proposed model and data pipeline. We’ll explain the components shortly.

The Data Model and Data Pipeline

Scenarios

To keep these examples simple, we will use dummy key values without hashing which will make this article easier to follow and we will restrict the relationships we are tracking to just two business objects, account and product. We will depict four scenarios:

  1. One to one relationship tracking (1:1)
  2. One to many cardinality (1:M)
  3. Flip-flopping relationship, a pattern we discussed in this article about building data vault bridge tables.
  4. When a driver key is no longer related to any other business object, an adoption of zero-keys to achieve zero cardinality (M:0).

For our scenarios, the driver key is the account id, the non-driver key in the relationship is the product id. Let’s call this Day 0, the staged content just before the first insert to the target effectivity satellite table.

Before the first load

Before we get to Day 1, let’s describe the orchestration to achieve the above.

Second-level Staging & Satellite Table Loading

Staging is where the effectivity satellite logic happens, to drive what gets loaded to an effectivity satellite table we need to:

  • Nominate the driver key(s), this will be the column account-id
  • Infer relationship changes based on the active relationship in the target effectivity satellite table by that driver key. What’s new from source will always have the high date value in the end-date column, if the driver key exists in the effectivity satellite and the new staged relationship is different then we must infer that the existing active relationship in the effectivity satellite has ended.

Because we infer a closed relationship, staging for the effectivity satellite cannot be mixed with the regular staging we use for loading other data vault artefacts. If Day 2’s load finds relationships it needs to end from the target effectivity satellite, staging will generate records that do not exist in the source application. Once the staging has completed we use the regular satellite loading pattern to load the staged records into the target effectivity satellite table, in other words, there is nothing special happening in the load code for the effectivity satellite and it will do the same thing as a regular satellite loader will do. The satellite loader still checks if what I am about to load is different to the active record in the target effectivity satellite using the link hash key and record digest (HashDiff). If it is different than load the new record, if it is not then the staged record is not loaded.

Setting the scene, day 1 with an empty effectivity satellite

A quick explanation of what happened above,

  1. One-to-one relationship. (1x open record inserted)
  2. One-to-one relationship. (1x open record inserted)
  3. One-to-many relationship. (3x open + records inserted)
  4. One-to-one relationship. (1x open record inserted)

Day 2

Let’s process the first relationship changes and see how staging now infers the closing of records (end-dating). Keep in mind, the same staging code is applied to all four scenarios and the outcome of this staging is simply ingested into the target (effectivity) satellite table.

Insert-only, we track by driver key and load relationship changes

Scenario update

  1. A change in relationship has occurred, we infer a close record and insert both the new relationship and the close record. (1x open + 1x close records inserted)
  2. No change in relationship and therefore no record is inserted
  3. The account id was related to three different products (1:3) however now that relationship has ended and replaced with a one-to-one relationship between that account id and a new product id. (1x open + 3x close records inserted)
  4. A change in relationship occurred, we infer a close record and insert both the new relationship and the close record. (1x open + 1x close records inserted)

Scroll back up, notice that the link-hash key and applied date are still unique.

Day 3

Because the link-hash key changes for each driver key and non-driver key combination, the primary key of the table remains unique. The above scenarios and implementation are not multi-active at all, it only appears that way if you’re still only thinking about the driver key in the relationship. If you were to track a different driver key the link-hash key would still be the same. The magic happens in staging and happens again in querying (getting the data out); you need to know what to stage by and what to query by; the same driver key.

Now process one more day of changes…

It’s only about the relationship
  1. No change in relationship and therefore no record is inserted
  2. The driver key (account id) is now no longer related to any product id, we assign the zero key and make that the active record. (1x open + 1x close records inserted)
  3. No change in relationship and therefore no record is inserted
  4. The account id returns to a previous relationship, this is the scenario that is impossible to track in the link table alone.

Getting the data out

Two principal methods for querying your effectivity data are proposed here,

1. The current active relationship by driver key

Active relationship by driver key

2. Show full relationship movement

Driver key relationship history

Effective immediately

The effectivity satellite is only ever about the relationship and the driver key and nothing else. Should you need to track relationship effectivity then you must look to deploy this artefact as early in the history of the relationship as possible. As we have shown, if nothing else changes in the relationship and you’re not able to derive effectivity from a link-satellite table then the effectivity satellite and driver key provides the only way to ensure your raw vault model meets the auditability requirement by being able to recreate the source application at any point in time. The alternative is to somehow retrieve the relationship history and process that through a one-off migration exercise to re-create the full relationship history!

How does the effectivity satellite compare with tracking relationships while using record tracking and status tracking satellites?

  • Record tracking satellite will track every occurrence of a relationship but not against as a distinction between driver and non-driver keys. To recreate the driver key relationship changes you will have to process every record in your record tracking satellite to condense which is the active record every time you need this data.
  • Status tracking satellite will track when a new relationship is ‘I’nserted, ‘U’pdated and ‘D’eleted based on a snapshot from the source and comparing that with the active record in the target status tracking satellite. It does not track driver-key to non-driver key relationships but (like effectivity satellite) the staging does generate a record in the instance that a relationship been deleted. Effectivity satellite does not track relationship deletion.
Tracking relationships in different ways

I hope that clarifies and simplifies your understanding of the link between the effectivity satellite and driver key. To close, adding virtual end-dates to a satellite table as a view is not an effectivity satellite. The full implementation, explanation and code is available in ‘The Data Vault Guru book’, https://amzn.to/3d7LsJV

The views expressed in this article are that of my own, you should test implementation performance before committing to this implementation. The author provides no guarantees in this regard.

--

--

The Modern Scientist
The Modern Scientist

Published in The Modern Scientist

The Modern Scientist aspires to connect builders & the curious to forward-thinking ideas. Either you are novice or expert, TMS will share contents that fulfils your ambition and interest. Write with us: shorturl.at/hjO39

Patrick Cuba
Patrick Cuba

Written by Patrick Cuba

A Data Vault 2.0 Expert, Snowflake Solution Architect

No responses yet