The Modern Scientist

The Modern Scientist aspires to connect builders & the curious to forward-thinking ideas. Either you are novice or expert, TMS will share contents that fulfils your ambition and interest. Write with us: shorturl.at/hjO39

Data Vault on Snowflake: Data Retention & Storage Tiering — Reporting

--

This article is an addendum to the previous article detailing the use of Data Vault’s eXtended record Tracking Satellite (XTS) for record management. As a refresher, we added the following columns to a data vault satellite table

  • dv_snowflake_query_id — Generated GUID, recording the query that loaded that record to that satellite table. With this column we can retrieve more query details and related artefacts from the Snowflake.Account_Usage schema, for example

o Using query_history we can articulate the query that was executed to load the satellite table, this is the compiled query. In other words, as parameterised data vault automation do — the SQL code that gets executed on Snowflake is the compiled code with all macros and parameters resolved.

o Using access_history we can trace lineage (source & target) for each satellite table and what policies were in effect at the time of execution.

  • dv_xts_event — ‘insert’/’copy’, identifying that if the record inserted into the satellite table is a regular insert or is the result of a timeline-correction (copy) event

We also added the following metadata columns to XTS:

  • dv_sequence_violation — True/False, the record we’re loading is older than a record we have previously seen for a business entity. Relate this to scenarios 4 and 5 in this article.
  • dv_record_retention_state — ‘Active’/’Archived’/’Purged’, stipulating in what state the record for the business entity or transaction is.
  • dv_disposal_record_requested — True/False, identifying if the record has been requested to be forgotten.
  • dv_disposed_record_reemerged — True/False, identifying that a previously obfuscated record has re-emerged.

And now to demonstrate the types of reporting we can derive from XTS…

COPY (out of sequence) events

Highlighting that data is arriving out of sequence should be a focus of concern for the health of your data orchestration.

Tracking Late Arriving Records

Record State / Certificate of Destruction

Generally, as your data ages you will see more data going into archival and purging.

Track when records are ‘Active’, ‘Archived’ and ‘Purged’

Re-emergence Report

Another report that should highlight a concern that data you shouldn’t have exists in a source application!

Tracking when deletion requests are made and when they come back from source, and they shouldn’t!

An image is worth 1000 words

Keeping a full audit history of the state of your data is an important part of data management. This approach with XTS ensures that we capture only that state, and nothing that can be used to reidentify those records. Ultimately, data management like data vault is a discipline and requires the same rigour as the data vault approach itself. Data vault in turn has the repeatable patterns you need to manage data management effectively.

References

The views expressed in this article are that of my own, you should test implementation performance before committing to this implementation. The author provides no guarantees in this regard.

--

--

The Modern Scientist
The Modern Scientist

Published in The Modern Scientist

The Modern Scientist aspires to connect builders & the curious to forward-thinking ideas. Either you are novice or expert, TMS will share contents that fulfils your ambition and interest. Write with us: shorturl.at/hjO39

Patrick Cuba
Patrick Cuba

Written by Patrick Cuba

A Data Vault 2.0 Expert, Snowflake Solution Architect

No responses yet