Data Vault — Wins, Trends, and Aspirations from WWDVC ’21
What’s responsible for the recent DV boom, and how to ensure a successful implementation.
You had to be there
A good speaker lineup is critical to the success of any conference, but the real value lies in the attendees. This year, the WWDVC featured presentations from Kent Graziano, Snowflake’s Chief Technical Evangelist; Data Vault (DV) creator Dan Linstedt; and a man who needs no introduction, Bill Inmon. Still, the open forums alone were worth the price of admission.
Attendees asked questions of technical experts directly, swapped war stories with fellow DV veterans, and saw firsthand how tools like SqlDBM facilitate DV rollouts.
While this article can’t recreate the live experience, it can highlight the key insights from this year's conference: ensuring success in a DV implementation, how modern cloud infrastructure facilitates and necessitates DV projects, and the forces that will propel DV in the coming years.
Cloud Warehousing and ML drive DV2.0 adoption
The DV methodology was developed by Dan Linstedt in the early 2000s, long before advances in bandwidth and data storage made cloud computing possible. However, thanks to cloud computing, DV projects are finding more demand than ever before. Not only has the number of data sources exploded, but so has the demand for warehouse migration — the very things DV was designed to facilitate.
Snowflake, one of the biggest drivers and beneficiaries of the cloud migration boom, has also witnessed a correlated rise in DV implementations. While presenting Snowflake’s Data Cloud Roadmap, Kent Graziano highlighted some key metrics regarding DV.
According to Kent, DV is indeed becoming an industry trend, with more than three hundred documented implementations in Snowflake alone. However, it’s not just DV veterans sticking to what they know best, says Kent. As Snowflake requires no upfront investment or commitment, plenty of projects are being led by first-time DV enthusiasts and skeptics alike.
Another clear trend in the DV community is using SqlDBM to model and diagram the data vault. Karl Young, a Data Architect at Interworks, spoke specifically about what makes SqlDBM the go-to tool when designing and deploying a DV model.
Karl’s endorsement stems from two key factors: usability and extensibility. SqlDBM is incredibly simple to pick up and start using. Then, its flexible workflows and modeling features mean that it can be adapted to any type of modeling, from DV landscapes to role hierarchies, even family trees — SqlDBM can handle anything its users can dream up. It’s a tool that effortlessly complements the project lifecycle, from design to documentation.
“[SqlDBM] has been a support, as opposed to any kind of burden…”
— Karl Young, Data Architect, Interworks
The Big Data boom was characterized by the three V’s: Volume, Velocity, and Variety. Since the concept peaked circa 2015, not only have the three V’s grown in magnitude, but business demands have thrown more V’s into the mix.
Machine learning (ML) solutions, having reached maturity and being widely accessible, are helping BI teams tackle the V of data Veracity. ML processes enable companies to classify, correct, and clean up data without anticipating inputs or defining static business rules. Improvements in veracity then power the next V: Value.
According to Heli Helskyaho, CEO of Miracle Oy, a Finnish data management company, ML helps companies go a step beyond traditional BI by helping them discover the very questions worth asking. Heli emphasized the key role of ML as part of the modern BI stack, stressing the importance of MLOps in automating the feedback loops of data-driven insight.
Can I get a V for Visualization? Visualization tools extend human cognition to the point of being a superpower. But, as Zoltan Csonka, Data Architect at Infinite Lambda, pointed out in his presentation, the data stack is just driving data to its final destination: visual presentation to human decision-makers. As the stack grows in computing power, scalability, and automation, reporting tools must be prepared to ride that wave as well.
Infinite Lambda is, and must be, tech-agnostic to meet its customers’ BI demands. However, a panoply of BI solutions does not guarantee project success. According to Zoltan, success lies in knowing how to integrate these technologies and build on top of them through incremental modeling and template-based code.
Winning with Data Vault
Cloud migration and an evolving BI tech stack drive DV implementations, but what’s driving DV success? Here, again, for many presenters, the answer is Snowflake.
In “Eat, Pray; Data Vault,” Veronika Durgin, Lead Data Engineer at Boulevard, explained how she made the leap from on-premises to Snowflake and “never looked back.” Snowflake complements DV’s data-type-agnostic stance by making it easy to work with any data format. With streaming ingestion via Snowpipe and native support for semi-structured formats like JSON and Avro, Snowflake takes care of loading and transformation while DV handles the integration.
After all, DV is not merely a model or a framework. Instead, Veronika describes DV as a “team sport.” When done properly, DV helps promote data culture within an organization and helps its various departments tell a consistent data-driven story.
Of course, not all organizations will benefit from a data vault. According to Zoltan, in a situation with no complexity, no integration, and no audit, you have no benefit. But where DV methodology applies, it works well — maybe too well…
“It is possible to commit no mistakes and still lose.”
— Fabio de Salles, Product Manager, SERPRO
Presenter Fabio de Salles related a case study from one of his consulting clients, who suffered from a deluge of data-related issues. Fabio recognized that a Data Vault could help alleviate many of these problems and pitched the idea of its implementation.
As it turns out, Fabio was right: the Data Vault really did cure the client’s data woes. But, unfortunately, it also meant that the client was no longer billing Fabio for consulting work. Humblebragging aside, Fabio’s is yet another case of DV delivering on its promise of efficiency and scalability.
While Fabio’s was a textbook DV implementation, other presenters spoke of more exotic solutions. Speaker Francesco Puppini, with the endorsement of Bill Inmon, introduced a radical new concept in DW architecture: the Unified Star Schema (USS). In a book by the same name, Puppini and Inmon propose the idea of a central “Puppini Bridge” table containing all primary key relationships of a data warehouse.
The Puppini Bridge is a superset of all primary/foreign key relationships for the entire database, stored in one centralized table. Using the bridge, a join can then be made between any number of fact and dimension tables, thus simplifying the traditional star/snowflake design.
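The idea is easier to see in miniature. The sketch below uses a hypothetical two-fact toy schema (table and column names are invented for illustration, not taken from the book): the bridge is built as a UNION ALL of each fact table’s key columns plus a label recording which fact each row came from, so dimensions join once against the bridge rather than separately against each fact.

```python
import sqlite3

# Hypothetical toy schema: two fact tables at different granularity
# plus one shared dimension. Names are invented for illustration.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales   (sale_id INTEGER, product_id INTEGER, amount REAL);
CREATE TABLE fact_returns (return_id INTEGER, product_id INTEGER, qty INTEGER);

INSERT INTO dim_product  VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO fact_sales   VALUES (10, 1, 99.0), (11, 2, 45.0), (12, 1, 12.5);
INSERT INTO fact_returns VALUES (20, 1, 1);

-- The bridge: a UNION ALL of every fact table's key columns, with a
-- 'stage' label recording the fact table each row originated from.
CREATE VIEW bridge AS
SELECT 'fact_sales'   AS stage, sale_id, NULL AS return_id, product_id
FROM fact_sales
UNION ALL
SELECT 'fact_returns' AS stage, NULL,    return_id,          product_id
FROM fact_returns;
""")

# One join path serves every fact: dimension -> bridge.
rows = con.execute("""
SELECT p.name, b.stage, COUNT(*) AS events
FROM dim_product p
JOIN bridge b ON b.product_id = p.product_id
GROUP BY p.name, b.stage
ORDER BY p.name, b.stage
""").fetchall()
print(rows)
```

Note the trade-off the article goes on to describe: every new fact table widens the bridge with another set of key columns, which is where the scaling and maintainability concerns come from.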
The USS is a novel and radical solution to the design challenges that have long haunted classic Kimball-style data marts. However, while it does resolve joins across non-conformed granularities, scaling and maintainability concerns limit its feasibility. As a result, the USS is a niche solution that may work great in some cases but not justify the cost in others — of course, Data Vault describes itself in exactly the same terms.