Optimize Your Hierarchical Data with Snowflake, Part Two

Saša Mitrović
4 min readFeb 8, 2022

Challenges in Analyzing Hierarchical Data

The Snowflake Data Cloud offers virtually limitless storage and compute capacity. Combined with native support for extremely performant storage and processing of hierarchical data structures, these features enable rapid and reliable root cause and impact analysis across complex data models and processes.

This is Part Two of a three-part blog post that includes the following:

Outdated Tools and Approaches Are Not Designed to Handle Hierarchical Data

Marshmellow Spaghetti Tower
A perfect depiction of a hierarchy: a spaghetti marsh-mellow tower¹

As described in Part One, organizations all deal with data in nested, tree-like hierarchies to understand their business.

But these hierarchies are challenging to analyze. Let’s take cost centers as an example. The legacy financial systems where this data is modelled and maintained are not built for time- and cost-efficient calculations. It can take weeks of analysis just to understand one simple change on a P/L item.

In contrast, Snowflake provides native support for hierarchical data storage and processing. This, combined with elastic compute, dramatically shortens the time required to conduct these kinds of root cause analyses and enables companies to respond to rapidly changing conditions with data-driven decisions.

Challenges with typical relational approaches

A typical relational approach is to create as many columns in a table as there are levels in the hierarchy. This will work when there are only a few levels, but you have to redesign the data model when you add or remove levels. If the hierarchy’s tree structure is not well balanced, storage and processing costs increase.

The relational approach has at least three other problems:

  • Inefficient storage utilization and complexity in maintaining the data (a lot of data redundancy, for instance).
  • Very slow processing times (a lot of joins to traverse the hierarchy). This is more of a problem since compute capacity is significantly more expensive than storage.
  • Complexity for end users. They need to be intimately familiar with the source and raw data to write queries and produce their analytical dashboards. They get bogged down maintaining this complexity instead of unleashing new value based on the data.

Snowflake, with its set of core features such as support for both structured and semi-structured data and recursive queries, has a different approach that allows extremely efficient storing and querying of hierarchical data, eliminating all the downsides of a typical relational approach.

Legacy methods of root cause analysis

Companies have tried to perform root cause analysis with tools that are inefficient and don’t provide timely or reliable results. For example, they might export normalized data from the source systems (which are often siloed by departments such as financial data, parts data, or sales orders) into Excel tables. These tables then get denormalized, that is, data gets joined (parent cost centers with their children) and the tables get bloated with more and more redundant data.

As soon as these tables reach multi-gigabyte sizes (and that’s really just scratching the surface of the data quantities), the users have to start splitting the data into more tables. Financial controllers focus on one problem at a time and end up building their own Excel data marts (which include macros!) around one specific issue, preventing reuse by others. These one-off solutions generate more data, processes, and departmental silos.

The result is multiple copies of the same data, lack of data governance, and inefficient analysis. It takes weeks just to uncover the root cause of one single problem.

Meanwhile, the business doesn’t stop. New data continuously comes in, and the company is always behind in its analysis and cannot act in time to correct issues.

Because of this, many companies react slowly to changing market conditions. Sheer volumes of data and tree-like hierarchical data relationships prevent legacy data warehousing systems from conducting timely root cause analysis and building models to predict how potential changes in different factors will influence their business in the future.

Snowflake’s Data Cloud was designed to provide answers where the old technologies fail.

The Power of Snowflake

Snowflake is a platform built for efficient analysis of massive amounts of data. With Snowflake it’s possible to:

  • Join batch genealogy, BOM, or cost center hierarchies with any other relevant information such as quality, cost, inventory, or sales data
  • Process hundreds of millions of hierarchy paths in seconds with native support for recursive queries
  • Ask complex hierarchy questions across millions of relations and receive answers within seconds
  • In near real time, analyze risk, exposure, and impact on financials, product quality, sourcing, or delays
  • Calculate and apply models for future predictions

Part Three of this blog post describes the power of Snowflake and how you can use it to easily analyze the data in hierarchies to empower your decision-making with data.

[1]: fabrication2014. Spaghetti Towers

Views and opinions expressed in this article are my own and do not represent that of my place of work. I expressly disclaim any liability or loss incurred by any person who acts on the information, ideas or strategies discussed in my stories on Medium.com. While I make every effort to ensure that the information I’m sharing is accurate, I welcome any comments, suggestions, or correction of errors.

--

--