Optimize Your Hierarchical Data with Snowflake, Part One

6 min readFeb 1, 2022

Understanding Hierarchical Data

Snowflake’s powerful architecture enables you to maximize the value of data in your hierarchies

The Snowflake Data Cloud offers virtually limitless storage and compute capacity. With Snowflake, users can analyze virtually any amount of data and implement multiple workloads with near-instantaneous elasticity, allocating the resources they need at the moment they need them.

Combined with native support for extremely performant storage and processing of hierarchical data structures, these features enable root cause and impact analysis across complex data models and processes such as batch genealogy, bill of materials (BOM), and financial controlling to take seconds or minutes, as opposed to weeks or months.

The impact this efficiency has on a business is immense. Companies can now react to changing market conditions and virtually any other factors influencing their business in near real time and implement innovative business models and processes that were impossible earlier.

This is Part One of a three-part blog post that includes the following:

Part One: Understanding Hierarchical Data
A look at forms of hierarchical data and a real-life example of how the inability to process hierarchical data resulted in loss of life and a company’s bankruptcy
Part Two: Challenges in Analyzing Hierarchical Data
A description of why analyzing hierarchical data is so difficult and unreliable for legacy data warehouse systems
Part Three: Using Snowflake to Maximize the Value of Data in Hierarchies
Example showing how Snowflake enables you to cost-effectively and efficiently analyze data in a complex hierarchy
Part Four: The Grand Finale — Visualizing the data
What does it take to analyze this data using Tableau?

Do You Remember Takata?

In what the U.S. National Highway Traffic Safety Administration (NHTSA), the U.S regulatory agency for traffic safety, has called “the largest and most complex safety recall in U.S. history¹”, millions of passenger cars by nearly 20 different automakers were recalled due to faulty airbags. Lives were lost due to shards from exploding airbag parts lodging in drivers and passengers. NHTSA estimates at least 27 deaths worldwide, along with at least 400 injuries².

The supplier of these faulty airbags was a company called Takata. A series of poor judgements and mishaps³ ⁴, resulted in Takata, a company that held 20% of the global airbags market, to file for bankruptcy in 2017. Takata crumbled under the financial burden of regulatory fines and recall costs.

This series of events highlighted an underlying technological issue with data, more specifically, data governance and the ability to have and process all relevant (and very complex) data in one place.

Hierarchies Are Everywhere

Takata had abundant data and early knowledge about faulty airbags. But because Takata was unable to provide an accurate impact analysis of their data, many cars that didn’t actually have faulty airbags also had to be recalled, leading to escalating recall costs.

The company could not meet the challenge of bringing all that complex hierarchical data together to quickly identify the root cause and analyze the impact. Many companies face the same challenges with their legacy data warehouse systems.

Let’s look at three examples of data hierarchies to understand the challenge better.

Example 1: Batch genealogy

The root cause in Takata’s case is a faulty inflator that was produced in batches. The batches were delivered to build airbag assemblies, which were also produced in batches, and these assemblies were delivered (you guessed it, in batches) to different automakers. A properly conducted impact analysis could have shown exactly which batches were affected by the faulty part and where they were delivered.

Batch traceability is a well-known challenge in supply chain management. Figure 1 shows a graphical illustration of the problem. Even a “simple” product like a soft drink has an incredibly complex supply chain — the components like flavouring syrup, bottles, bottle caps, water… are all ordered and produced in batches.

So a final product is composed of many different batches of single components and, you guessed it — that final product is also produced in batches itself.

Figure 1: The batch traceability challenge

For any company, knowing exactly when and what they’ve delivered, and to whom, is crucial for protecting the company’s brand reputation and mitigating financial exposure. If there is a recall, that data enables them to limit the product recall size and the regulatory risk (by being actually able to prove the recall size) and potentially reduce regulatory fines.

To do a root cause analysis and impact analysis, companies must traverse the batch genealogy up and down and do a lot of calculations on related data during the process.

Example 2: Multi-level bill of materials

Before manufacturers have products to ship, they first have to produce them. And products are complex. So companies use a bill of materials (BOM) to track parts, components, and assemblies that make up a product. Modern discrete manufacturing is based on the reuse of assemblies (for example, an engine reused across different car models). To depict this complexity, these BOMs are often represented as multi-level, nested structures.

Calculating the manufacturing cost of a product requires companies to traverse the BOM hierarchy up and down and do a lot of calculations during the process.

Figure 2 shows a much simplified example of a laptop’s BOM hierarchy. Each BOM item in the illustration is a complex assembly that is reused in other complex assemblies.

For instance, the battery pack consists of the battery cells, the casing, the electrical contacts… and these are going to be re-used across multiple battery pack assemblies.

Ideally, manufacturers want to know the price of the product in the design phase, before it’s actually produced, so they can plan and optimize the various resources used in manufacturing and make sure the market price of the product is competitive.

Example 3: Cost centers and profit and loss statements

Finally, companies have to keep track of their finances (revenue, costs, and expenses) and report to the board of directors and shareholders. Global companies have thousands of cost centers, and all these different costs end up on the profit and loss (P/L) sheet. When a company wants to understand why costs have changed, they need to look into all costs that affect that specific P/L item, which entails traversing up and down the hierarchy and doing a lot of calculations along the way.

The pattern is obvious. Manufacturing, logistics, and financial controls all deal with nested, tree-like hierarchies to understand their business. Find out in part two of my blog post, Challenges in Analyzing Hierarchical Data, why legacy warehouse systems are so bad at processing hierarchical data reliably and cost effectively.

[1]: NHTSA. NHTSA at Work: Safety Agency Oversees Free Air Bag Repairs for Millions of Americans[2]: Consumer Reports. Takata Airbag Recall: Everything You Need to Know[3]: The New York Times: A Cheaper Airbag, and Takata’s Road to a Deadly Crisis[4]: Reuters: Exclusive: Takata engineers struggled to maintain air bag quality, documents reveal

Views and opinions expressed in this article are my own and do not represent that of my place of work. I expressly disclaim any liability or loss incurred by any person who acts on the information, ideas or strategies discussed in my stories on Medium.com. While I make every effort to ensure that the information I’m sharing is accurate, I welcome any comments, suggestions, or correction of errors.