Sitemap
Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

Best practices, tips & tricks from Snowflake experts and community

Follow publication

The art and science of nested, polymorphic JSON transformation in Snowflake

Jon Farr
9 min readApr 26, 2024

--

The Problem

In working with government data and a very well-defined data dictionary the last thing I expected to encounter was polymorphic data. While flattening the nested JSON structure I discovered that some of the array attributes could contain primitive values such as string or could contain full objects with nested arrays. This was a complex problem that needed a more dynamic and eloquent solution.

Our goal is to unlock the value of this kind of data by transforming it into a clearly defined structure with visibility into all attributes at every level, regardless of the data types used for each attribute over time.

Technical Deep Dive

Pancake — Native App Architecture

Step 1 — Scan a data source containing JSON objects and discover the schema and any polymorphic attributes

Source JSON Data with nested object arrays and polymorphic attributes
Pancake — Customer data sources that have been scanned and analyzed

Step 2 — User configuration of metadata

Customer data source metadata that can be configured to generated Dynamic Table creation SQL statements
Pancake view of attributes discovered during the scanning process

Step 3 — Generate SQL to create nested Snowflake Dynamic Tables and a consolidated view of the JSON schema all polymorphic versions for each attribute

Users can then create select statements by joining these dynamic tables at various levels by using the natural foreign key attributes created during the configuration process, enhancing downstream data transformation, interconnectivity, and analysis.

Dynamic tables created from the SQL code generation provided by Pancake to extract, relate,and flatten complex JSON data
A Snowflake graph view representing the dynamic tables created by Pancake
A Snowflake graph view representing the dynamic tables created by Pancake
A Snowflake graph view representing the dynamic tables created by Pancake
Select statement using the dynamic tables created by Pancake
Output of a consolidated JSON document showing all instances of polymorphic attributes from the source dataset

Step 4 — Post-flattening: Monitor and maintain

Our game changers

Our app generates Dynamic Table create statements with column definitions to support each type of data an individual attribute has ever used.

Select statement from one of the dynamic tables created by Pancake showing the selection of data from a primitive version of the address attribute
Select statement from one of the dynamic tables created by Pancake showing the selection of data from an array version of the address attribute

By monitoring the raw data source through a Snowflake stream, Pancake can report any data changes to the raw data sources, allowing for additional user configuration and SQL code generation. You will be able to set up monitoring and generation for multiple environments such as dev, test, staging, production, etc.

Why Snowflake?

Further, the Snowflake Native App Framework and Snowflake Marketplace simplify our go-to-market approach; we do not need to worry about distribution, security, or licensing.

What’s next?

--

--

No responses yet