Working with XML in Snowflake: Part III

How to navigate repeating groups in XML structures with the FLATTEN function and LATERAL join method.

Paul Horan
Apr 12 · 5 min read

Introduction

This is the third in a series of posts that cover tips and techniques for working with XML-based data in Snowflake. Part 1 of the series was a general introduction to Snowflake’s support for semi-structured data formats, and how to use the COPY statement to load XML files into a Snowflake VARIANT column. Part 2 of the series introduced the SQL syntax for retrieving data out of a variant column containing XML, using the GET() and XMLGET() functions. I showed how to walk the XML node hierarchy by “nesting” the XMLGET() calls.

Setup

We’ll use the same XML doc that was presented in part 2 — the Insurance Claim document. If you’ve read part 2, then you should already have the Claim table defined and loaded with that XML. If not, here’s that code snippet:

The FLATTEN function

FLATTEN() is quite a versatile function, and can be used in pretty much any situation where you have a list of compound values and need an in-line relational representation of that data. The Snowflake docs show how it can be used to transform semi-structured JSON data into a table structure, but that isn’t really helpful when you’re working with XML...

  • KEY: the key to the exploded value
  • PATH: the path to the element within a data structure
  • INDEX: the index of the element, if it is an array; otherwise NULL
  • VALUE: the value of the element of the flattened array/object
  • THIS: the element being flattened

The LATERAL keyword

Now that we have a repeating array flattened into a relational table structure, you’d think we could just reference it in the FROM clause like any other table. That’s not exactly true here… We need to perform a lateral join to that expression. Adding the LATERAL keyword in the from clause tells Snowflake to create an inline view, generated from each row in the “outer” table of the query. So for each row in CLAIM, perform the XMLGET() calls and get a reference to the <Options> array. There’s only one row in our CLAIM table, but you could envision a table full of individual claim records, each containing a full XML document for that claim.

  1. The expressions in the SELECT list reference that alias and its value column, which was created by the FLATTEN() call. That’s a variant, and we can pass it to XMLGET() to retrieve the data values from its subnodes.

When the XML is not so nice

We got lucky. The author of this XML used a nice “wrapper” node named <Options> to encapsulate the array of <VehicleOptions>. Not every XML document is going to cooperate like this. Remember that you “flatten” the parent node to reference the array of its children. When that wrapper node doesn’t exist, the repeating array nodes are commingled with other non-array nodes. I simplified the original CLAIM table syntax to illustrate that concept:

  1. The WHERE clause now contains a filter that retrieves the name of the child nodes (using the '@' argument to the GET() call), casts it as a string, and filters out the node names we don’t want. Remove that WHERE clause and see what happens…

Other considerations

XML documents can (and often do) have more than one repeating array in them. Don’t try to LATERAL FLATTEN() more than one nested array in a single query, unless they’re in the same XML path. These are “cross-joins”, otherwise known as Cartesian Products, which are probably not what you’re looking to achieve. In the next installment, we’ll go even deeper and show how to work with XML that contains multiple nested arrays, without the dreaded Cartesian Product.

Snowflake

Articles for engineers, by engineers.

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store