Converting XML to JSON in Snowflake using External Functions

During a recent project with a customer involving migration from Oracle, I was asked if there was an easy way with Snowflake to convert XML data into JSON.

The answer? Out of the box — no, but we can lean on Snowflake External Functions and very easily (and quickly) implement a function in Python to do this using the xmltodict module. An alternative approach might be to write a stored procedure in Snowflake, however, this can get complicated if you have complex XML structures and can hit recursion limits.

I’m a huge fan of External Functions; the ability to be able to execute Python code (or any language in Lambdas, Azures Functions, or GCP) adds a whole new level of extensibility to Snowflakes ELT capabilities, such as the simple example you see here through to Machine Learning inference coupled with tasks and streams.

Snowflake has a lot of great documentation around the configuring of External Functions for AWS, Azure, and now GCP, so I won’t get into that detail too much however this article should provide enough context and instruction on how to get this working.

Note — this demo was built on AWS, however, there should be no reason why it cannot easily be deployed to the other cloud providers.

Python Code

In the working demo I have, my python code resides in an AWS Lambda Function, and it’s so simple; beyond the templated structure of the code (i.e. extracting the event payload, and elements within that) we do the conversion in 1 simple line of code highlighted in bold below — the rest is pretty much part of any code you would see in AWS Lambda Function used with a Snowflake External Function.

Snowflake DDL

Once the Lambda Function is implemented, and the other necessary components are configured (API Gateway, IAM etc) we can create our API Integration in Snowflake — again, this is well documented however I’ll add an example below of doing this for AWS:

USE ROLE ACCOUNTADMIN;

CREATE OR REPLACE API INTEGRATION INT_XMLTOJSON_API
API_PROVIDER = AWS_API_GATEWAY
API_AWS_ROLE_ARN = 'arn:aws:iam::000000000000:role/snowflake-api-role'
ENABLED = TRUE
API_ALLOWED_PREFIXES = ('https://zzzzzzzz.execute-api.us-west-2.amazonaws.com');

Once this is created, you can create your external function:

CREATE OR REPLACE EXTERNAL FUNCTION CONVERT_XML_TO_JSON(XML_STR STRING)
RETURNS VARIANT
API_INTEGRATION = INT_XMLTOJSON_API
AS 'https://zzzzzzzz.execute-api.us-west-2.amazonaws.com/xmltojson/';

Side note — I would not use ACCOUNTADMIN to create the External Function, instead grant usage of the API Integration to a lower role in the RBAC structure and use this to create the External Function.

Displaying the results of our new External Function

Example

Using our newly created External Function, we can now make calls like so:

SELECT CONVERT_XML_TO_JSON(
'<items>
<item id="0001" type="donut">
<name>Cake</name>
<ppu>0.55</ppu>
<batters>
<batter id="1001">Regular</batter>
<batter id="1002">Chocolate</batter>
<batter id="1003">Blueberry</batter>
</batters>
<topping id="5001">None</topping>
<topping id="5002">Glazed</topping>
<topping id="5005">Sugar</topping>
<topping id="5006">Sprinkles</topping>
<topping id="5003">Chocolate</topping>
<topping id="5004">Maple</topping>
</item>
</items>');

and get this in return:

{
"items": {
"item": {
"@id": "0001",
"@type": "donut",
"batters": {
"batter": [
{
"#text": "Regular",
"@id": "1001"
},
{
"#text": "Chocolate",
"@id": "1002"
},
{
"#text": "Blueberry",
"@id": "1003"
}
]
},
"name": "Cake",
"ppu": "0.55",
"topping": [
{
"#text": "None",
"@id": "5001"
},
{
"#text": "Glazed",
"@id": "5002"
},
{
"#text": "Sugar",
"@id": "5005"
},
{
"#text": "Sprinkles",
"@id": "5006"
},
{
"#text": "Chocolate",
"@id": "5003"
},
{
"#text": "Maple",
"@id": "5004"
}
]
}
}
}

--

--

Simon Coombes
Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

Director of Data Science Services @ Snowflake ❄️. ML Enthusiast, IoT Builder, and Maker/Inventor when I have time.