Snowflake Higher Order Functions (HoF)
Higher-order functions are functions that can take other functions as arguments or return functions as results. In the context of data analysis, higher-order functions are crucial as they enable a more abstract and expressive way to manipulate and process data. They allow for the creation of reusable and composable code, facilitating tasks such as filtering, mapping, reducing, and transforming datasets with concise and readable syntax. By utilizing higher-order functions, data analysts can write code that is more modular, easier to understand, and maintainable, ultimately leading to more efficient and effective data analysis workflows.
For example, when dealing with JSON data, FILTER can be used to extract or filter out specific JSON objects or arrays based on criteria defined within the JSON structure. Similarly, when data is stored in Snowflake arrays, TRANSFORM allows for selective inclusion or exclusion of array elements based on conditions, providing a streamlined approach to data processing.
Benefits of Higher Order Functions over Lateral Flatten
- Readability: HOFs offer more concise and expressive syntax than LATERAL FLATTEN. They let you keep the focus on the transformation logic, avoiding the nested structure of LATERAL FLATTEN.
- Performance: HOFs can often execute faster than LATERAL FLATTEN, especially with complex transformations. They operate directly on the array data without the need to create a temporary table.
- Maintenance: HOFs lead to more maintainable code. The lambda expressions within HOFs clearly define the operation to be performed, making it easier to understand and modify the logic.
Quick Example:
-- Higher-order function (HoF) example in SQL - Given an Iterable array; FILTER out even numbers and return only odd numbers
SELECT FILTER(ARRAY_CONSTRUCT(36, 21, 87, 14, 59, 42, 73, 5, 68, 92),
x -> MOD(x, 2) = 1) AS odd_values;
-- Output: [21, 87, 59, 73, 5]
A Practical Example: Sales Data Analysis
Let’s explore how to use Snowflake Higher Order Functions (HoF) in a practical scenario.
⛁ Sample Data Model: salesdb-data-model
Setup the SalesDB sample data model and proceed with the rest of the exercises. This model provides a robust foundation for tracking customers, buyers, clients, and sales opportunities.
-- Assign the databaseschema used co-pilot
use schema salesdb.custs;
Create and load records into the customer table.
CREATE OR REPLACE TABLE customer_orders AS
SELECT 1 AS customer_id, '2024-01-01' AS order_date, [{'device':'Cardiac Monitor', 'quantity':67, 'price':1500}, {'device':'Patient Monitor', 'quantity':1, 'price':1200}] AS order_detail UNION
SELECT 2 AS customer_id, '2024-01-02' AS order_date, [{'device':'Insulin Pump', 'quantity':5, 'price':56}, {'device':'Hearing Aid', 'quantity':5, 'price':1000}] UNION
SELECT 3 AS customer_id, '2024-01-03' AS order_date, [{'device':'MRI Scanner', 'quantity':3, 'price':3000}, {'device':'X-ray Machine', 'quantity':2, 'price':2500}] UNION
SELECT 4 AS customer_id, '2024-01-04' AS order_date, [{'device':'Ultrasound Machine', 'quantity':4, 'price':4000}, {'device':'Surgical Robot', 'quantity':1, 'price':5000}] UNION
SELECT 5 AS customer_id, '2024-01-05' AS order_date, [{'device':'Defibrillator', 'quantity':2, 'price':2000}, {'device':'Anesthesia Machine', 'quantity':3, 'price':3500}];
Let’s delve into how Snowflake HoFs can supercharge your data workflow:
Task: You need to calculate the total revenue generated by each customer across all their orders and only show customers with revenue GREATER THAN 15000.
Without Higher Order Functions (HOF)
Verify that the records have been loaded into the table. Notice the last column is a Snowflake variant data type holding the Json payload.
SELECT * FROM customer_orders;
Let’s dial in and look at one specific record, where Customer ID is 1
SELECT * FROM customer_orders where customer_id = 1;
Notice one sample JSON payload has multiple records. This is an iterable we need to loop through to calculate the Total Revenue (i.e. Price * Quantity), and then FILTER where the value doesn’t meet the predicate (i.e. greater than 15000).
First we Flatten the records. To go from one record in a column, into multiple rows.
SELECT
customer_id, item.value
FROM
CUSTOMER_ORDERS,
LATERAL FLATTEN(input => order_detail) item;
Pull out the values of interest from the ARRAY record
SELECT
customer_id, item.value:quantity::NUMBER as Quantity, item.value:price::NUMBER as Price
FROM
CUSTOMER_ORDERS,
LATERAL FLATTEN(input => order_detail) item;
Calculate the total_revenue with basic math.
SELECT
customer_id, SUM(item.value:quantity::NUMBER * item.value:price::NUMBER) AS total_revenue
FROM
CUSTOMER_ORDERS,
LATERAL FLATTEN(input => order_detail) item
group by all;
FILTER records that don’t meet the predicate of GREATER THAN 15000
SELECT
customer_id, SUM(item.value:quantity::NUMBER * item.value:price::NUMBER) AS total_revenue
FROM
CUSTOMER_ORDERS as co,
LATERAL FLATTEN(input => order_detail) item
group by all
having total_revenue >= 15000;
Return the result to show the entire ARRAY of records that weren’t filtered out. Just for readability and formatting purposes.
SELECT
customer_id, item.value
FROM
CUSTOMER_ORDERS as co,
LATERAL FLATTEN(input => order_detail) item
group by all
having SUM(item.value:quantity::NUMBER * item.value:price::NUMBER) >= 15000;
This is one way to achieve the task using the existing FLATTEN function. Next we would explore how to do that using the newer Higher Order Functions (HoF) in Snowflake.
With Higher Order Functions
For this task, we would use HoF to identify high-value customers based on the total purchase amount.
First, Verify the records in the variant column as JSON
SELECT o.order_detailFROM customer_orders o;
Using FILTER HoF to go through the iterable (i.e. order_detail) and apply a function to it that calculates price * quantity and checks the predicate. This only returns records that meet the check criteria (15000) in this case. Records that don’t meet that criteria are filtered or simply ignored in the result set.
SELECT customer_id, FILTER(o.order_detail, i -> (i:quantity::NUMBER * i:price::NUMBER) >= 15000)FROM customer_orders o;
Notice, Null arrays are returned. Filter those out by adding a predicate in the where clause to check for array size.
SELECT
customer_id, FILTER(o.order_detail, i -> (i:quantity::NUMBER * i:price::NUMBER) >= 15000) AS filtered_orders
FROM customer_orders o
WHERE ARRAY_SIZE(filtered_orders) > 0
group by all;
Comparison:
The HOF version is more concise and focuses on the transformation (UPPER) directly. It’s also more flexible, as you can easily modify the lambda expression to perform other operations.
When to Use Each Approach:
- HOFs: Preferred for simple transformations or filtering of array elements.
- Lateral Flatten: Useful when you need to join the flattened data with other tables or perform complex aggregations that require a relational structure.
Additional Considerations:
- Nested Arrays: HOFs are primarily designed for simple arrays. Lateral Flatten is more suitable for handling nested arrays.
- Feature Support: Snowflake continues to enhance its HOF capabilities, so check the latest documentation for the most up-to-date functionality.
Resources
Snowflake Documentation: Querying Semi-structured Data | Snowflake Documentation
Hello and welcome! Thrilled you’re here.
We share insights on data, AI, tech trends, and the future. Thank you for being a part of this community! 🙏
Before you go:
If you found value in this post, please clap and follow to stay updated! 👏
Discover more at DemoHub.dev & Free Tutorials 🚀
Join our free weekly newsletter for the latest insights and updates 📰
Connect with us: YouTube, LinkedIn, SubStack, Calendar, DemoHub Email, or reach me personally!