New Ways to Analyze Query Performance in Snowflake — and Optimize Costs

Because Snowflake billing is based on consumption, improving query performance can help you optimize Snowflake costs. Identifying inefficient queries is the first step towards optimizing them. We are excited to announce the general availability of three new ways you can investigate query performance: query_hash, query_parameterized_hash, and get_query_operator_stats().

Query_hash

The query_hash column is a hash value that is computed based on the canonicalized text of the SQL statement. Repeated queries that have exactly the same query text have the same query_hash values. If there are binds in the query, these will also have the same query_hash.

Before the query_hash, most customers focused on optimizing their longest running queries first. The thinking was that the longest running queries were also the most expensive. While this is a reasonable approach, it does not take into account the fact that repeated queries, even if they have a shorter execution time, can accrue significant cost. Repeated queries have the same query_hash, so you can use the query_hash to better investigate the queries that contribute most to cost.

You can also use the query_hash to track performance for a specific query over time. For example, you can compare the average execution time for a given query_hash before and after performance optimizations such as Automatic Clustering, Search Optimization, or Query Acceleration Service.

Query_parameterized_hash

The query_parameterized_hash is a hash value that is computed based on the parameterized query, which means the version of the query after literals are parameterized. These literals must be used in the query predicate and must be used with one of the following comparison operators:

= (equal to)

!= (not equal to)

>= (greater than or equal to)

<= (less than or equal to)

You can use the query_parameterized_hash for the same use cases as the query_hash. The query_parameterized_hash has the advantage of grouping similar queries together, even if their parameter values are different.

Get_query_operator_stats, also referred to as programmatic access to Query Profile

Get_query_operator_stats() is a system function which returns statistics about individual query operators within a query that have completed. The output of this function is identical to the Query Profile in Snowsight. You can use the query_hash and query_parameterized_hash to identify queries to optimize and then further analyze the query execution using get_query_operator_stats(). We announced the public preview for this function back in December 2022 and are excited that it is now generally available!

Real-world example

Let’s use this new query data to analyze performance in Snowflake’s own internal Snowflake account. We’ll analyze the most expensive queries that use the “Snowhouse” warehouse, which is one of Snowflake’s biggest warehouses on our internal account. We’ll limit our results to the five most expensive queries that were executed in the past week.

SELECT 
Date_trunc('day', start_time),
query_parameterized_hash,
Count(*) AS NumExecutions,
Sum(total_elapsed_time)/60000 AS TotalExecTime,
Avg(total_elapsed_time)/60000 AS AvgExecTime
FROM
snowflake.account_usage.query_history
WHERE
warehouse_name = 'SNOWHOUSE'
AND Date_trunc('day', start_time) >= CURRENT_DATE() - 7
GROUP BY
Date_trunc('day', start_time),
query_parameterized_hash
ORDER BY
4 DESC
LIMIT 5

Results:

These five query_parameterized_hashes, on their own, were not the longest running queries run on the Snowhouse warehouse. However, because these queries all repeated dozens of times or more, they collectively accounted for more warehouse runtime than any other query_parameterized_hashes! You’d never expect that a query that runs in 11.3 minutes would account for the biggest portion of warehouse resources, but in this case, it did!

After identifying the queries to optimize, you can use the Query Profile to analyze query performance, which is now available in both the Snowsight UI and the get_query_operator_stats system function.

Get started

Query_hash, query_parameterized_hash, and get_query_operator_stats are all GA. To get started, review the documentation:

--

--