3 new methods for query profiling

3 brand new ways to derive query insights in Snowflake

Assessing the workload within Snowflake is a very important especially considering the fact that the warehouses used for running any query has to be optimally used. We primarily rely on the table “snowflake.account_usage.query_history” to get all such metrics.

More often we thing that the long running queries are the only one we should target whenever we consider a case of optimizing the warehouse however imagine a use case where the “same query runs very frequently”, then in this case also there would be more costs. Hence in this scenario how do we even identify those queries which are running very frequently within a platform with “query_id” we know that each time the query executes within the platform a new query_id would be generated. So what can be the potential solution ?

To address such problems Snowflake has newly introduced 3 new attributes which helps in we getting more insights. They are namely:

  1. QUERY_HASH
  2. QUERY_PARAMETERIZED_HASH
  3. GET_QUERY_OPERATOR_STATS

QUERY_HASH

This is the value which we would now get readily available in the “query_history” table. The way it is calculated is like if there are repeated queries which has the exact same query text values then for all of them the query_hash value would be same.

Foe example let us consider the below mentioned queries:

select * from DEMO_DB.DEMO_SCHEMA_01.DEMO_TEST_01 limit 10;
select * from DEMO_DB.DEMO_SCHEMA_01.DEMO_TEST_01 limit 10;

Now it is seen that for both the set of queries the query text is exactly the same hence in this scenario although we would see the query ids are different but the query hash values would be the same.

select query_id, query_text, query_hash, Query_parameterized_hash from snowflake.account_usage.query_history 
where date(start_time)='2023-10-13' and query_text like '%DEMO_TEST_01%';
Same query hash value

This parameter can also be used to monitor the performance for a specific query over time. For example, we can compare the average execution time for a given query_hash before and after performance optimizations such as search optimization service, query acceleration service, etc..

QUERY_PARAMETRIZED_HASH

The query_parameterized_hash is a hash value which is computed based on the parameterized query. This means the the queries should be having a value after the literals.

Let us understand it by seeing the below example.

select * from DEMO_DB.DEMO_SCHEMA_01.DEMO_TEST_02 where CP_CATALOG_PAGE_SK=1;
select * from DEMO_DB.DEMO_SCHEMA_01.DEMO_TEST_02 where CP_CATALOG_PAGE_SK=2;
select * from DEMO_DB.DEMO_SCHEMA_01.DEMO_TEST_02 where CP_CATALOG_PAGE_SK=3;

If we see over here all the 3 queries are different since in the where clause the CP_CATALOG_PAGE_SK value is varying from 1 to 3. Now, in this case the query_id & the query_hash both would be having the different values but the query_parameterized_hash value is going to be the same.

Same query parametrized hash value

There are only certain comparison operators which this feature supports mentioned as below:

Operators Supported

GET_QUERY_OPERATOR_STATS

This is a system function which returns statistics about the query operators after it has been successfully completed its execution. This is a table function. The information is returned as a table. Each row in the table corresponds to one operator. The row contains the execution breakdown and the query statistics for that operator.

Because this function is a table function, you must use it in a FROM clause and you must wrap it in TABLE(). Below is how we can use it.

The o/p

Summary

To summarize we need to check these parameters regularly to ensure we also do the right sizing of the warehouses, identify the frequent queries which might need potential tuning. Snowflake continuously works for making these new values available to ensure the platform is used in an optimal way.

Please keep reading my blogs it is only going to encourage me in posting more such content. You can find me on LinkedIn by clicking here and on Medium here. Happy Learning :)

Awarded as “Data Superhero by Snowflake for year 2023”, click here for more details.

Disclaimer: The views expressed here are mine alone and do not necessarily reflect the view of my current, former, or future employers.

--

--

Somen Swain
Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

Snowflake Data Superhero 2024 & 2023 | AWS Solution Architect Associate | 2XSnowflake Advanced Certified | Principal-Data Engineering at LTIMindtree