Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

Best practices, tips & tricks from Snowflake experts and community


How to Analyze Query Performance in Snowflake

Have you ever wondered how to optimize your Snowflake queries for better performance and cost efficiency? Organizations often struggle to identify bottlenecks in their SQL queries, leading to increased costs and slower data processing.

In this blog, we’ll explore a collection of analytical SQL queries that provide deep insights into Snowflake query performance.

Particularly, you’ll learn:
• How to identify and analyze long-running queries
• Methods to track query execution patterns
• Techniques to monitor warehouse utilization

Getting Started

Before diving into the specific queries, ensure you have access to a Snowflake account and basic SQL knowledge.

To follow along with this blog, you can download the accompanying notebook, Automated_Query_Performance_Insights_in_Snowflake_Notebooks.ipynb, from GitHub.

Analytical SQL Queries on Query Performance

Now let’s explore how to analyze query performance with the following five queries.

Analyzing Long-Running Queries

This SQL query identifies the 50 most time-consuming queries in the past day, measuring their execution time and partition scanning efficiency while excluding failed queries.

Note that in this query (and the ones that follow) you should replace <your_warehouse_name> with the name of the warehouse on your account that you want to analyze.

SELECT query_id,
       ROW_NUMBER() OVER (ORDER BY partitions_scanned DESC) AS query_id_int,
       query_text,
       total_elapsed_time/1000 AS query_execution_time_seconds,
       partitions_scanned,
       partitions_total
FROM snowflake.account_usage.query_history Q
WHERE warehouse_name = '<your_warehouse_name>'
  AND TO_DATE(Q.start_time) > DATEADD(day, -1, TO_DATE(CURRENT_TIMESTAMP()))
  AND total_elapsed_time > 0
  AND error_code IS NULL
  AND partitions_scanned IS NOT NULL
ORDER BY total_elapsed_time DESC
LIMIT 50;

The query returns the query ID and text, the execution time in seconds, and the number of partitions scanned out of the total, with the slowest queries listed first.
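One signal worth extracting from this result is pruning efficiency: the fraction of micro-partitions actually scanned. A ratio near 1.0 suggests the query reads almost the entire table and might benefit from clustering or more selective filters. Here is a minimal Python sketch of that derived metric; the function name and row layout are illustrative assumptions, not part of any Snowflake API:

```python
def pruning_efficiency(partitions_scanned, partitions_total):
    """Fraction of micro-partitions scanned; closer to 0.0 means better pruning."""
    if not partitions_total:
        return None  # avoid division by zero for empty tables
    return partitions_scanned / partitions_total

# Hypothetical rows shaped like the query's output
rows = [
    {"query_id": "q1", "partitions_scanned": 980, "partitions_total": 1000},
    {"query_id": "q2", "partitions_scanned": 12, "partitions_total": 1000},
]
for r in rows:
    r["scan_ratio"] = pruning_efficiency(r["partitions_scanned"], r["partitions_total"])
```

Queries whose scan_ratio stays high across runs are the first candidates for a clustering key or a narrower filter predicate.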

Analyzing Query Patterns in Relation to Execution Time

This SQL query categorizes queries by execution duration into time-based buckets (less than 1 second, 1 second to 1 minute, and so on), enabling analysis of a warehouse’s performance distribution over the past month.

SELECT
    CASE
        WHEN Q.total_elapsed_time <= 1000 THEN 'Less than 1 second'
        WHEN Q.total_elapsed_time <= 60000 THEN '1 second to 1 minute'
        WHEN Q.total_elapsed_time <= 300000 THEN '1 minute to 5 minutes'
        ELSE 'More than 5 minutes'
    END AS buckets,
    COUNT(query_id) AS number_of_queries
FROM snowflake.account_usage.query_history Q
WHERE TO_DATE(Q.start_time) > DATEADD(month, -1, TO_DATE(CURRENT_TIMESTAMP()))
  AND total_elapsed_time > 0
  AND warehouse_name = '<your_warehouse_name>'
GROUP BY 1;

The output shows the number of queries that fall into each execution-time bucket.
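The bucket logic hinges on the fact that total_elapsed_time is reported in milliseconds, so "1 minute" corresponds to 60,000 and "5 minutes" to 300,000. The same bucketing can be expressed as a small Python function, shown here purely for illustration (it is not part of any Snowflake client library):

```python
def bucket(elapsed_ms):
    """Map a query's total_elapsed_time (milliseconds) to a duration bucket."""
    if elapsed_ms <= 1_000:        # up to 1 second
        return "Less than 1 second"
    elif elapsed_ms <= 60_000:     # up to 1 minute
        return "1 second to 1 minute"
    elif elapsed_ms <= 300_000:    # up to 5 minutes
        return "1 minute to 5 minutes"
    else:
        return "More than 5 minutes"
```

Writing the thresholds out this way makes it easy to sanity-check the boundaries before adapting the bucket labels to your own SLAs.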

Analyzing Query Patterns to Find Repeated Queries

This SQL query identifies the most resource-intensive and repetitive queries over the past week. This is done by grouping similar queries using query hashes and ranking them by their total execution time.

SELECT
    query_hash,
    COUNT(*) AS number_of_queries,
    SUM(total_elapsed_time) AS total_elapsed_time_ms,
    ANY_VALUE(query_id) AS sample_query_id
FROM snowflake.account_usage.query_history
WHERE warehouse_name = '<your_warehouse_name>'
  AND DATE_TRUNC('day', start_time) >= CURRENT_DATE() - 7
GROUP BY query_hash
ORDER BY SUM(total_elapsed_time) DESC
LIMIT 100;

The output lists each query hash together with its run count, cumulative elapsed time, and a representative query ID.
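To make the aggregation concrete, the same grouping can be mirrored in Python over a list of query-history rows. The dictionary keys below mirror the columns selected above and are assumptions for illustration:

```python
from collections import defaultdict

def summarize_by_hash(rows):
    """Mirror of the SQL aggregation: count, total elapsed time (ms), and a
    sample query_id per query_hash, sorted by total elapsed time descending."""
    agg = defaultdict(lambda: {"count": 0, "total_ms": 0, "sample_id": None})
    for r in rows:
        a = agg[r["query_hash"]]
        a["count"] += 1
        a["total_ms"] += r["total_elapsed_time"]
        # keep the first query_id seen, like ANY_VALUE(query_id)
        a["sample_id"] = a["sample_id"] or r["query_id"]
    return sorted(
        ({"query_hash": h, **v} for h, v in agg.items()),
        key=lambda x: x["total_ms"],
        reverse=True,
    )
```

A hash that appears hundreds of times with a large cumulative elapsed time is a strong candidate for result caching, a materialized view, or simply fixing the query once and reaping the benefit on every repeat.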

Analyzing Warehouse Load

This SQL query analyzes the Snowflake warehouse performance by tracking both actively running and queued workloads over the past month. This helps to highlight potential resource bottlenecks where queries had to wait in queue.

SELECT TO_DATE(start_time) AS date,
       warehouse_name,
       SUM(avg_running) AS sum_running,
       SUM(avg_queued_load) AS sum_queued
FROM snowflake.account_usage.warehouse_load_history
WHERE TO_DATE(start_time) >= DATEADD(month, -1, CURRENT_TIMESTAMP())
GROUP BY 1, 2
HAVING SUM(avg_queued_load) > 0;

The output lists, for each day and warehouse, the summed running load and queued load, surfacing the days on which queries had to wait in queue.
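One way to turn the two sums into a single signal is the queued fraction: the share of total load that spent time waiting rather than running. This derived metric is an assumption for illustration, not a column that warehouse_load_history reports directly:

```python
def queued_fraction(sum_running, sum_queued):
    """Share of total warehouse load that was queued (0.0 = no queuing).

    A persistently high value suggests the warehouse is undersized or would
    benefit from multi-cluster scaling.
    """
    total = sum_running + sum_queued
    if total == 0:
        return 0.0  # no load at all, so nothing queued
    return sum_queued / total
```

Trending this fraction per day per warehouse makes it easy to spot when queuing starts dominating and a resize is worth the extra credits.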

Identifying Long-Running Tasks

This SQL query analyzes the task history in Snowflake by measuring execution duration for successfully completed tasks within the past week, thereby helping to identify potentially problematic long-running processes.

SELECT DATEDIFF(seconds, query_start_time, completed_time) AS duration_seconds, *
FROM snowflake.account_usage.task_history
WHERE state = 'SUCCEEDED'
  AND query_start_time >= DATEADD(week, -1, CURRENT_TIMESTAMP())
ORDER BY duration_seconds DESC;

The output lists each completed task alongside its duration in seconds, longest first.
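The DATEDIFF call simply measures the gap between the two task timestamps. If you post-process task_history rows in Python (for example, rows fetched via a connector that returns datetime objects, which is an assumption here), the equivalent is a plain subtraction:

```python
from datetime import datetime

def task_duration_seconds(query_start_time, completed_time):
    """Whole seconds between task start and completion.

    Matches DATEDIFF(seconds, ...) when both timestamps fall on whole seconds;
    Snowflake's DATEDIFF counts second boundaries crossed rather than rounding.
    """
    return int((completed_time - query_start_time).total_seconds())

start = datetime(2024, 1, 1, 8, 0, 0)
end = datetime(2024, 1, 1, 8, 2, 30)
```

From here you can flag, say, any task exceeding its schedule interval as a candidate for splitting or a larger warehouse.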

Conclusion

You’ve now learned essential techniques for analyzing query performance in Snowflake. These analytical queries provide valuable insights that you can use to optimize your data operations, reduce costs, and improve efficiency. Consider adapting these queries to match your specific use cases and requirements, and regularly monitor your query performance to maintain optimal database operations.


Written by Chanin Nantasenamat

Data Professor on YouTube | Sr Developer Advocate | ex-Professor of Bioinformatics | Join https://data-professor.medium.com/membership
