
How to Build an Interactive Query Performance App in Snowflake Notebooks

Have you ever found it challenging to identify performance bottlenecks in your queries? Efficiently analyzing and optimizing query performance can help you cut unnecessary resource consumption and the costs that come with it.

In this blog, I'll show you how to build a query performance analysis tool using Streamlit's interactive visualization capabilities.

Briefly, here’s what we’ll cover:

  • Create interactive widgets for data filtering
  • Implement statistical visualizations using Altair
  • Generate performance insights from query history
  • Build a user-friendly interface for data exploration

Getting Started

Before we dive into the implementation, ensure you have access to a Snowflake account.

To follow along with this blog, you can download the Build_an_Interactive_Query_Performance_App_with_Streamlit.ipynb notebook from GitHub.

Build the Interactive Query Performance App

Let’s break down the development process into manageable components that we’ll build out:

  • Write the Base Query
  • Build the Streamlit Interface
  • Retrieve and Process Data
  • Display Data Visualization

Write the Base Query

First, we’ll establish our foundation by writing a SQL query to fetch the query performance data. Note that you should replace CHANIN_XS with your own warehouse name.
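If you're not sure which warehouses are available to you, you can list them first with Snowflake's SHOW command:

SHOW WAREHOUSES;

With a warehouse name in hand, here's the base query: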

SELECT query_id,
ROW_NUMBER() OVER(ORDER BY partitions_scanned DESC) AS query_id_int,
query_text,
total_elapsed_time/1000 AS query_execution_time_seconds,
partitions_scanned,
partitions_total
FROM snowflake.account_usage.query_history Q
WHERE warehouse_name = 'CHANIN_XS'
AND TO_DATE(Q.start_time) > DATEADD(day,-1,TO_DATE(CURRENT_TIMESTAMP()))
AND total_elapsed_time > 0
AND error_code IS NULL
AND partitions_scanned IS NOT NULL
ORDER BY total_elapsed_time DESC
LIMIT 50;

This returns a table of the 50 longest-running queries from the past day, with one row per query and the columns defined in the SELECT clause above.

Build the Streamlit Interface

Let’s create the interactive components of our application:

from snowflake.snowpark.context import get_active_session
import pandas as pd
import streamlit as st
import altair as alt
import numpy as np

st.title('Top n longest-running queries')

# Input widgets
col = st.columns(3)
with col[0]:
    timeframe_option = st.selectbox('Select a timeframe', ('day', 'week', 'month'))
with col[1]:
    limit_option = st.slider('Display n rows', 10, 200, 100)
with col[2]:
    bin_option = st.slider('Bin size', 1, 30, 10)
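The data-processing step in the next section also references a sql_command_option widget for filtering queries by SQL command type. Here's a minimal version of that widget (the list of commands and the default selection are my assumptions):

# Multiselect used by the filtering step in the next section;
# the command list and default selection are assumptions.
sql_command_option = st.multiselect(
    'Select SQL command types',
    ['select', 'create', 'insert', 'update', 'delete', 'show'],
    default=['select']
)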

Retrieve and Process Data

Next, we’ll implement the data fetching logic:

# Data retrieval
session = get_active_session()
df = session.sql(
    f"""
    SELECT query_id,
    ROW_NUMBER() OVER(ORDER BY partitions_scanned DESC) AS query_id_int,
    query_text,
    total_elapsed_time/1000 AS query_execution_time_seconds,
    partitions_scanned,
    partitions_total
    FROM snowflake.account_usage.query_history Q
    WHERE warehouse_name = 'CHANIN_XS'
    AND TO_DATE(Q.start_time) > DATEADD({timeframe_option},-1,TO_DATE(CURRENT_TIMESTAMP()))
    AND total_elapsed_time > 0
    AND error_code IS NULL
    AND partitions_scanned IS NOT NULL
    ORDER BY total_elapsed_time DESC
    LIMIT {limit_option};
    """
).to_pandas()

# Keep only queries whose text starts with one of the selected SQL commands
df = df[df['QUERY_TEXT'].str.lower().str.startswith(tuple(command.lower() for command in sql_command_option))]
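Since Streamlit reruns the script from top to bottom on every widget interaction, the query above is re-executed each time a widget changes. If that becomes a concern, one option is to wrap the retrieval in a cached helper. This is a sketch of my own (load_query_history is a made-up name), not part of the original app:

@st.cache_data(ttl=600)  # reuse results for 10 minutes; the TTL is an arbitrary choice
def load_query_history(_session, timeframe: str, limit: int) -> pd.DataFrame:
    # The leading underscore tells st.cache_data not to hash the Snowpark session.
    return _session.sql(f"""
        SELECT query_id,
        ROW_NUMBER() OVER(ORDER BY partitions_scanned DESC) AS query_id_int,
        query_text,
        total_elapsed_time/1000 AS query_execution_time_seconds,
        partitions_scanned,
        partitions_total
        FROM snowflake.account_usage.query_history Q
        WHERE warehouse_name = 'CHANIN_XS'
        AND TO_DATE({timeframe},-1,TO_DATE(CURRENT_TIMESTAMP())) < TO_DATE(Q.start_time)
        AND total_elapsed_time > 0
        AND error_code IS NULL
        AND partitions_scanned IS NOT NULL
        ORDER BY total_elapsed_time DESC
        LIMIT {limit};
    """).to_pandas()

df = load_query_history(session, timeframe_option, limit_option)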

Display Data Visualization

Finally, let’s add the data visualization components:

st.title('Histogram of Query Execution Times')

# Create a DataFrame for the histogram data
hist, bin_edges = np.histogram(df['QUERY_EXECUTION_TIME_SECONDS'], bins=bin_option)

histogram_df = pd.DataFrame({
    'bin_start': bin_edges[:-1],
    'bin_end': bin_edges[1:],
    'count': hist
})
histogram_df['bin_label'] = histogram_df.apply(
    lambda row: f"{row['bin_start']:.2f} - {row['bin_end']:.2f}", axis=1
)

# Create plots
histogram_plot = alt.Chart(histogram_df).mark_bar().encode(
    x=alt.X('bin_label:N', sort=histogram_df['bin_label'].tolist(),
            axis=alt.Axis(title='QUERY_EXECUTION_TIME_SECONDS', labelAngle=90)),
    y=alt.Y('count:Q', axis=alt.Axis(title='Count')),
    tooltip=['bin_label', 'count']
)

box_plot = alt.Chart(df).mark_boxplot(
    extent="min-max",
    color='yellow'
).encode(
    alt.X("QUERY_EXECUTION_TIME_SECONDS:Q", scale=alt.Scale(zero=False))
).properties(
    height=200
)

st.altair_chart(histogram_plot, use_container_width=True)
st.altair_chart(box_plot, use_container_width=True)
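If you'd prefer a single combined view, Altair can also vertically concatenate the two charts before handing them to Streamlit (a small optional variation, not something the app above requires):

# Optional: render both charts as one vertically concatenated Altair object
combined_plot = alt.vconcat(histogram_plot, box_plot)
st.altair_chart(combined_plot, use_container_width=True)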


# Data display
with st.expander('Show data'):
    st.dataframe(df)
with st.expander('Show summary statistics'):
    st.write(df['QUERY_EXECUTION_TIME_SECONDS'].describe())

Putting all of these code snippets together, we can build out the complete interactive query performance insights app.

Conclusion

Congratulations! You've successfully created an interactive query performance analysis tool that provides valuable insights into your query execution patterns. This application serves as a foundation that you can customize and extend based on your own specific needs. Consider adding features like query pattern analysis, cost estimation, or automated optimization recommendations to make it even more powerful for your use case.
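For example, a lightweight extension along those lines, using only the columns the base query already returns, is to flag queries that scan most of their micro-partitions and thus benefit little from partition pruning. This is a sketch of my own; the PRUNING_RATIO name and the 0.8 threshold are arbitrary choices:

# Hypothetical extension: surface queries with poor partition pruning
df['PRUNING_RATIO'] = df['PARTITIONS_SCANNED'] / df['PARTITIONS_TOTAL']
with st.expander('Show queries with poor partition pruning'):
    st.dataframe(
        df[df['PRUNING_RATIO'] > 0.8][['QUERY_ID', 'QUERY_TEXT', 'PRUNING_RATIO']]
    )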



Written by Chanin Nantasenamat

Data Professor on YouTube | Sr Developer Advocate | ex-Professor of Bioinformatics | Join https://data-professor.medium.com/membership
