How to Build an Interactive Query Performance App in Snowflake Notebooks
Have you found it challenging to identify performance bottlenecks in your queries? Analyzing and optimizing query performance helps you cut unnecessary resource consumption and keep costs in check.
In this blog, I’ll show you how to build a query performance analysis tool using Streamlit’s interactive visualization capabilities.
Briefly, here’s what we’ll cover:
- Create interactive widgets for data filtering
- Implement statistical visualizations using Altair
- Generate performance insights from query history
- Build a user-friendly interface for data exploration
Getting Started
Before we dive into the implementation, ensure you have access to a Snowflake account.
To follow along with this blog, you can click on Build_an_Interactive_Query_Performance_App_with_Streamlit.ipynb to download the Notebook from GitHub.
Build the Interactive Query Performance App
Let’s break down the development process into manageable components that we’ll build out:
- Write the Base Query
- Build the Streamlit Interface
- Retrieve and Process Data
- Display Data Visualization
Write the Base Query
First, we’ll establish our foundation by writing a SQL query to fetch the query performance data. Note that you should replace CHANIN_XS with your own warehouse name.
SELECT query_id,
       ROW_NUMBER() OVER(ORDER BY partitions_scanned DESC) AS query_id_int,
       query_text,
       total_elapsed_time/1000 AS query_execution_time_seconds,
       partitions_scanned,
       partitions_total
FROM snowflake.account_usage.query_history Q
WHERE warehouse_name = 'CHANIN_XS'
  AND TO_DATE(Q.start_time) > DATEADD(day, -1, TO_DATE(CURRENT_TIMESTAMP()))
  AND total_elapsed_time > 0
  AND error_code IS NULL
  AND partitions_scanned IS NOT NULL
ORDER BY total_elapsed_time DESC
LIMIT 50;
This returns the following table:
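If you’re following along in a Snowflake Notebook, it’s worth sanity-checking the base query from a Python cell before wiring it into the app. Here’s a minimal sketch, assuming you’ve stored the SQL above in a string named base_query (a name introduced here for illustration). Note that ACCOUNT_USAGE views can lag behind real time, so the very newest queries may not appear immediately.
from snowflake.snowpark.context import get_active_session

session = get_active_session()

# base_query is assumed to hold the SQL from the previous step as a string
df_preview = session.sql(base_query).to_pandas()
df_preview.head()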
Build the Streamlit Interface
Let’s create the interactive components of our application:
from snowflake.snowpark.context import get_active_session
import pandas as pd
import streamlit as st
import altair as alt
import numpy as np
st.title('Top n longest-running queries')
# Input widgets
col = st.columns(3)
with col[0]:
    timeframe_option = st.selectbox('Select a timeframe', ('day', 'week', 'month'))
with col[1]:
    limit_option = st.slider('Display n rows', 10, 200, 100)
with col[2]:
    bin_option = st.slider('Bin size', 1, 30, 10)
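The filtering step in the next section references a sql_command_option widget, so we’ll define one here as well. This is a minimal sketch; the list of commands and the default selections are my own choices, so adjust them to the query types you care about.
# Multiselect used later to filter queries by their leading SQL command
sql_command_option = st.multiselect(
    'Select SQL command types',
    ['select', 'create', 'insert', 'update', 'delete', 'merge', 'copy', 'show'],
    ['select', 'create']
)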
Retrieve and Process Data
Next, we’ll implement the data fetching logic:
# Data retrieval
session = get_active_session()
df = session.sql(
    f"""
    SELECT query_id,
           ROW_NUMBER() OVER(ORDER BY partitions_scanned DESC) AS query_id_int,
           query_text,
           total_elapsed_time/1000 AS query_execution_time_seconds,
           partitions_scanned,
           partitions_total
    FROM snowflake.account_usage.query_history Q
    WHERE warehouse_name = 'CHANIN_XS'
      AND TO_DATE(Q.start_time) > DATEADD({timeframe_option}, -1, TO_DATE(CURRENT_TIMESTAMP()))
      AND total_elapsed_time > 0
      AND error_code IS NULL
      AND partitions_scanned IS NOT NULL
    ORDER BY total_elapsed_time DESC
    LIMIT {limit_option};
    """
).to_pandas()
# Keep only queries whose text starts with one of the selected SQL commands
df = df[df['QUERY_TEXT'].str.lower().str.startswith(tuple(command.lower() for command in sql_command_option))]
Display Data Visualization
Finally, let’s add the data visualization components:
st.title('Histogram of Query Execution Times')
# Create a DataFrame for the histogram data
hist, bin_edges = np.histogram(df['QUERY_EXECUTION_TIME_SECONDS'], bins=bin_option)
histogram_df = pd.DataFrame({
'bin_start': bin_edges[:-1],
'bin_end': bin_edges[1:],
'count': hist
})
histogram_df['bin_label'] = histogram_df.apply(lambda row: f"{row['bin_start']:.2f} - {row['bin_end']:.2f}", axis=1)
# Create plots
histogram_plot = alt.Chart(histogram_df).mark_bar().encode(
x=alt.X('bin_label:N', sort=histogram_df['bin_label'].tolist(),
axis=alt.Axis(title='QUERY_EXECUTION_TIME_SECONDS', labelAngle=90)),
y=alt.Y('count:Q', axis=alt.Axis(title='Count')),
tooltip=['bin_label', 'count']
)
box_plot = alt.Chart(df).mark_boxplot(
extent="min-max",
color='yellow'
).encode(
alt.X("QUERY_EXECUTION_TIME_SECONDS:Q", scale=alt.Scale(zero=False))
).properties(
height=200
)
st.altair_chart(histogram_plot, use_container_width=True)
st.altair_chart(box_plot, use_container_width=True)
# Data display
with st.expander('Show data'):
    st.dataframe(df)
with st.expander('Show summary statistics'):
    st.write(df['QUERY_EXECUTION_TIME_SECONDS'].describe())
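One optional refinement: Streamlit reruns the entire script on every widget interaction, so each change re-executes the ACCOUNT_USAGE query. If that feels slow, you could move the retrieval into a cached helper. Here’s a minimal sketch building on the imports, session, and widgets defined above; it assumes your Streamlit version supports st.cache_data, and run_query is a name introduced here for illustration.
@st.cache_data(ttl=600)  # cache results for 10 minutes; the ttl is an arbitrary choice
def run_query(timeframe: str, limit: int) -> pd.DataFrame:
    # timeframe and limit form the cache key, so changing only the bin size
    # reuses the cached result instead of hitting ACCOUNT_USAGE again
    return session.sql(f"""
        SELECT query_id,
               ROW_NUMBER() OVER(ORDER BY partitions_scanned DESC) AS query_id_int,
               query_text,
               total_elapsed_time/1000 AS query_execution_time_seconds,
               partitions_scanned,
               partitions_total
        FROM snowflake.account_usage.query_history Q
        WHERE warehouse_name = 'CHANIN_XS'
          AND TO_DATE(Q.start_time) > DATEADD({timeframe}, -1, TO_DATE(CURRENT_TIMESTAMP()))
          AND total_elapsed_time > 0
          AND error_code IS NULL
          AND partitions_scanned IS NOT NULL
        ORDER BY total_elapsed_time DESC
        LIMIT {limit};
    """).to_pandas()

df = run_query(timeframe_option, limit_option)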
Putting all of these code snippets together, we can build out the interactive query performance insights app that looks like the following:
Conclusion
Congratulations! You’ve successfully created an interactive query performance analysis tool that provides valuable insights into your query execution patterns. This application serves as a foundation that you can customize and extend based on your own specific needs. Consider adding features like query pattern analysis, cost estimation, or automated optimization recommendations to make it even more powerful for your use case.
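As a concrete starting point for the automated-recommendation idea, you could flag queries that scan most of their micro-partitions, since weak pruning often points to missing filters or a clustering opportunity. Here’s a minimal sketch that builds on the df from the app above; the 0.8 cutoff is an arbitrary assumption to tune for your workload.
# Fraction of partitions each query scanned; values near 1.0 mean pruning
# did little and the query read almost the entire table
scanned = df[df['PARTITIONS_TOTAL'] > 0].copy()
scanned['SCAN_RATIO'] = scanned['PARTITIONS_SCANNED'] / scanned['PARTITIONS_TOTAL']

flagged = scanned.loc[scanned['SCAN_RATIO'] > 0.8,
                      ['QUERY_ID', 'QUERY_TEXT', 'SCAN_RATIO', 'QUERY_EXECUTION_TIME_SECONDS']]

st.subheader('Queries with weak partition pruning')
st.dataframe(flagged)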
Resources
Here are some additional resources if you need further information:
Videos:
- Build An Interactive Query Performance App In Snowflake Notebooks Using Streamlit
- YouTube playlist on Snowflake Notebooks