
How to Build a Query Cost Monitoring Tool in Snowflake Notebooks

5 min read · Jan 29, 2025


Are you struggling to manage and optimize your database compute costs? If so, a detailed per-query breakdown of those costs can provide invaluable insight.

This blog introduces a robust solution by leveraging Snowflake’s account usage data to monitor and visualize compute costs associated with individual queries. Here, you’ll learn how to create an interactive dashboard for detailed cost analysis.

By the end of this blog, readers will be able to:

  • Retrieve and merge query data from Snowflake.
  • Convert this data into a usable format with Pandas.
  • Implement interactive elements using Streamlit.
  • Visualize data using Altair charts.

Getting Started

Before you can build your Query Cost Monitoring tool, ensure you have access to a Snowflake account. This tool also requires basic knowledge of SQL and Python, along with familiarity with the Pandas and Streamlit libraries.

Let’s start by setting up our environment and retrieving the necessary data. First, download the required notebook from the query cost monitoring GitHub repository, and make sure your notebook environment includes the essential libraries: numpy, pandas, altair, and streamlit.
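As a quick sanity check (just a sketch; this cell isn’t required by the notebook itself), you can confirm that these packages are importable from a Python cell before proceeding:

# Optional sanity check: confirm the required packages are available in the notebook environment
import numpy as np
import pandas as pd
import altair as alt
import streamlit as st

print(np.__version__, pd.__version__, alt.__version__, st.__version__)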

Retrieving Query Cost Data

Write the SQL Query

Our first task is to write a SQL query that fetches credits_used and other relevant columns from the query_history and metering_history views in Snowflake's account_usage schema. It’s also worth noting that this SQL cell is named sql_data, a name we’ll reference shortly. Because metering_history reports credit consumption at the warehouse level (in hourly windows) rather than per query, the time-range join below matches each query to the metering window it ran in, so the resulting credits_used figure is best treated as an approximation of per-query cost.

SELECT
    query_history.query_id,
    query_history.query_text,
    query_history.start_time,
    query_history.end_time,
    query_history.user_name,
    query_history.database_name,
    query_history.schema_name,
    query_history.warehouse_name,
    query_history.warehouse_size,
    metering_history.credits_used,
    query_history.execution_time / 1000 AS execution_time_s
FROM
    snowflake.account_usage.query_history
    JOIN snowflake.account_usage.metering_history
        ON query_history.start_time >= metering_history.start_time
        AND query_history.end_time <= metering_history.end_time
WHERE
    query_history.start_time >= DATEADD(DAY, -7, CURRENT_TIMESTAMP())
ORDER BY
    query_history.query_id;

Convert Table to a DataFrame

Next, convert the SQL result into a Pandas DataFrame for further processing and analysis in the Python code that follows. Note that the Python cell holding this conversion is named py_dataframe, which is how we’ll reference its output in the next step.

sql_data.to_pandas()
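If you’d like to eyeball the result before building the dashboard, a quick preview cell such as the one below works; this extra cell is purely illustrative and assumes the SQL cell above is named sql_data, as noted earlier:

# Optional: preview the query cost data returned by the SQL cell
preview_df = sql_data.to_pandas()
print(preview_df.shape)              # number of rows and columns returned
print(preview_df.columns.tolist())   # expect upper-case names such as QUERY_ID, CREDITS_USED, START_TIME
preview_df.head()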

Create an Interactive Dashboard

Develop Interactive Components

Conceptually, we’ll build dynamic widgets using Streamlit to allow users to filter data based on different parameters such as time span, warehouse name, warehouse size, or user name.

First, we’ll import the data and lay out the structure of the data app with st.columns(), splitting it into three columns, each occupied by an input widget: a slider and two drop-down select boxes.

import pandas as pd
import streamlit as st
import altair as alt

# Get data
df = py_dataframe.copy()

# Create date filter slider
st.subheader("Select time duration")

col = st.columns(3)

with col[0]:
    days = st.slider('Select number of days to analyze',
                     min_value=1,
                     max_value=7,
                     value=7,
                     step=1)
with col[1]:
    var = st.selectbox("Select a variable", ['WAREHOUSE_NAME', 'USER_NAME', 'WAREHOUSE_SIZE'])
with col[2]:
    metric = st.selectbox("Select a metric", ["COUNT", "TOTAL_CREDITS_USED"])

Next, we’ll filter the data using the widget values selected by the user.

# Filter data according to day duration
df['START_TIME'] = pd.to_datetime(df['START_TIME'])
latest_date = df['START_TIME'].max()
cutoff_date = latest_date - pd.Timedelta(days=days)
filtered_df = df[df['START_TIME'] > cutoff_date].copy()

# Prepare data for heatmap
filtered_df['HOUR_OF_DAY'] = filtered_df['START_TIME'].dt.hour
filtered_df['HOUR_DISPLAY'] = filtered_df['HOUR_OF_DAY'].apply(lambda x: f"{x:02d}:00")

# Calculate frequency count and sum of credits by hour and query
agg_df = (filtered_df.groupby(['QUERY_ID', 'HOUR_DISPLAY', var])
          .agg(
              COUNT=('QUERY_ID', 'size'),
              TOTAL_CREDITS_USED=('CREDITS_USED', 'sum')
          )
          .reset_index()
          )

st.warning(f"Analyzing {var} data for the last {days} days!")
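As a small optional add-on (a sketch, not part of the original notebook), you could also surface headline numbers for the filtered window with st.metric before drilling into the charts:

# Optional add-on: headline metrics for the filtered time window
total_queries = filtered_df['QUERY_ID'].nunique()
total_credits = filtered_df['CREDITS_USED'].sum()

metric_cols = st.columns(2)
metric_cols[0].metric("Queries analyzed", f"{total_queries:,}")
metric_cols[1].metric("Total credits used", f"{total_credits:,.2f}")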

Here, we define session state for the button, which serves as the logic for what happens when the button is clicked. This will allow us to display the underlying DataFrames of the filtered and processed data.

# Initialize the button state in session state
if 'expanded_btn' not in st.session_state:
    st.session_state.expanded_btn = False

# Callback function to toggle the state
def toggle_expand():
    st.session_state.expanded_btn = not st.session_state.expanded_btn

# Create button with callback
st.button(
    '⊕ Expand DataFrames' if not st.session_state.expanded_btn else '⊖ Collapse DataFrames',
    on_click=toggle_expand,
    type='secondary' if st.session_state.expanded_btn else 'primary'
)

# State conditional
if st.session_state.expanded_btn:
    expand_value = True
else:
    expand_value = False

with st.expander("See Filtered DataFrame", expanded=expand_value):
    st.dataframe(filtered_df.head(100))
with st.expander("See Heatmap DataFrame", expanded=expand_value):
    st.dataframe(agg_df)
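As an optional extra (again, a sketch rather than part of the original notebook), you could also let users download the aggregated data for offline analysis:

# Optional add-on: download the aggregated heatmap data as a CSV file
csv_bytes = agg_df.to_csv(index=False).encode('utf-8')
st.download_button(
    label="Download aggregated data as CSV",
    data=csv_bytes,
    file_name="query_cost_by_hour.csv",
    mime="text/csv",
)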

The above code snippets will generate the following interactive app right inside Snowflake Notebooks:

Implement Data Visualizations

Here, we’ll use Altair to create insightful visualizations such as heatmaps, stacked bar charts, and bubble plots to allow visual detection of trends.

## Heatmap
heatmap = alt.Chart(agg_df).mark_rect(stroke='black', strokeWidth=1).encode(
    x='HOUR_DISPLAY:O',
    y=alt.Y(f'{var}:N',
            title='',
            axis=alt.Axis(
                labels=True,
                labelLimit=250,
                tickMinStep=1,
                labelOverlap=False,
                labelPadding=10
            )),
    color=f'{metric}:Q',
    tooltip=['HOUR_DISPLAY', var, metric]
).properties(
    title=f'Query Activity Heatmap by Hour and {var}'
)

st.altair_chart(heatmap, use_container_width=True)

## Stacked bar chart with time series
bar_time = alt.Chart(agg_df).mark_bar().encode(
    x='HOUR_DISPLAY:O',
    y=f'{metric}:Q',
    color=alt.Color(f'{var}:N', legend=alt.Legend(orient='bottom')),
    tooltip=['HOUR_DISPLAY', var, metric]
).properties(
    title=f'Query Activity by Hour and {var}',
    height=400
)
st.altair_chart(bar_time, use_container_width=True)

## Bubble plot with size representing the metric
bubble = alt.Chart(agg_df).mark_circle().encode(
    x='HOUR_DISPLAY:O',
    y=alt.Y(f'{var}:N', title=''),
    size=alt.Size(f'{metric}:Q', legend=alt.Legend(title=metric)),  # legend title follows the selected metric
    color=alt.Color(f'{var}:N', legend=None),
    tooltip=['HOUR_DISPLAY', var, metric]
).properties(
    title=f'Query Distribution by Hour and {var}',
    height=550
)
st.altair_chart(bubble, use_container_width=True)

Let’s now see what the generated data visualizations look like.

Starting with the heatmap:

On to the stacked bar chart:

And finally the bubble plots:

Conclusion

Congratulations on successfully developing a query cost monitoring tool! The interactive dashboard we’ve just created allows users to analyze query patterns, identify cost drivers, and make data-driven decisions for optimization. You can further adapt this dashboard by adding more metrics, implementing additional visualizations, and so on; one possible extension is sketched below.
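For instance, here is a sketch of a daily credit trend built from the same filtered DataFrame (illustrative only, not part of the original notebook):

# Sketch of a possible extension: total credits used per day over the selected window
daily_df = (
    filtered_df
    .assign(DAY=filtered_df['START_TIME'].dt.floor('D'))
    .groupby('DAY', as_index=False)['CREDITS_USED']
    .sum()
)

daily_trend = alt.Chart(daily_df).mark_line(point=True).encode(
    x=alt.X('DAY:T', title='Day'),
    y=alt.Y('CREDITS_USED:Q', title='Credits used'),
    tooltip=['DAY:T', 'CREDITS_USED:Q']
).properties(title='Daily Credits Used')

st.altair_chart(daily_trend, use_container_width=True)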


Happy coding!


Written by Chanin Nantasenamat

Data Professor on YouTube | Sr Developer Advocate | ex-Professor of Bioinformatics | Join https://data-professor.medium.com/membership
