How to Build a Warehouse Utilization Dashboard in Snowflake Notebooks
Need ways to optimize your Snowflake warehouse costs effectively? The solution lies in gaining an understanding of the warehouse usage patterns as the first step toward tackling inefficient resource allocation and unnecessary expenses.
In this blog, we’ll walk you through the creation of a warehouse utilization dashboard that features an interactive heatmap visualization.
Briefly, you’ll learn how to :
- Query and analyze warehouse utilization data
- Build an interactive interface with Streamlit
- Create dynamic heatmap visualizations using Altair
Getting Started
Before we begin, ensure you have:
- Access to a Snowflake account
- Basic knowledge of SQL and Python
- Understanding of Snowflake warehouses
To follow along with this blog, you can click on Warehouse_Utilization_with_Streamlit.ipynb to download the Notebook from GitHub.
Build Warehouse Utilization Dashboard
Retrieve Warehouse Utilization Data
Let’s start by retrieving the warehouse utilization data. We’ll write a SQL query to fetch the required information:
SELECT
DATE(start_time) AS usage_date,
HOUR(start_time) AS hour_of_day,
warehouse_name,
avg_running,
avg_queued_load,
start_time,
end_time
FROM snowflake.account_usage.warehouse_load_history
WHERE start_time >= DATEADD(month, -1, CURRENT_TIMESTAMP())
ORDER BY warehouse_name, start_time;
Note: The above SQL cell is named sql_warehouse_data
.
The retrieved data looks like the following:
Now, let’s convert the SQL results to a Pandas DataFrame:
sql_warehouse_data.to_pandas()
Note: The above Python cell is named py_dataframe
.
Create the Interactive Interface
We’ll enhance user experience by building interactive components. Particularly, we’ll create the slider widget that will allow us to filter the data.
Finally, we’ll calculate the total warehouse load (`TOTAL_LOAD`) and format the hour display (`HOUR_DISPLAY`) for each record.
import pandas as pd
import streamlit as st
# Get data
df = py_dataframe.copy()
# Create date filter slider
days = st.slider('Select number of days to analyze',
min_value=10,
max_value=90,
value=30,
step=10)
# Filter data based on selected days and create a copy
latest_date = pd.to_datetime(df['USAGE_DATE']).max()
cutoff_date = latest_date - pd.Timedelta(days=days)
filtered_df = df[pd.to_datetime(df['USAGE_DATE']) > cutoff_date].copy()
# Prepare data and create heatmap
filtered_df['TOTAL_LOAD'] = filtered_df['AVG_RUNNING'] + filtered_df['AVG_QUEUED_LOAD']
filtered_df['HOUR_DISPLAY'] = filtered_df['HOUR_OF_DAY'].apply(lambda x: f"{x:02d}:00")
st.warning(f"You've selected {days} days to analyze!")
filtered_df
The interactive interface that we’ve created using Streamlit is shown below along with the filtered DataFrame:
Visualize Usage Patterns
Let’s create an informative heatmap visualization to display warehouse usage patterns.
The heatmap shows the warehouse usage pattern across different hours of the day. Color intensity represents the total load and interactive tooltips showing detailed metrics for each cell.
import altair as alt
import streamlit as st
chart = alt.Chart(filtered_df).mark_rect(
stroke='black',
strokeWidth=1
).encode(
x=alt.X('HOUR_DISPLAY:O',
title='Hour of Day',
axis=alt.Axis(
labels=True,
tickMinStep=1,
labelOverlap=False
)),
y=alt.Y('WAREHOUSE_NAME:N',
title='Warehouse Name',
axis=alt.Axis(
labels=True,
labelLimit=200,
tickMinStep=1,
labelOverlap=False,
labelPadding=10
)),
color=alt.Color('TOTAL_LOAD:Q', title='Total Load'),
tooltip=['WAREHOUSE_NAME', 'HOUR_DISPLAY', 'TOTAL_LOAD',
'AVG_RUNNING', 'AVG_QUEUED_LOAD']
).properties(
title=f'Warehouse Usage Patterns ({days} Days)'
).configure_view(
stroke=None,
continuousHeight=400
).configure_axis(
labelFontSize=10
)
# Display the chart
st.altair_chart(chart, use_container_width=True)
Here’s the heatmap displaying the warehouse usage patterns:
Conclusion
In this blog, we’ve built a powerful warehouse utilization dashboard that enables data-driven decision-making for warehouse optimization. You’ve learned how to query warehouse utilization data, create interactive components, and visualize usage patterns through heatmaps. This lays the foundation and can be extended to include additional metrics, different visualization types, or automated alerting systems based on your specific needs.
Resources
Videos:
- Creating a Heat Map In Snowflake Notebooks with Streamlit to Analyze Warehouse Utilization
- YouTube playlist on Snowflake Notebooks
Articles:
Documentation: