Behind the Scenes of Snowflake Healthcheck
Understanding Hashmap’s Latest Snowflake Utility
Snowflake Healthcheck is the latest Snowflake utility developed by Hashmap. Healthcheck helps users measure and improve the health of their Snowflake environment. In this article, I’ll first give a brief demonstration of how to use Snowflake Healthcheck, then I’ll show you how we use Snowflake query results to build the dashboards used to display the Healthcheck data.
You’re likely using the Snowflake Data Cloud today to enable a wide range of data workloads including data warehousing, data lakes, data engineering, data science, and more. As a true SaaS service, Snowflake requires virtually no maintenance and operational overhead plus it’s consumption-based and very SQL friendly.
Based on the above and many other reasons, we continue to see significant market demand for Snowflake and as a Premier Services Partner, we strive to do our best to add value to the overall customer experience — that’s where Snowflake Healthcheck comes in.
Snowflake Healthcheck is an open-source React application. A Snowflake account with administrative privileges is the only requirement to use the tool.
How to Use the Tool
- Go to Healthcheck
- Click on the New Report button
- Click on the Copy to Clipboard button to copy the Snowflake query
- Login to your Snowflake account and navigate to Worksheets
- Make a new worksheet
- Paste the Snowflake query that you copied earlier into your new worksheet
- Click Run (the query will take a few minutes to complete)
- Download the query results as a
.csv
file
- Upload the
.csv
containing the query results and click Visualize
Our Snowflake Query
The Snowflake query we created for Healthcheck gathers data from views found in the SNOWFLAKE
database and the ACCOUNT_USAGE
schema. These views are available to anyone with a Snowflake account that has administrative access.
The three views that we query from are:
QUERY_HISTORY
provides detailed records of all query history up to a yearWAREHOUSE_METERING_HISTORY
provides records of credit usage for all warehouses in the Snowflake accountDATABASE_STORAGE_USAGE_HISTORY
provides records of storage for all databases in the Snowflake account
Building the Charts
To visualize the query results, Snowflake Healthcheck leverages plotly.js. Before the .csv
data can be visualized, it must first be parsed.
Parsing the Query Results
The Snowflake query we created allows us to easily read the results. The query results are generated as JSON
objects where each row of the .csv
file is an individual JSON
object.
The individualJSON
objects are formatted as seen below. Each object has a type
key.
There are three possible type
values:
warehouse_health
warehouse_usage
database_usage
The type
key is used for parsing the data because its value determines which chart component its data will be applied to.
Once a user uploads their .csv
file, the data is filtered based on the key type
. This filtered data is used to update the state variables that are used by the chart components. Each of these state variables is an array and corresponds to a specific chart component.
warehouse_health_data
: used in the Warehouse Health chart componentwarehouse_usage_data
: used in the Warehouse Compute Credits chart componentdatabase_datasize_data
: used in the Database Usage chart component
The data
key of the JSON
objects holds the specific key/value pairs that are used to make up the data for the specific chart.
The Charts
The above state variables are passed to the chart components as props. As of the writing of this article, Snowflake Healthcheck generates three charts.
Warehouse Health
Warehouse Health gives insight into the effect of past queries on the warehouse. It is a grouped bar chart with three traces that have corresponding values to each warehouse. The data used in this chart is gathered from the SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
view in Snowflake.
This chart is generated by the data that is passed down in the warehouse_health_data
prop variable which is formatted as follows. It is an array where each entry has a type
and data
key.
The data
value is a JSON
object that holds the following four keys:
WAREHOUSE
: the entry’s Snowflake warehouseMEDIAN_EXECUTION_TIME_MINUTES
: the median time it took to execute the queries for the warehouse (last 30 days to present)MEDIAN_QUEUED_PROVISIONING_TIME_MINUTES
: the median time spent in the warehouse queue when the warehouse was provisioning servers (last 30 days to present)MEDIAN_QUEUED_OVERLOAD_TIME_MINUTES
: the median time spent in the warehouse queue when the warehouse was overloaded by the query workload (last 30 days to present)
The warehouse_health_data
is iterated with a forEach
loop. The three traces in the chart are generated from a key/value pair from warehouse_health_data
.
For example, the Median Execution Time
trace that can be seen in the chart comes from the MEDIAN_EXECUTION_TIME_MINUTES
key in the warehouse_health_data
.
Daily Warehouse Compute Credit Usage
Warehouse Compute Credits gives insight into how much the warehouse is being used. It is a line chart in which each warehouse’s daily credit consumption is tracked over a 30 day period. The data that is gathered to make up this chart comes from the SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
Snowflake view.
Warehouse Compute Credits chart component relies on the warehouse_usage_data
prop variable to make visualizations. This prop variable is an array that is broken down as follows. Each entry in the array has a type
and data
key.
The value of data
is a JSON object that holds the following three keys:
WAREHOUSE
: the entry’s Snowflake warehouseDATE
: the date that the credits were usedCOMPUTE_CREDITS
: the number of credits used
In this stacked line chart, there is a trace for each warehouse. For each warehouse trace, the x-value
is an array made up of the dates where credits were used in that warehouse. The y-value
is the corresponding number of credits used on that date.
Daily Database Usage (in GB)
The Daily Database Usage (in GB) is a stacked line chart that shows the number of gigabytes of storage used by a specific database over a 30 day period. The data used to generate this chart comes from the SNOWFLAKE.ACCOUNT_USAGE.DATABASE_STORAGE_USAGE_HISTORY
Snowflake view.
The data for the Daily Database Usage chart component is passed down as the database_datasize_data
prop variable. This prop variable is an array that is broken down as follows. Each entry in the array has a type
and data
key.
The data
key value is a JSON object that holds the following three keys:
DATABASE
: the entry’s Snowflake databaseDATE
: the date of the storage usage recordAVERAGE_DAILY_USAGE_GIGABYTES
: the amount of storage data used (in GB)
In this stacked line chart, there is a trace for each database. For each database trace, the x-value
is an array made up of the dates in which data was initially stored in that database. The y-value
is the corresponding amount of storage used (in gigabytes) on that day.
Final Thoughts
Snowflake Healthcheck provides an in-depth visualization of the health of your Snowflake environment. The charts described above provide health and cost summaries of the warehouses and databases attached to your Snowflake account. Be sure to check out Snowflake Healthcheck! We are currently working on updates and plan to release new visualizations soon.
Ready to Accelerate Your Digital Transformation?
At Hashmap, we work with our clients to build better, together.
If you’d like additional assistance in this area, Hashmap offers a range of enablement workshops and consulting service packages as part of our consulting service offerings, and would be glad to work through your specifics in this area.
How does Snowflake’s platform compare to other data platforms? Our technical experts have implemented over 250 cloud/data projects in the last 3 years and conducted unbiased, detailed analyses across 34 business and technical dimensions, ranking each cloud data platform.
Other Tools and Content You Might Like
Sam Kohlleffel is in Hashmap’s RTE Internship program developing data and cloud applications and is also a student at Texas A&M University studying Economics, Math, and Statistics.