Behind the Scenes of Snowflake Healthcheck

Understanding Hashmap’s Latest Snowflake Utility

Published in

Hashmap, an NTT DATA Company

7 min readDec 3, 2020

Snowflake Healthcheck is the latest Snowflake utility developed by Hashmap. Healthcheck helps users measure and improve the health of their Snowflake environment. In this article, I’ll first give a brief demonstration of how to use Snowflake Healthcheck, then I’ll show you how we use Snowflake query results to build the dashboards used to display the Healthcheck data.

You’re likely using the Snowflake Data Cloud today to enable a wide range of data workloads including data warehousing, data lakes, data engineering, data science, and more. As a true SaaS service, Snowflake requires virtually no maintenance and operational overhead plus it’s consumption-based and very SQL friendly.

Based on the above and many other reasons, we continue to see significant market demand for Snowflake and as a Premier Services Partner, we strive to do our best to add value to the overall customer experience — that’s where Snowflake Healthcheck comes in.

Snowflake Healthcheck is an open-source React application. A Snowflake account with administrative privileges is the only requirement to use the tool.

How to Use the Tool

Go to Healthcheck

Click on the New Report button

Click on the Copy to Clipboard button to copy the Snowflake query

Login to your Snowflake account and navigate to Worksheets
Make a new worksheet
Paste the Snowflake query that you copied earlier into your new worksheet
Click Run (the query will take a few minutes to complete)

Download the query results as a .csv file

Upload the .csv containing the query results and click Visualize

Our Snowflake Query

The Snowflake query we created for Healthcheck gathers data from views found in the SNOWFLAKE database and the ACCOUNT_USAGE schema. These views are available to anyone with a Snowflake account that has administrative access.

Healthcheck Query for Snowflake

The three views that we query from are:

QUERY_HISTORY provides detailed records of all query history up to a year
WAREHOUSE_METERING_HISTORY provides records of credit usage for all warehouses in the Snowflake account
DATABASE_STORAGE_USAGE_HISTORY provides records of storage for all databases in the Snowflake account

Building the Charts

To visualize the query results, Snowflake Healthcheck leverages plotly.js. Before the .csv data can be visualized, it must first be parsed.

Parsing the Query Results

The Snowflake query we created allows us to easily read the results. The query results are generated as JSON objects where each row of the .csv file is an individual JSON object.

The individualJSON objects are formatted as seen below. Each object has a type key.

There are three possible type values:

warehouse_health

Warehouse Health Object Format

warehouse_usage

Warehouse Usage Object Format

database_usage

Database Usage Object Format

The type key is used for parsing the data because its value determines which chart component its data will be applied to.

updateData function

Once a user uploads their .csv file, the data is filtered based on the key type. This filtered data is used to update the state variables that are used by the chart components. Each of these state variables is an array and corresponds to a specific chart component.

warehouse_health_data: used in the Warehouse Health chart component
warehouse_usage_data: used in the Warehouse Compute Credits chart component
database_datasize_data: used in the Database Usage chart component

The data key of the JSON objects holds the specific key/value pairs that are used to make up the data for the specific chart.

The Charts

The above state variables are passed to the chart components as props. As of the writing of this article, Snowflake Healthcheck generates three charts.

Warehouse Health

Warehouse Health gives insight into the effect of past queries on the warehouse. It is a grouped bar chart with three traces that have corresponding values to each warehouse. The data used in this chart is gathered from the SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY view in Snowflake.

Warehouse Health Traces

This chart is generated by the data that is passed down in the warehouse_health_data prop variable which is formatted as follows. It is an array where each entry has a type and data key.

The data value is a JSON object that holds the following four keys:

WAREHOUSE: the entry’s Snowflake warehouse
MEDIAN_EXECUTION_TIME_MINUTES: the median time it took to execute the queries for the warehouse (last 30 days to present)
MEDIAN_QUEUED_PROVISIONING_TIME_MINUTES: the median time spent in the warehouse queue when the warehouse was provisioning servers (last 30 days to present)
MEDIAN_QUEUED_OVERLOAD_TIME_MINUTES: the median time spent in the warehouse queue when the warehouse was overloaded by the query workload (last 30 days to present)

The warehouse_health_data is iterated with a forEach loop. The three traces in the chart are generated from a key/value pair from warehouse_health_data.

For example, the Median Execution Time trace that can be seen in the chart comes from the MEDIAN_EXECUTION_TIME_MINUTES key in the warehouse_health_data.

Daily Warehouse Compute Credit Usage

Warehouse Compute Credits gives insight into how much the warehouse is being used. It is a line chart in which each warehouse’s daily credit consumption is tracked over a 30 day period. The data that is gathered to make up this chart comes from the SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY Snowflake view.

Warehouse Usage Traces

Warehouse Compute Credits chart component relies on the warehouse_usage_data prop variable to make visualizations. This prop variable is an array that is broken down as follows. Each entry in the array has a type and data key.

The value of data is a JSON object that holds the following three keys:

WAREHOUSE: the entry’s Snowflake warehouse
DATE: the date that the credits were used
COMPUTE_CREDITS: the number of credits used

In this stacked line chart, there is a trace for each warehouse. For each warehouse trace, the x-value is an array made up of the dates where credits were used in that warehouse. The y-value is the corresponding number of credits used on that date.

Daily Database Usage (in GB)

The Daily Database Usage (in GB) is a stacked line chart that shows the number of gigabytes of storage used by a specific database over a 30 day period. The data used to generate this chart comes from the SNOWFLAKE.ACCOUNT_USAGE.DATABASE_STORAGE_USAGE_HISTORY Snowflake view.

Database Usage Traces

The data for the Daily Database Usage chart component is passed down as the database_datasize_data prop variable. This prop variable is an array that is broken down as follows. Each entry in the array has a type and data key.

The data key value is a JSON object that holds the following three keys:

DATABASE: the entry’s Snowflake database
DATE: the date of the storage usage record
AVERAGE_DAILY_USAGE_GIGABYTES: the amount of storage data used (in GB)

In this stacked line chart, there is a trace for each database. For each database trace, the x-value is an array made up of the dates in which data was initially stored in that database. The y-value is the corresponding amount of storage used (in gigabytes) on that day.

Final Thoughts

Snowflake Healthcheck provides an in-depth visualization of the health of your Snowflake environment. The charts described above provide health and cost summaries of the warehouses and databases attached to your Snowflake account. Be sure to check out Snowflake Healthcheck! We are currently working on updates and plan to release new visualizations soon.

Ready to Accelerate Your Digital Transformation?

At Hashmap, we work with our clients to build better, together.

If you’d like additional assistance in this area, Hashmap offers a range of enablement workshops and consulting service packages as part of our consulting service offerings, and would be glad to work through your specifics in this area.

How does Snowflake’s platform compare to other data platforms? Our technical experts have implemented over 250 cloud/data projects in the last 3 years and conducted unbiased, detailed analyses across 34 business and technical dimensions, ranking each cloud data platform.

See How Snowflake Compares to Redshift Synapse Big Query & CDP | Hashmap

Are you trying to find how Snowflake compares to other cloud data platforms? We evaluated 34 technical and business…

www.hashmapinc.com

Behind the Scenes of Snowflake Healthcheck

Understanding Hashmap’s Latest Snowflake Utility

How to Use the Tool

Our Snowflake Query

Building the Charts

Parsing the Query Results

The Charts

Warehouse Health

Daily Warehouse Compute Credit Usage

Daily Database Usage (in GB)

Final Thoughts

Ready to Accelerate Your Digital Transformation?

See How Snowflake Compares to Redshift Synapse Big Query & CDP | Hashmap

Are you trying to find how Snowflake compares to other cloud data platforms? We evaluated 34 technical and business…

Other Tools and Content You Might Like

Snowflake Utilities & Accelerators | Do more with Snowflake | Hashmap

Try out all the Snowflake utilities that Hashmap has available and do more with Snowflake: Snowflake Inspector…

Automating a React App Hosted on AWS S3 — Snowflake Healthcheck

Part 1: Setting up an IAM Account and S3 Bucket on AWS

Hashmap Webinars & Meetups | On-Demand Videos

Check out Hashmap's On-Demand Videos for past webinars and meetups.

Hashmap on Tap | Hashmap Podcast

A rotating cast of Hashmap hosts and special guest explore different technologies from diverse perspectives while enjoying a drink of choice.

Written by Sam Kohlleffel