Behind the Scenes of Snowflake Healthcheck

Understanding Hashmap’s Latest Snowflake Utility

Sam Kohlleffel
Hashmap, an NTT DATA Company
7 min readDec 3, 2020

--

Snowflake Healthcheck is the latest Snowflake utility developed by Hashmap. Healthcheck helps users measure and improve the health of their Snowflake environment. In this article, I’ll first give a brief demonstration of how to use Snowflake Healthcheck, then I’ll show you how we use Snowflake query results to build the dashboards used to display the Healthcheck data.

You’re likely using the Snowflake Data Cloud today to enable a wide range of data workloads including data warehousing, data lakes, data engineering, data science, and more. As a true SaaS service, Snowflake requires virtually no maintenance and operational overhead plus it’s consumption-based and very SQL friendly.

Based on the above and many other reasons, we continue to see significant market demand for Snowflake and as a Premier Services Partner, we strive to do our best to add value to the overall customer experience — that’s where Snowflake Healthcheck comes in.

Snowflake Healthcheck is an open-source React application. A Snowflake account with administrative privileges is the only requirement to use the tool.

How to Use the Tool

  • Click on the New Report button
  • Click on the Copy to Clipboard button to copy the Snowflake query
  • Login to your Snowflake account and navigate to Worksheets
  • Make a new worksheet
  • Paste the Snowflake query that you copied earlier into your new worksheet
  • Click Run (the query will take a few minutes to complete)
  • Download the query results as a .csv file
  • Upload the .csv containing the query results and click Visualize

Our Snowflake Query

The Snowflake query we created for Healthcheck gathers data from views found in the SNOWFLAKE database and the ACCOUNT_USAGE schema. These views are available to anyone with a Snowflake account that has administrative access.

Healthcheck Query for Snowflake

The three views that we query from are:

  • QUERY_HISTORY provides detailed records of all query history up to a year
  • WAREHOUSE_METERING_HISTORY provides records of credit usage for all warehouses in the Snowflake account
  • DATABASE_STORAGE_USAGE_HISTORY provides records of storage for all databases in the Snowflake account

Building the Charts

To visualize the query results, Snowflake Healthcheck leverages plotly.js. Before the .csv data can be visualized, it must first be parsed.

Parsing the Query Results

The Snowflake query we created allows us to easily read the results. The query results are generated as JSON objects where each row of the .csv file is an individual JSON object.

The individualJSON objects are formatted as seen below. Each object has a type key.

There are three possible type values:

  • warehouse_health
Warehouse Health Object Format
  • warehouse_usage
Warehouse Usage Object Format
  • database_usage
Database Usage Object Format

The type key is used for parsing the data because its value determines which chart component its data will be applied to.

updateData function

Once a user uploads their .csv file, the data is filtered based on the key type. This filtered data is used to update the state variables that are used by the chart components. Each of these state variables is an array and corresponds to a specific chart component.

  1. warehouse_health_data: used in the Warehouse Health chart component
  2. warehouse_usage_data: used in the Warehouse Compute Credits chart component
  3. database_datasize_data: used in the Database Usage chart component

The data key of the JSON objects holds the specific key/value pairs that are used to make up the data for the specific chart.

The Charts

The above state variables are passed to the chart components as props. As of the writing of this article, Snowflake Healthcheck generates three charts.

Warehouse Health

Warehouse Health gives insight into the effect of past queries on the warehouse. It is a grouped bar chart with three traces that have corresponding values to each warehouse. The data used in this chart is gathered from the SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY view in Snowflake.

Warehouse Health Traces

This chart is generated by the data that is passed down in the warehouse_health_data prop variable which is formatted as follows. It is an array where each entry has a type and data key.

The data value is a JSON object that holds the following four keys:

  • WAREHOUSE: the entry’s Snowflake warehouse
  • MEDIAN_EXECUTION_TIME_MINUTES: the median time it took to execute the queries for the warehouse (last 30 days to present)
  • MEDIAN_QUEUED_PROVISIONING_TIME_MINUTES: the median time spent in the warehouse queue when the warehouse was provisioning servers (last 30 days to present)
  • MEDIAN_QUEUED_OVERLOAD_TIME_MINUTES: the median time spent in the warehouse queue when the warehouse was overloaded by the query workload (last 30 days to present)

The warehouse_health_data is iterated with a forEach loop. The three traces in the chart are generated from a key/value pair from warehouse_health_data.

For example, the Median Execution Time trace that can be seen in the chart comes from the MEDIAN_EXECUTION_TIME_MINUTES key in the warehouse_health_data.

Daily Warehouse Compute Credit Usage

Warehouse Compute Credits gives insight into how much the warehouse is being used. It is a line chart in which each warehouse’s daily credit consumption is tracked over a 30 day period. The data that is gathered to make up this chart comes from the SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY Snowflake view.

Warehouse Usage Traces

Warehouse Compute Credits chart component relies on the warehouse_usage_data prop variable to make visualizations. This prop variable is an array that is broken down as follows. Each entry in the array has a type and data key.

The value of data is a JSON object that holds the following three keys:

  • WAREHOUSE: the entry’s Snowflake warehouse
  • DATE: the date that the credits were used
  • COMPUTE_CREDITS: the number of credits used

In this stacked line chart, there is a trace for each warehouse. For each warehouse trace, the x-value is an array made up of the dates where credits were used in that warehouse. The y-value is the corresponding number of credits used on that date.

Daily Database Usage (in GB)

The Daily Database Usage (in GB) is a stacked line chart that shows the number of gigabytes of storage used by a specific database over a 30 day period. The data used to generate this chart comes from the SNOWFLAKE.ACCOUNT_USAGE.DATABASE_STORAGE_USAGE_HISTORY Snowflake view.

Database Usage Traces

The data for the Daily Database Usage chart component is passed down as the database_datasize_data prop variable. This prop variable is an array that is broken down as follows. Each entry in the array has a type and data key.

The data key value is a JSON object that holds the following three keys:

  • DATABASE: the entry’s Snowflake database
  • DATE: the date of the storage usage record
  • AVERAGE_DAILY_USAGE_GIGABYTES: the amount of storage data used (in GB)

In this stacked line chart, there is a trace for each database. For each database trace, the x-value is an array made up of the dates in which data was initially stored in that database. The y-value is the corresponding amount of storage used (in gigabytes) on that day.

Final Thoughts

Snowflake Healthcheck provides an in-depth visualization of the health of your Snowflake environment. The charts described above provide health and cost summaries of the warehouses and databases attached to your Snowflake account. Be sure to check out Snowflake Healthcheck! We are currently working on updates and plan to release new visualizations soon.

Ready to Accelerate Your Digital Transformation?

At Hashmap, we work with our clients to build better, together.

If you’d like additional assistance in this area, Hashmap offers a range of enablement workshops and consulting service packages as part of our consulting service offerings, and would be glad to work through your specifics in this area.

How does Snowflake’s platform compare to other data platforms? Our technical experts have implemented over 250 cloud/data projects in the last 3 years and conducted unbiased, detailed analyses across 34 business and technical dimensions, ranking each cloud data platform.

Other Tools and Content You Might Like

Sam Kohlleffel is in Hashmap’s RTE Internship program developing data and cloud applications and is also a student at Texas A&M University studying Economics, Math, and Statistics.

--

--