Supercharge Your Snowflake Billing Statement with Custom User Metrics

One simple trick can be the key to understanding your Snowflake compute spend

Randy Pitcher II
Hashmap, an NTT DATA Company
6 min readOct 5, 2020

--

Photo by eberhard grossgasteiger from Pexels

One of the most common questions I am asked by Snowflake users is some variation of the following:

Which Snowflake users consumed the most compute credits last month?

This feels like a pretty basic thing to ask about a service like Snowflake.

With simple, pay as you go pricing and unlimited, on-demand scaling, Snowflake makes it easy for you to use as many credits as you need to deliver analytics insights faster than ever before.

Additionally, Snowflake provides a wide array of out-of-the-box metrics that are painful or impossible to get on other warehousing solutions. This focus on monitoring is one of the top things that has kept me coming back to the platform.

Analytics abound in one of Hashmap’s internal Snowflake accounts!

Surely, with all of this data and insight at our fingertips, it should be simple to find our top-consuming user.

Shockingly, Snowflake’s usage UI provides no way to answer or even approximate an answer to this question.

Challenges with Analyzing Credit Usage

Let’s review how Snowflake charges for compute.

Snowflake charges for warehouse usage with compute credits. One credit is equal to running a single-node, XSMALL warehouse for 1 hour. Larger warehouses with more compute nodes cost more credits to run (1 credit per node). Simple and linear pricing makes it easy to know how your warehouses impact your monthly bill.

The challenge of taking this pricing model and applying it to users starts with the way that running warehouses can support multiple concurrent queries. So if 2 people submit a query to a single warehouse that each takes an hour to complete, how do you assign that credit usage to each user?

Further muddying things is Snowflake’s minimum billing time for a newly-resumed warehouse. All virtual warehouses will bill for a minimum of 60s when they are turned on. So if a user submits a query that takes 10s to run, do we assign the full run time of the warehouse to that single user’s query?

The pain is especially sharp when virtual warehouses are used for general-purpose activities, like Hashmap’s TRAINING_WH.

Looks like someone left the training warehouse on overnight.

Not only is it impossible to determine what was done with this warehouse when spikes occur, but narrowing down which users to reach out to about increases often becomes an exercise in hunting through the query history and sending impatient Slack messages.

Time for a New Measure

Instead of hyper-focusing on the attribution of credit usage to individual users, maybe we can change the question a little bit and measure something a little easier to compute in Snowflake.

Often what Snowflake administrators really want to know is how and why their Snowflake costs are changing. Warehouse-level monitoring is one component of this, but understanding the impact of end-user behavior is another necessary part of fully understanding Snowflake compute costs.

I then propose that we do not look at Compute Credits but at User Credit Equivalents.

To create a UCE, we head to the SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY table. This table contains enough information to construct an approximation of how a specific query from a specific user contributes to credit usage.

The core idea is to convert the WAREHOUSE_SIZE column into a WAREHOUSE_NODE_SIZE value. With this, we can find out how long a specific number of nodes were run to execute a query from a particular user.

From there, it’s simple ✨math✨ to get to a credit approximation:

Warehouse Nodes * query runtime (ms) * 1s/1000ms * 1m/60s * 1h/60m * 1credit/node/hour

Expressed in SQL, this looks like:

If you’re interested, the SQL for this transformation can be found here.

Finally, with this new approach, we’re able to examine weekly sums of User Credit Equivalent usage by the user below. This allows me to visually track if usage is up or down and determine which users are driving those changes.

If you squint, you can see pretty clearly that the spike in this week's usage is from yours truly.

I can also look over the whole time period to identify drivers of compute costs:

Me, myself, and the robots I use to run our Snowflake ecosystem.

Again, I want to reiterate that UCE is an estimate that does not capture the actual nuances of Snowflake usage that goes into your monthly bill. The table below illustrates how far off these values are from the actual monthly compute credits in our Snowflake environment over the last few months:

In general, I suspect that months with longer running queries on average are more accurate thanks to that Minimum Billing Time error.

If you were really desperate to distribute the actual cost of your Snowflake environment across individual users, I think there are probably worse ways than weighting that cost by the total UCE for that month and assigning proportional costs to each user.

Need Snowflake Cloud Data Warehousing and Migration Assistance?

If you’d like additional assistance in this area, Hashmap offers a range of enablement workshops and consulting service packages as part of our consulting service offerings, and would be glad to work through your specifics in this area.

How does Snowflake compare to other data warehouses? Our technical experts have implemented over 250 cloud/data projects in the last 3 years and conducted unbiased, detailed analyses across 34 business and technical dimensions, ranking each cloud data warehouse.

Also, you can catch Randy as a host on Hashmap on Tap, a podcast focused on all things data engineering and the cloud — available on Spotify, Apple Podcasts, Google Podcasts, and other popular audio apps.

Other Tools and Content You Might Like

Randy Pitcher is a Cloud and Data Engineer (and OKC-based Regional Technical Expert) with Hashmap providing Data, Cloud, IoT, and AI/ML solutions and consulting expertise across industries with a group of innovative technologists and domain experts accelerating high-value business outcomes for our customers.

Be sure and connect with Randy on LinkedIn and reach out for more perspectives and insight into accelerating your data-driven business outcomes or to schedule a hands-on workshop to help you go from Zero to Snowflake.

--

--