4 practical things you can do today to optimize cost in your Snowflake deployment

In the context of Snowflake, the term “cost optimization” can mean different things to different audiences. For some, it may refer to writing optimized, performant queries that are also cost effective. For others, cost optimization may mean warehouse right-sizing, or making sure that the right virtual warehouse is used for the right job. These two examples, however, only scratch the surface, as cost optimization can cover a broad range of things across many of Snowflake’s features. No matter what meaning the term may take on for you, there is one interpretation of “cost optimization” that we can all likely agree on: how to lower your monthly Snowflake bill. In this article, I’ll cover a handful of practical things that you can do today to start optimizing costs in your Snowflake deployment.

How does Snowflake billing work?

Before we get started, let’s get a high-level overview of how Snowflake billing works. The main components of Snowflake cost are compute, storage, and data transfer & egress. Here’s a brief summary of each:

  • Compute — This typically comprises most of an account’s monthly spend. It covers virtual warehouses, cloud services, and serverless functionalities. The great thing about this component of the billing model is that you will only ever pay for the compute you use.
  • Storage — Typically a very small percentage of the overall monthly bill. It covers data storage, with pass-through pricing and up to 7x compression of data.
  • Data Transfer & Egress — Also typically a small fraction of the overall monthly bill. It covers data transfer across different cloud providers (COPY command, replication, and external functions).

With this knowledge, let’s dive into a few of the things you can do today to lower your costs. To follow along, you’ll need access to your Snowflake account’s usage data: either the ACCOUNTADMIN role, or imported privileges on the SNOWFLAKE database and its ACCOUNT_USAGE schema (i.e., SNOWFLAKE.ACCOUNT_USAGE).
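
If you’d like to see how these components break down in your own account, the SNOWFLAKE.ACCOUNT_USAGE.METERING_DAILY_HISTORY view summarizes daily credit consumption by service type. Here’s a minimal sketch (the 30-day window is just an example; adjust it to your needs):

-- Daily credits by service type (e.g., WAREHOUSE_METERING, PIPE) for the last 30 days
select
    service_type
    ,usage_date
    ,sum(credits_used_compute) as compute_credits
    ,sum(credits_used_cloud_services) as cloud_services_credits
    ,sum(credits_billed) as credits_billed
from snowflake.account_usage.metering_daily_history
where usage_date >= dateadd('day', -30, current_date)
group by service_type, usage_date
order by usage_date desc, credits_billed desc;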

Let’s get started!

Configure auto-suspend and auto-resume policies

The first thing you can do to start lowering your monthly bill today is to properly configure (or set) auto-suspend and auto-resume policies for your virtual warehouses. Remember that virtual warehouses are part of the “Compute” component of Snowflake’s billing model — the component that typically comprises the largest portion of your monthly bill.

Auto-suspend policies ensure that virtual warehouses are turned off quickly after they become idle — which saves you money! Auto-suspend is enabled by default for virtual warehouses, but you should always double-check the time setting and adjust it for your use case. On the flip side, you should also make sure auto-resume is enabled for your virtual warehouses, so that a suspended warehouse is automatically restarted when a query is submitted to it.

Here are some sample queries to help you quickly identify any warehouse where an auto-suspend or auto-resume policy is not set:

-- Auto-suspend: list warehouses where no auto-suspend policy is set
show warehouses;
select * from table(result_scan(last_query_id()))
where "auto_suspend" is null;

-- Auto-resume: list warehouses where auto-resume is disabled
show warehouses;
select * from table(result_scan(last_query_id()))
where "auto_resume" = false;

When set properly, these policies can be great for use cases that make heavy use of caching, like BI or analytics. For those workloads, you may even want to leave the warehouse on a bit longer than average, since suspending a warehouse discards its local cache.

Monitor warehouses or workloads approaching the cloud services billing threshold

“Cloud services billing” — what is that? Remember that the “Compute” component of the Snowflake billing model is made up of three parts, and cloud services is one of them. Cloud services is the compute that coordinates activity behind the scenes: handling user requests and logins, parsing queries, displaying results, and more (SHOW commands, serving the results cache, etc.).

Accounts are billed for cloud services only when daily cloud services consumption exceeds 10% of the account’s daily warehouse compute. For example, if your account used 100 warehouse credits and 12 cloud services credits in a day, only the 2 credits above the 10-credit allowance would be billed. Although exceeding the threshold is rare, it is a good idea to monitor warehouses or workloads that may be approaching it.

Here are a couple of queries you can use to help you keep an eye on things:

-- By warehouse: cloud services credits as a percentage of each warehouse's monthly credits
with
cloud_services as (
    select
        warehouse_name
        ,month(start_time) as month
        ,sum(credits_used_cloud_services) as cloud_services_credits
        ,count(*) as num_queries
    from
        snowflake.account_usage.query_history
    group by
        warehouse_name
        ,month
),
warehouse_metering as (
    select
        warehouse_name
        ,month(start_time) as month
        ,sum(credits_used) as credits_for_month
    from
        snowflake.account_usage.warehouse_metering_history
    group by
        warehouse_name
        ,month
)
select
    *
    ,to_numeric(cloud_services_credits / nullif(credits_for_month, 0) * 100, 10, 2) as pct_cloud
from
    cloud_services
join
    warehouse_metering using (warehouse_name, month)
order by
    pct_cloud desc;

-- By account: the same calculation aggregated across all warehouses
with
cloud_services as (
    select
        warehouse_name
        ,month(start_time) as month
        ,sum(credits_used_cloud_services) as cloud_services_credits
        ,count(*) as num_queries
    from
        snowflake.account_usage.query_history
    group by
        warehouse_name
        ,month
),
warehouse_metering as (
    select
        warehouse_name
        ,month(start_time) as month
        ,sum(credits_used) as credits_for_month
    from
        snowflake.account_usage.warehouse_metering_history
    group by
        warehouse_name
        ,month
)
select
    month
    ,sum(cloud_services_credits) as sum_cloud_services_credits
    ,sum(credits_for_month) as sum_credits_for_month
    ,to_numeric(sum_cloud_services_credits / nullif(sum_credits_for_month, 0) * 100, 10, 2) as pct_cloud
from
    cloud_services
join
    warehouse_metering using (warehouse_name, month)
group by
    month
order by
    pct_cloud desc;

These queries return a list of warehouses (or monthly usage for the whole account) alongside the corresponding cloud services credits used, the total credits for the month, and a consumption percentage calculated from those two numbers. Remember that cloud services show up on the monthly bill only when they exceed 10% of the account’s daily warehouse compute.

Set statement timeouts for accounts and warehouses

Not all query statements across users or warehouses in your Snowflake deployment will be efficient or performant, and some will turn into long-running queries that consume compute credits and drive up cost.

You may want to consider setting statement timeouts on warehouses or user accounts (or both) to prevent queries from running unnecessarily long. When doing this, also consider the experience of the end user, and try to strike the right balance between the timeout setting and the user’s needs — you may frustrate users if you set very low timeouts across an entire warehouse, or if you set them incorrectly for certain user accounts. Here are some sample statements to help you inspect and set timeouts:

-- Inspect the current statement timeout at the account level
show parameters like 'STATE%' in account;

-- Set a 15-minute timeout for the whole account
alter account set STATEMENT_TIMEOUT_IN_SECONDS = 900;

show parameters in account;

-- Parameters whose current value differs from the default
select * from table(result_scan(last_query_id()))
where "value" <> "default";

-- At the warehouse level
alter warehouse load_wh set STATEMENT_TIMEOUT_IN_SECONDS = 1727;

show parameters in warehouse load_wh;
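
The same parameter can also be set for individual users. A minimal sketch, where analyst1 is a hypothetical user name:

-- At the user level (analyst1 is a hypothetical user name)
alter user analyst1 set STATEMENT_TIMEOUT_IN_SECONDS = 600;

show parameters like 'STATEMENT_TIMEOUT_IN_SECONDS' for user analyst1;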

Drop unused tables

Why pay storage fees for tables that are never used? It’s very common to encounter tables that were created for a quick POC, a report, or to test out a feature, and are never touched again. Needless to say, if a table is not being used, you should consider dropping it. Here are some sample queries that can help you identify unused tables:

-- Query the information schema for table sizes and last-altered timestamps
select table_catalog || '.' || table_schema || '.' || table_name as table_path,
    table_name, table_schema as schema_name,
    table_catalog as database_name, bytes,
    to_number(bytes / power(1024, 3), 10, 2) as gb,
    last_altered as last_use,
    datediff('day', last_use, current_date) as days_since_last_use
from information_schema.tables
where days_since_last_use > 90 -- use your own threshold (in days)
order by bytes desc;

-- Last DML committed on a specific object
select (system$last_change_commit_time(
    'database.schema.table_name')/1000)::timestamp_ntz;

-- Number of queries referencing the object in the last 90 days
-- (query_text is uppercased before the search, so the literal must be uppercase)
select count(*) from snowflake.account_usage.query_history
where contains(upper(query_text), 'TABLE_NAME')
and datediff('day', start_time, current_date) < 90;

These queries do a few important things:

  • Return all unused tables by size and the last time they were updated (based on a threshold you set; in the example above, it’s 90 days)
  • Return the last DML on a specific object
  • Return the number of queries that reference a given table name in the last 90 days
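
Once you’ve confirmed a table is truly unused, dropping it is a one-liner, and Time Travel gives you a safety net. A minimal sketch, with hypothetical object names:

-- Drop the unused table (mydb.myschema.stale_table is a hypothetical name)
drop table if exists mydb.myschema.stale_table;

-- If the table turns out to be needed after all, Time Travel lets you
-- restore it within the data retention period
undrop table mydb.myschema.stale_table;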

Conclusion: Try it out!

There you have it — 4 practical things you can do today to proactively monitor costs and defend against unexpected billing in your Snowflake deployment. This is just the start, though. As mentioned earlier, cost optimization covers a broad range of things across Snowflake. Stay tuned for future blog posts that dive even further into topics like cost optimization frameworks, the ACCOUNT_USAGE schema, query optimizations, visualizing cost, and more. Try some of the queries out today and let us know what you think!

David Spezia and Samartha Chandrashekar from Snowflake contributed to this article. A big thanks to them!
