The fastest way to reduce Snowflake costs by 20%

Stop letting your warehouses run idle.

Sahil Singla
Jan 3, 2024
(Image: a robot who likes saving Snowflake costs. Midjourney)

In a typical organization, the bulk of Snowflake costs are tied to compute usage. Snowflake bills compute for every second a warehouse is running, with a 60-second minimum each time a warehouse resumes. These costs start accruing the moment a warehouse is turned on and continue until it is suspended. This inevitably leads to a situation where a meaningful fraction of your Snowflake bill is due to warehouses running idle between workloads.

While the exact number depends on how a warehouse is configured, idle costs typically account for 10% to 25% of total Snowflake compute spend. In this blog, I’ll explain why this number is so high, and how you can calculate it for your own warehouses. Finally, we’ll look at some effective strategies to significantly reduce idle costs.

Why Do Snowflake Warehouses Run Idle?

Idle time in Snowflake warehouses mainly arises from two factors:

1. AUTO_SUSPEND Setting

This setting determines how long a warehouse stays running after its last query before it suspends automatically. For instance, with AUTO_SUSPEND set to 60 seconds, a warehouse keeps running (and billing) for up to a minute after the last query finishes before pausing. You can check the current value for each warehouse, as sketched below.
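
A minimal sketch for inspecting the setting across all warehouses (the lowercase quoted column names come from scanning the SHOW WAREHOUSES output):

-- List warehouses with their AUTO_SUSPEND values (in seconds)
SHOW WAREHOUSES;
SELECT "name", "auto_suspend"
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
ORDER BY "auto_suspend" DESC;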

2. Query Scheduling

The frequency and timing of queries also affect idle time. Infrequent or irregularly scheduled queries can leave warehouses running but unused, leading to higher idle times.

Let’s take two simple examples to understand how these parameters affect the total idle time of a warehouse, assuming AUTO_SUSPEND is set to 60 seconds.

Case 1: Sequential Workload

(Diagram: a 67% idle warehouse running a sequential workload.)

  • Total Idle Time: 80 seconds (20s gap between queries + 60s AUTO_SUSPEND window)
  • Total Warehouse Run Time: 120 seconds
  • Idle Percentage: 66.67% (80/120)

Case 2: Parallel Workload

(Diagram: a 60% idle warehouse running a parallel workload.)

  • Total Idle Time: 60 seconds (60s AUTO_SUSPEND window only)
  • Total Warehouse Run Time: 100 seconds
  • Idle Percentage: 60% (60/100)

Note: With longer-running queries and well-scheduled workloads, the idle-time percentage goes down significantly.

Measuring the Cost of Idle Time

Idle time and its associated cost vary from warehouse to warehouse depending on workload patterns. Run the SQL query below to measure these metrics for all of your warehouses. Note that it reads the SNOWFLAKE.ACCOUNT_USAGE and SNOWFLAKE.ORGANIZATION_USAGE schemas, which typically require elevated privileges (for example, the ACCOUNTADMIN role).

-- Set the date range for the analysis
SET startDate = '2023-12-01';
SET endDate = '2023-12-31';

-- Retrieve the list of all warehouses for reference
SHOW WAREHOUSES;

-- Capture the warehouse list from the last executed query
WITH WAREHOUSE_INFO AS (
    SELECT * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
),
-- Map warehouse sizes to their credit consumption per hour
WAREHOUSE_CREDITS_MAP AS (
    SELECT * FROM (
        VALUES
            -- Standard warehouses of varying sizes and their credit costs
            ('X-Small', 'STANDARD', 1),
            ('Small', 'STANDARD', 2),
            ('Medium', 'STANDARD', 4),
            ('Large', 'STANDARD', 8),
            ('X-Large', 'STANDARD', 16),
            ('2X-Large', 'STANDARD', 32),
            ('3X-Large', 'STANDARD', 64),
            ('4X-Large', 'STANDARD', 128),
            ('5X-Large', 'STANDARD', 256),
            ('6X-Large', 'STANDARD', 512),
            -- Snowpark-optimized warehouses and their credit costs
            ('Medium', 'SNOWPARK-OPTIMIZED', 6),
            ('Large', 'SNOWPARK-OPTIMIZED', 12),
            ('X-Large', 'SNOWPARK-OPTIMIZED', 24),
            ('2X-Large', 'SNOWPARK-OPTIMIZED', 48),
            ('3X-Large', 'SNOWPARK-OPTIMIZED', 96),
            ('4X-Large', 'SNOWPARK-OPTIMIZED', 192),
            ('5X-Large', 'SNOWPARK-OPTIMIZED', 384),
            ('6X-Large', 'SNOWPARK-OPTIMIZED', 768)
    ) AS t (WAREHOUSE_SIZE, WAREHOUSE_TYPE, CREDITS_PER_HOUR)
),
-- Determine the effective start time of each query
QUERY_HISTORY_EFFECTIVE AS (
    SELECT
        QUERY_ID,
        START_TIME,
        -- Shift the start time past queueing, compilation, and other delays
        TIMEADD(
            'millisecond',
            QUEUED_OVERLOAD_TIME + COMPILATION_TIME
                + QUEUED_PROVISIONING_TIME + QUEUED_REPAIR_TIME
                + LIST_EXTERNAL_FILES_TIME,
            START_TIME
        ) AS EXECUTION_START_TIME,
        END_TIME,
        WAREHOUSE_ID,
        CLUSTER_NUMBER,
        WAREHOUSE_NAME
    FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
    WHERE START_TIME > $startDate
        AND END_TIME < $endDate
        AND WAREHOUSE_SIZE IS NOT NULL
),
-- For each query, find the latest end time among earlier queries on the same cluster
QUERY_EVENTS AS (
    SELECT
        QUERY_ID,
        START_TIME,
        END_TIME,
        EXECUTION_START_TIME,
        WAREHOUSE_ID,
        CLUSTER_NUMBER,
        MAX(END_TIME) OVER (
            PARTITION BY WAREHOUSE_ID, CLUSTER_NUMBER
            ORDER BY EXECUTION_START_TIME
            ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
        ) AS LATEST_END_TIME_BEFORE_EXECUTION_START,
        WAREHOUSE_NAME
    FROM QUERY_HISTORY_EFFECTIVE
),
-- Compute each query's run time, discounting overlap with earlier queries
QUERY_WAREHOUSE_CONTRIBUTION AS (
    SELECT
        CASE
            -- No earlier query on this cluster: count the full run time
            WHEN LATEST_END_TIME_BEFORE_EXECUTION_START IS NULL
                THEN TIMEDIFF('seconds', EXECUTION_START_TIME, END_TIME)
            -- Query starts after all earlier queries ended: count the full run time
            WHEN EXECUTION_START_TIME >= LATEST_END_TIME_BEFORE_EXECUTION_START
                THEN TIMEDIFF('seconds', EXECUTION_START_TIME, END_TIME)
            -- Query overlaps an earlier one: count only the time after the earlier query's end
            WHEN EXECUTION_START_TIME < LATEST_END_TIME_BEFORE_EXECUTION_START
                THEN GREATEST(0, TIMEDIFF('seconds', LATEST_END_TIME_BEFORE_EXECUTION_START, END_TIME))
        END AS TOTAL_QUERY_RUN_TIME,
        START_TIME,
        EXECUTION_START_TIME,
        END_TIME,
        WAREHOUSE_ID,
        CLUSTER_NUMBER,
        LATEST_END_TIME_BEFORE_EXECUTION_START,
        WAREHOUSE_NAME
    FROM QUERY_EVENTS
),
-- Aggregate effective run time per warehouse cluster
QUERY_EVENTS_FINAL AS (
    SELECT
        WAREHOUSE_ID,
        CLUSTER_NUMBER,
        SUM(TOTAL_QUERY_RUN_TIME) AS EFFECTIVE_RUN_TIME,
        MAX_BY(WAREHOUSE_NAME, START_TIME) AS WAREHOUSE_NAME
    FROM QUERY_WAREHOUSE_CONTRIBUTION
    GROUP BY WAREHOUSE_ID, CLUSTER_NUMBER
),
-- Pair each suspend/resume event with the timestamp of the previous event
WAREHOUSE_EVENTS_ENRICHED AS (
    SELECT
        TIMESTAMP,
        LAG(TIMESTAMP, 1) OVER (
            PARTITION BY WAREHOUSE_ID, CLUSTER_NUMBER
            ORDER BY TIMESTAMP
        ) AS PREVIOUS_ACTIVITY_TIMESTAMP,
        WAREHOUSE_ID,
        CLUSTER_NUMBER,
        EVENT_NAME
    FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_EVENTS_HISTORY
    WHERE EVENT_NAME IN ('SUSPEND_CLUSTER', 'RESUME_CLUSTER')
        AND EVENT_STATE = 'COMPLETED'
        AND TIMESTAMP BETWEEN $startDate AND $endDate
),
-- Total run time per cluster, measured from resume to suspend
WAREHOUSE_EVENTS_FINAL AS (
    SELECT
        WAREHOUSE_ID,
        CLUSTER_NUMBER,
        SUM(TIMEDIFF('seconds', PREVIOUS_ACTIVITY_TIMESTAMP, TIMESTAMP)) AS WAREHOUSE_RUN_TIME
    FROM WAREHOUSE_EVENTS_ENRICHED
    WHERE EVENT_NAME = 'SUSPEND_CLUSTER'
    GROUP BY WAREHOUSE_ID, CLUSTER_NUMBER
),
-- Utilization = effective query run time / total warehouse-on time
UTILIZATION_EFFICIENCY AS (
    SELECT
        WAREHOUSE_ID,
        CLUSTER_NUMBER,
        WAREHOUSE_NAME,
        EFFECTIVE_RUN_TIME,
        WAREHOUSE_RUN_TIME,
        DIV0(EFFECTIVE_RUN_TIME, WAREHOUSE_RUN_TIME) * 100 AS UTILIZATION
    FROM QUERY_EVENTS_FINAL
    JOIN WAREHOUSE_EVENTS_FINAL USING (WAREHOUSE_ID, CLUSTER_NUMBER)
),
-- Fetch the latest effective rate for spend computation
EFFECTIVE_RATE_ENTRY AS (
    SELECT EFFECTIVE_RATE, CURRENCY
    FROM SNOWFLAKE.ORGANIZATION_USAGE.RATE_SHEET_DAILY
    WHERE USAGE_TYPE = 'compute'
    ORDER BY DATE DESC
    LIMIT 1
)
-- Total spend is computed from warehouse run time and credit consumption;
-- potential savings estimate the spend reduction at 100% utilization
SELECT
    WAREHOUSE_ID,
    CLUSTER_NUMBER,
    WAREHOUSE_NAME,
    WAREHOUSE_SIZE,
    EFFECTIVE_RUN_TIME,
    WAREHOUSE_RUN_TIME,
    UTILIZATION,
    EFFECTIVE_RATE * (WAREHOUSE_RUN_TIME * WAREHOUSE_CREDITS_MAP.CREDITS_PER_HOUR) / 3600 AS TOTAL_SPEND,
    TOTAL_SPEND - EFFECTIVE_RATE * (EFFECTIVE_RUN_TIME * WAREHOUSE_CREDITS_MAP.CREDITS_PER_HOUR) / 3600 AS POTENTIAL_SAVINGS,
    CURRENCY
FROM UTILIZATION_EFFICIENCY
JOIN WAREHOUSE_INFO
    ON UTILIZATION_EFFICIENCY.WAREHOUSE_NAME = WAREHOUSE_INFO."name"
JOIN WAREHOUSE_CREDITS_MAP
    ON WAREHOUSE_CREDITS_MAP.WAREHOUSE_SIZE = WAREHOUSE_INFO."size"
    AND WAREHOUSE_CREDITS_MAP.WAREHOUSE_TYPE = WAREHOUSE_INFO."type"
CROSS JOIN EFFECTIVE_RATE_ENTRY
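
To surface the biggest opportunities first, you can append an ordering clause to the final SELECT above, for example:

ORDER BY POTENTIAL_SAVINGS DESC

The warehouses at the top of the result are the ones where reducing idle time pays off the most.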

Strategies to Reduce Warehouse Idle Time

There are two main ways to cut down on idle-time costs.

  1. Setting a low value for AUTO_SUSPEND
  • Default Setting: The default AUTO_SUSPEND value in Snowflake is 10 minutes.
  • Recommended Change: Lowering it to about 1 minute aligns with Snowflake’s 60-second minimum billing period and cuts the cost of idle warehouses (see the sketch after this list).
  • Lower Bound: Setting AUTO_SUSPEND below 60 seconds is rarely beneficial. Snowflake’s background service that checks for warehouse idleness runs roughly every 30 seconds, so the lowest practical AUTO_SUSPEND value is around 30 seconds; anything shorter won’t be recognized reliably.
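
A minimal sketch of the change (MY_WH is a hypothetical warehouse name; substitute your own):

-- Suspend after 60 seconds of inactivity instead of the 10-minute default
ALTER WAREHOUSE MY_WH SET AUTO_SUSPEND = 60;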

2. Implementing a suspension service

Adjusting the AUTO_SUSPEND interval is a straightforward fix that reduces idle costs. But you would still pay for up to a minute of idle time every time the warehouse runs out of work. A more effective approach involves:

  • Creating a Custom Suspension Service: This service monitors warehouse activity and triggers a manual suspension whenever a warehouse is idle, while respecting the minimum 60-second billing period.
  • Suspension Logic: The service suspends the warehouse at the later of two moments: 60 seconds after startup, or the moment it becomes idle. A rough sketch of the check follows below.
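
As an illustration only, here is a minimal sketch of the idleness check such a service might run on a schedule. The filter logic and the MY_WH name are assumptions, but SHOW WAREHOUSES does expose per-warehouse state and running/queued query counts:

-- Find warehouses that are started but have no running or queued queries
SHOW WAREHOUSES;
SELECT "name"
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE "state" = 'STARTED' AND "running" = 0 AND "queued" = 0;

-- For each idle warehouse found (here, a hypothetical MY_WH), suspend it manually.
-- A real service would also track each warehouse's resume time so it never
-- suspends before the 60-second minimum billing period has elapsed.
ALTER WAREHOUSE MY_WH SUSPEND;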

This method ensures warehouses are used efficiently without suspending them prematurely, optimizing costs. However, it’s important to weigh the effort it takes to build and maintain a custom service.

The Suspension Trade-Off: Losing the Local Cache

Suspending a Snowflake warehouse saves money if no queries arrive during the suspension, but it can slow down query responses if unexpected queries arrive while the warehouse is suspended: a freshly resumed warehouse needs time to load data back into its local cache. The decision to suspend should therefore ideally be based on predicting idle periods, balancing cost savings against potential delays in query processing. This trade-off calculation is not as straightforward as one might expect, and sophisticated ML models are required to make these decisions in real time.

At Baselit, we are constantly finding ways to help companies lower their Snowflake costs. Sign up for our weekly newsletter, where we discuss our latest learnings on Snowflake optimization.


Sahil Singla

Co-founder at Baselit (YC W23) | Ex-Data Engineer @ Apple