Using Snowflake’s scale-to-zero capabilities for fun & profit

Customers love Snowflake’s differentiated capability to “scale to zero” with no compute cost when idle and quickly scale up & out to execute thousands of SQL/Python/Scala jobs across multiple clusters near-instantly. The same is true for being able to scale persistent storage and compute independently with industry-leading performance thanks to sophisticated built-in caching capabilities.

This post first provides a behind-the-scenes view of how Snowflake achieves instant-on, scale-to-zero, and caching. It then provides actionable guidance to optimize spend by effectively using these capabilities.

Virtual Warehouses — a highly differentiated compute abstraction

Snowflake decouples compute from persistent storage to ensure that each can be scaled independently of the other.

Snowflake exposes the abstraction of “virtual warehouses” to run customer jobs. These virtual warehouses are Massively Parallel Processing (MPP) compute clusters composed of multiple Virtual Machines (VMs) provisioned by Snowflake and running software managed by Snowflake. To realize a consistent customer experience across multiple cloud infrastructure providers, Virtual Machines are used as the atomic unit of compute, since they provide the lowest common denominator of functionality across providers.

VMs that comprise a warehouse cluster are interconnected to share data during job execution. You never interact directly with individual VMs, only with the virtual warehouse abstraction. The number and types of VMs that make up a virtual warehouse, the details of the software running on them, and related management are abstracted away from you; this allows us to evolve compute capabilities and their pricing without compromising the simplicity of the user experience.

Each virtual warehouse includes independent compute clusters that do not share resources with other virtual warehouses. VMs that comprise a Virtual Warehouse are dedicated solely to that Virtual Warehouse, which results in strong performance, resource, and security isolation. Each job runs on exactly one Virtual Warehouse.
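
As a concrete illustration of this abstraction (the warehouse name and size below are arbitrary examples, not recommendations), a virtual warehouse is created declaratively and a session simply points its jobs at it:

// Create a virtual warehouse; the underlying VMs are provisioned and managed by Snowflake
CREATE WAREHOUSE IF NOT EXISTS ANALYTICS_WH
WAREHOUSE_SIZE = 'MEDIUM';

// Direct subsequent jobs in this session to that warehouse
USE WAREHOUSE ANALYTICS_WH;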

Usage-based pricing in virtual warehouses

Virtual warehouses consume billable compute, measured in Snowflake credits at per-second granularity, as they execute jobs, load data, and perform other DML operations. Your spend is a function of the size of the virtual warehouse and how long it runs. Virtual warehouses are billed only while they are started; when suspended, they are not billed.

The following queries show the total credit consumption for each warehouse over a specific time period. This helps identify warehouses that consume more credits than others, as well as warehouses that are consuming more credits than anticipated.

// Credits used (all time = past year)
SELECT WAREHOUSE_NAME
,SUM(CREDITS_USED_COMPUTE) AS CREDITS_USED_COMPUTE_SUM
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
GROUP BY 1
ORDER BY 2 DESC
;

// Credits used (past N days/weeks/months)
SELECT WAREHOUSE_NAME
,SUM(CREDITS_USED_COMPUTE) AS CREDITS_USED_COMPUTE_SUM
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE START_TIME >= DATEADD(DAY, -N, CURRENT_TIMESTAMP()) // Past N days
GROUP BY 1
ORDER BY 2 DESC
;

Elasticity of virtual warehouses

Virtual Warehouses are stateless compute resources. They can be created, destroyed, or resized at any point, on demand, with no effect on the state of the data. By default, a suspended Virtual Warehouse is automatically provisioned when a job is issued against it. Similarly, by default, the resources associated with a Virtual Warehouse are automatically released when there are no pending jobs. This elasticity allows you to dynamically match compute resources to usage demands, independent of data volume.
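
As a minimal sketch of this elasticity (warehouse name and sizes are illustrative, continuing the example above), a warehouse can be resized, suspended, and resumed on demand without touching the data in persistent storage:

// Resize on demand; data in persistent storage is unaffected
ALTER WAREHOUSE ANALYTICS_WH SET WAREHOUSE_SIZE = 'LARGE';

// Release compute; billing stops while the warehouse is suspended
ALTER WAREHOUSE ANALYTICS_WH SUSPEND;

// Bring compute back when needed (this also happens automatically with auto-resume)
ALTER WAREHOUSE ANALYTICS_WH RESUME;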

When a job is submitted, each VM in the respective Virtual Warehouse cluster spawns a new process that lives only for the duration of its job. Any failures of these processes are quickly resolved by automatic retries.

Each user can have multiple Virtual Warehouses running at any given time, and each Virtual Warehouse may be running multiple concurrent jobs. Every Virtual Warehouse has access to the same shared tables, without the need to maintain its own copy of the data. Since virtual warehouses are private compute resources that operate on shared data, interference of different workloads from different organizational units is avoided. This makes it common for you to have several Virtual Warehouses for jobs from different organizational units, often running continuously.

The elasticity of virtual warehouses is one of the most significant benefits and differentiators of the Snowflake architecture.

Free pools of running VMs to realize an “instant-on” experience

Virtual Machines (VMs) provide the requisite multi-tenant security and resource isolation since each VM runs in an isolated environment with its own virtual hardware, page tables, and kernel. However, since VMs are slow to start (on the order of tens of seconds), cold-starting them is not conducive to realizing an instant-on experience. To work around this limitation, we use free pools of pre-started, running VMs to sidestep slow VM launch times and to ameliorate possible capacity issues in (usually smaller) cloud regions that lack liquidity of the VM types used by Snowflake.

Snowflake teams are bound by a demanding internal Service Level Objective (SLO) on lifecycle operations (such as start, stop, resume, suspend, scaling, etc.) to ensure an instant-on experience for customers. Aberrations from the SLO trigger on-call incident management workflows, post-incident analyses, and action items that are prioritized.

When user requests come in, running VMs in the free pool can be immediately assigned to virtual warehouses. However, to ensure efficient utilization of running VMs, demand must be estimated accurately so that free pools are sized optimally.

A predictive mechanism is used to size free pools, accounting for baseline demand as well as spikes that usually occur at the top of the hour. Predictions are based on the historical demand for the same time slot in previous weeks. They also account for prefetch duration to ensure timely VM provisioning, and filter out outliers caused by one-off events, to protect against over- or under-provisioning of free pool capacity.

Storage and caching in Snowflake

In addition to metadata (such as object catalogs, mappings from database tables to corresponding files in persistent storage, statistics, transaction logs, locks, etc.), two forms of state are used in Snowflake:

1/ Persistent storage of tables

Since tables are long-lived and may be read by many queries concurrently, persistent storage for tables requires strong durability and availability guarantees.

Remote, disaggregated, persistent storage such as Amazon S3 (or equivalent) is used because of its elasticity, high availability, and durability, despite relatively modest latency and throughput. Cloud object stores (such as Amazon S3 or equivalent) store immutable files: a file cannot be modified in place and can only be overwritten in full, but parts of a file can be read. Snowflake takes advantage of this partial-read support by partitioning tables horizontally into large, immutable files. Within each file, the values of each individual attribute or column are grouped together and compressed. Each file has a header that stores the offset of each column within the file, enabling the use of cloud object storage’s partial-read functionality to read only the columns needed for query execution.

2/ Intermediate data generated by query operators (e.g., joins) and consumed during query execution

Since intermediate data is short-lived, low-latency & high-throughput access to prevent execution slowdown is preferred over strong durability guarantees. The SSD and main memory on virtual warehouse VMs provide ephemeral storage with the requisite latency and throughput performance needed for job execution; this is cleared when a virtual warehouse is suspended or stopped. This ephemeral storage layer is a write-through “cache” for persistent data to keep cached persistent files consistent and reduce the network load caused by compute-storage disaggregation.

Each virtual warehouse cluster has its own independent distributed ephemeral storage, which is used only by queries it runs. All virtual warehouses in the same account have access to the same shared tables via the remote persistent store and do not need to copy data from one virtual warehouse to another.

Disaggregation of compute and persistent storage enables independent scaling of both

Spilling as a means to simplify memory management
When execution artifacts write intermediate data, main memory on the same VM is used first; if memory is full, data is “spilled” to local disk/SSD; if local disk is also full, data spills to remote persistent storage (object storage such as Amazon S3). This scheme removes the need to handle out-of-memory or out-of-disk errors.

Caching strategy

When the demands of intermediate data are low, the ephemeral storage layer is also used (at a lower priority) to cache frequently accessed persistent data files, using a “lazy” consistent hashing over file names and an LRU eviction policy. Each file is cached only on the virtual warehouse VM to which it consistently hashes within the cluster. This scheme enables locality without reshuffling cached data (which can interfere with ongoing jobs) when VMs are added to or removed from virtual warehouse clusters during resize or scaling operations; it relies on the copy of the data in remote persistent storage, which also ensures eventual consistency of data in the ephemeral storage layer.

Since each persistent storage file is cached on a specific VM (for a fixed virtual warehouse size), operations on a persistent storage file are scheduled to the VM to which that file consistently hashes. Because jobs are scheduled for cache locality, job parallelism is tightly coupled with the consistent hashing of files.

To avoid imbalanced partitions and overloading of a few virtual warehouse VMs due to hashing based on file names, work is load-balanced across VMs in the same cluster based on whether the expected completion time of an execution artifact is lower on a different VM. When execution artifacts are moved, files needed to execute the operation are read from persistent storage to avoid increasing load on already overloaded straggler VMs where the execution was initially scheduled. This also helps when some VMs may be executing much slower than others (due to a number of factors including cloud provider virtualization issues, network contention, etc.).

Snowflake’s scheduling logic attempts to find a balance between co-locating execution artifacts with cached persistent storage versus placing all execution artifacts on fewer virtual warehouse VMs. While the former minimizes network traffic for reading persistent data, it comes with the risk of increased network traffic for intermediate data exchange since jobs may be scheduled on all VMs of the virtual warehouse cluster. The latter obviates the need for network transfers for intermediate data exchange but can increase network traffic for persistent storage reads.

Real-world data has shown that the amount of data written to ephemeral storage is in the same order of magnitude as the amount of data written to remote persistent storage. Despite the size of ephemeral storage capacity being significantly lower (~0.1% on average) than the size of persistent storage, our data shows that Snowflake’s caching scheme enables average cache hit rates of ~80% for read-only queries and ~60% for read-write queries.

Managing spend efficiency through effective use of scale-to-zero and caching

Using scale-to-zero via auto-suspend and auto-resume
A warehouse can be set to automatically suspend or resume based on activity, so that it only consumes resources when actually in use. By default, Snowflake automatically suspends a warehouse if it is inactive for a specified period of time. Also by default, Snowflake automatically resumes a warehouse when a job arrives for it. Auto-suspend and auto-resume behaviors apply to the entire warehouse, not to the individual clusters in the warehouse.
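
These behaviors are controlled per warehouse via the AUTO_SUSPEND (seconds of inactivity) and AUTO_RESUME properties; for example (the warehouse name and value below are illustrative, not recommendations):

// Suspend after 5 minutes of inactivity and resume automatically on the next job
ALTER WAREHOUSE ANALYTICS_WH SET AUTO_SUSPEND = 300 AUTO_RESUME = TRUE;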

The following query identifies all warehouses that do not have auto-suspend enabled. Enabling auto-suspend ensures that warehouses suspend after a specific amount of inactive time in order to prevent runaway costs.

SHOW WAREHOUSES
;
SELECT "name" AS WAREHOUSE_NAME
,"size" AS WAREHOUSE_SIZE
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE IFNULL("auto_suspend",0) = 0
;

The following query identifies all warehouses that do not have auto-resume enabled. Enabling auto-resume automatically resumes a warehouse when queries are submitted against it.

SHOW WAREHOUSES
;
SELECT "name" AS WAREHOUSE_NAME
,"size" AS WAREHOUSE_SIZE
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE "auto_resume" = 'false'
;

Managing auto-suspend duration

We recommend setting the auto-suspend duration based on the ability of the workload to take advantage of caching at the ephemeral storage layer detailed earlier. This exercise involves finding the sweet spot for cost efficiency by balancing the compute cost savings of suspending quickly against the price-performance benefits of Snowflake’s sophisticated caching. In general, our high-level directional guidance is as follows:

  • For tasks, loading, and ETL/ELT use cases, immediate suspension of virtual warehouses is likely to be the optimal choice.
  • For BI and SELECT use cases, warehouses are likely to be cost-optimal with an auto-suspend of ~10 minutes, to keep data caches warm for end users.
  • For DevOps, DataOps, and Data Science use cases, warehouses are usually cost-optimal with an auto-suspend of ~5 minutes, as a warm cache is not as important for ad-hoc & highly unique queries.

We recommend experimenting with these settings for stable workloads to achieve optimal price-performance.
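
As a sketch of how this guidance translates into settings (warehouse names are placeholders and the values should be validated against your own workloads):

// Tasks / loading / ETL & ELT: suspend almost immediately (Snowflake bills a minimum of 60 seconds per resume)
ALTER WAREHOUSE ETL_WH SET AUTO_SUSPEND = 60;

// BI / SELECT: keep data caches warm for end users
ALTER WAREHOUSE BI_WH SET AUTO_SUSPEND = 600;

// DevOps / DataOps / Data Science: ad-hoc queries benefit less from a warm cache
ALTER WAREHOUSE DS_WH SET AUTO_SUSPEND = 300;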

The following query flags potential optimization opportunities by identifying virtual warehouses with an egregiously long auto-suspend duration (the period of inactivity after which the warehouse suspends).

SHOW WAREHOUSES
;
SELECT "name" AS WAREHOUSE_NAME
,"size" AS WAREHOUSE_SIZE
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE "auto_suspend" >= 3600 // 3600 seconds = 1 hour
;

Statement timeouts provide additional control over how long a query can run before it is canceled. This helps ensure that queries that hang for extended periods of time do not consume excessive credits or prevent the virtual warehouse from suspending. The parameter is set at the account level by default and can also optionally be set at both the warehouse and user levels.

SHOW PARAMETERS LIKE 'STATEMENT_TIMEOUT_IN_SECONDS' IN ACCOUNT;
SHOW PARAMETERS LIKE 'STATEMENT_TIMEOUT_IN_SECONDS' IN WAREHOUSE <warehouse-name>;
SHOW PARAMETERS LIKE 'STATEMENT_TIMEOUT_IN_SECONDS' IN USER <username>;
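
For example, the timeout can be tightened at the warehouse or account level (the warehouse name and one-hour value below are illustrative, not recommendations):

// Cancel any statement on this warehouse that runs longer than one hour
ALTER WAREHOUSE ANALYTICS_WH SET STATEMENT_TIMEOUT_IN_SECONDS = 3600;

// Or set an account-wide default
ALTER ACCOUNT SET STATEMENT_TIMEOUT_IN_SECONDS = 3600;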

The following query aggregates the percentage of data scanned from the ephemeral storage layer (cache) across all queries broken out by warehouse. Warehouses running querying/reporting workloads with a low percentage of data returned from caches indicate optimization opportunities (because warehouses may be suspending too quickly).

SELECT WAREHOUSE_NAME
,COUNT(*) AS QUERY_COUNT
,SUM(BYTES_SCANNED) AS BYTES_SCANNED
,SUM(BYTES_SCANNED*PERCENTAGE_SCANNED_FROM_CACHE) AS BYTES_SCANNED_FROM_CACHE
,SUM(BYTES_SCANNED*PERCENTAGE_SCANNED_FROM_CACHE) / SUM(BYTES_SCANNED) AS PERCENT_SCANNED_FROM_CACHE
FROM "SNOWFLAKE"."ACCOUNT_USAGE"."QUERY_HISTORY"
WHERE START_TIME >= dateadd(month,-1,current_timestamp())
AND BYTES_SCANNED > 0
GROUP BY 1
ORDER BY 5
;

Managing spilling

For performance-critical workloads, you can choose larger warehouse sizes to ensure that intermediate data fits in memory, or at least on local disk, and does not spill to object storage (e.g., S3 on AWS).

Additionally, Snowpark-optimized warehouses (which can help unlock ML training and memory-intensive analytics use cases) provide 16x more memory and 10x more local SSD cache per node compared to standard warehouses. The larger memory speeds up computations while larger local storage provides speedup when cached intermediate results and artifacts such as Python packages and JARs are reused on subsequent runs.
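
A Snowpark-optimized warehouse is created by specifying the warehouse type (the name and size below are illustrative):

// Create a Snowpark-optimized warehouse for memory-intensive workloads such as ML training
CREATE WAREHOUSE IF NOT EXISTS ML_TRAINING_WH
WAREHOUSE_SIZE = 'MEDIUM'
WAREHOUSE_TYPE = 'SNOWPARK-OPTIMIZED';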

The following query lists queries and warehouses that may benefit from a larger warehouse size or from changing the warehouse type to Snowpark-optimized.

SELECT QUERY_ID
,USER_NAME
,WAREHOUSE_NAME
,WAREHOUSE_SIZE
,BYTES_SCANNED
,BYTES_SPILLED_TO_REMOTE_STORAGE
,BYTES_SPILLED_TO_REMOTE_STORAGE / BYTES_SCANNED AS SPILLING_READ_RATIO
FROM "SNOWFLAKE"."ACCOUNT_USAGE"."QUERY_HISTORY"
WHERE BYTES_SPILLED_TO_REMOTE_STORAGE > BYTES_SCANNED * 5 // Each byte read was spilled 5x on average
ORDER BY SPILLING_READ_RATIO DESC
;

Metrics exposed under the QUERY_HISTORY view such as BYTES_SPILLED_TO_LOCAL_STORAGE and BYTES_SPILLED_TO_REMOTE_STORAGE indicate the extent of memory pressure, which in many cases, can be addressed in a cost-efficient manner by moving to Snowpark-optimized warehouses of the same size.
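
An existing warehouse can also be converted in place; note that a warehouse must be suspended before its type can be changed (the warehouse name below is illustrative):

// Convert a suspended warehouse to the Snowpark-optimized type, keeping its size
ALTER WAREHOUSE ANALYTICS_WH SUSPEND;
ALTER WAREHOUSE ANALYTICS_WH SET WAREHOUSE_TYPE = 'SNOWPARK-OPTIMIZED';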

The following query identifies the top 10 worst offending queries in terms of bytes spilled to local and remote storage.

SELECT query_id, SUBSTR(query_text, 1, 50) partial_query_text, user_name, warehouse_name,
bytes_spilled_to_local_storage, bytes_spilled_to_remote_storage
FROM snowflake.account_usage.query_history
WHERE (bytes_spilled_to_local_storage > 0
OR bytes_spilled_to_remote_storage > 0 )
AND start_time::date > dateadd('days', -45, current_date)
ORDER BY bytes_spilled_to_remote_storage DESC, bytes_spilled_to_local_storage DESC
LIMIT 10;

Conclusion and recap

Finding the sweet spot for auto-suspend settings and managing data spilling for a given workload has the potential to unlock meaningful cost optimizations. An understanding of the internals of Snowflake’s sophisticated and highly differentiated scale-to-zero and caching capabilities can aid with this advanced cost optimization exercise.
