Optimizing Snowflake Cloud Services costs

This blog post provides an overview of Snowflake Cloud Services, Cloud Services costs, and ways to optimize them.

Background on Cloud Services

Cloud Services is the layer of Snowflake’s architecture that provides the “control plane” enabling elasticity, availability, and fault tolerance across multiple cloud service providers. It provides capabilities such as user authentication, query compilation, result caching, safe code rollout and rollback, throttling of incoming requests, placement, load balancing across availability zones, and cross-cloud and cross-region replication. Additional details on Snowflake’s Cloud Services layer can be found in Snowflake’s documentation.

[Figure: Snowflake architecture, highlighting the Cloud Services layer that is the focus of this post]

Cloud Services Costs

Usage of Cloud Services is charged only if the daily consumption of cloud services exceeds 10% of the daily usage of virtual warehouses. The charge is calculated daily (in the UTC time zone). This ensures that the 10% adjustment is accurately applied each day, at the credit price for that day.

If cloud services consumption on a given day is less than 10% of warehouse compute credits, the adjustment for that day equals the cloud services actually used by the account. And because the adjustment is applied daily, the total Cloud Services adjustment on the monthly bill can be less than 10% of cumulative monthly warehouse usage.
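For illustration, consider a day with 120 warehouse credits and 15 cloud services credits: the adjustment is min(15, 10% × 120) = 12 credits, so only 3 cloud services credits are billed. On a day with 120 warehouse credits and 8 cloud services credits, the adjustment is the full 8 credits and no cloud services charge applies.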

Additionally, it is important to note that serverless features such as the query acceleration service, automatic clustering, materialized views, and the search optimization service do not factor into the 10% adjustment for cloud services.

Patterns to analyze & optimize

If cloud services usage is higher than expected, check for the following patterns and consider acting on the associated recommendations.

1/ Blocked queries due to transaction locks

UPDATE and MERGE commands acquire a transaction lock on a table, which prevents other commands from executing against that table until the lock is released. Queries consume cloud services credits while they wait for a lock to be released: the longer a query sits in the cloud services layer waiting for the lock, the more credits it consumes.

Transaction locks often occur when users run concurrent UPDATE/MERGE commands against a single table, especially when each command updates only a few rows. Lock contention can be minimized by batching: for example, load new data into a temporary table with a bulk INSERT (avoiding single-row inserts), then update/merge the destination table from that temporary table in a single statement, as sketched below. If the source sends new data continuously throughout the day, consider using a task to update the destination table on a periodic basis.
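A minimal sketch of this batching pattern, assuming hypothetical table and column names (new_data_source, staged_changes, destination, id, value):

    -- Bulk-load the incoming batch into a temporary table in one statement,
    -- rather than one INSERT per row.
    CREATE TEMPORARY TABLE staged_changes AS
        SELECT id, value
        FROM new_data_source;

    -- Apply the whole batch with a single MERGE, so the transaction lock
    -- on the destination is taken once instead of once per row.
    MERGE INTO destination d
    USING staged_changes s
        ON d.id = s.id
    WHEN MATCHED THEN UPDATE SET d.value = s.value
    WHEN NOT MATCHED THEN INSERT (id, value) VALUES (s.id, s.value);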

2/ Copy commands with poor selectivity

Executing COPY commands involves listing files in object storage (Amazon S3 or the equivalent on other clouds). Listing files uses only cloud services compute. As a result, COPY commands with poor selectivity (for example, listing an entire bucket to pick out a handful of new files) can result in high cloud services usage.

To ameliorate this, we recommend narrowing the file listing with filters such as date-based path prefixes on object storage buckets, as shown below.
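A minimal sketch, assuming a hypothetical stage (@my_stage) laid out with date-based prefixes and a hypothetical events table:

    -- Restricting the COPY to a date prefix limits how many files the
    -- cloud services layer must list before loading.
    COPY INTO events
        FROM @my_stage/events/2024/06/15/
        FILE_FORMAT = (TYPE = PARQUET);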

3/ High-frequency DDL operations and cloning

Data Definition Language (DDL) operations, particularly cloning, are entirely metadata operations and therefore use only cloud services compute. Frequently creating or dropping large schemas or tables, or cloning databases for backup, can result in significant cloud services usage.

To ameliorate this challenge, we recommend reviewing DDL operations, especially cloning patterns, to ensure they are as granular as possible and are not executed at excessive frequencies. For example, consider cloning only the individual tables you need rather than an entire schema, as sketched below.
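A sketch of the more granular approach; the database, schema, and table names are hypothetical:

    -- Clone only the table that actually needs a backup...
    CREATE TABLE analytics.public.orders_backup CLONE analytics.public.orders;

    -- ...rather than the entire schema, which multiplies the metadata
    -- that the cloud services layer must track:
    -- CREATE SCHEMA analytics.public_backup CLONE analytics.public;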

4/ High-frequency queries

Seemingly inexpensive queries (e.g., SELECT 1, SELECT sequence1.NEXTVAL, SELECT CURRENT_SESSION()), queries against INFORMATION_SCHEMA views, SHOW commands, and similar statements can result in significant cloud services usage when executed at high frequency (e.g., hundreds of times per day).

SHOW commands as well as queries against INFORMATION_SCHEMA are entirely metadata operations and consume only cloud services resources.

We recommend using the parameterized query hash to identify such recurrent queries, then reviewing their frequency to determine whether it is appropriate for the use case (see the sketch below). If you observe a high frequency of SELECT CURRENT_SESSION() queries from partner tools using the JDBC driver, confirm that the partner has updated their code to use the getSessionId() method in the SnowflakeConnection interface, which caches the session ID and can help reduce cloud services usage.
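A sketch of how to surface the most frequent query shapes over the last week, using the SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY view (which itself runs on a warehouse):

    -- Group queries by their parameterized hash to find recurring shapes
    -- and the cloud services credits each shape consumes.
    SELECT
        query_parameterized_hash,
        ANY_VALUE(query_text)            AS sample_query,
        COUNT(*)                         AS executions,
        SUM(credits_used_cloud_services) AS cloud_services_credits
    FROM snowflake.account_usage.query_history
    WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
    GROUP BY query_parameterized_hash
    ORDER BY cloud_services_credits DESC
    LIMIT 20;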

Additionally, querying views in the ACCOUNT_USAGE schema (which run on a virtual warehouse rather than cloud services) instead of the INFORMATION_SCHEMA shifts that compute to the warehouse layer, outside of Cloud Services costs.
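For example, a metadata lookup could be moved from INFORMATION_SCHEMA to its ACCOUNT_USAGE counterpart (the database name is hypothetical):

    -- Metadata-only, consumes cloud services compute:
    -- SELECT table_name, row_count FROM my_db.information_schema.tables;

    -- Equivalent lookup against ACCOUNT_USAGE, executed on a warehouse.
    -- Note that ACCOUNT_USAGE views lag behind real time and retain
    -- dropped objects, hence the DELETED filter.
    SELECT table_name, row_count
    FROM snowflake.account_usage.tables
    WHERE table_catalog = 'MY_DB'
      AND deleted IS NULL;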

5/ Single row inserts and fragmented schemas outside of hybrid tables

Single-row inserts outside of hybrid tables are sub-optimal from both cost and performance perspectives: each statement incurs its own compilation and metadata overhead, so at high volume they can consume significant cloud services resources. To ameliorate this, we recommend batch/bulk loading rather than single-row inserts, or using Snowflake hybrid tables when the workload is genuinely row-by-row, as illustrated below.
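A minimal sketch of the batching alternative; the table and column names are hypothetical:

    -- One multi-row INSERT compiles once, instead of once per row:
    INSERT INTO events (id, payload) VALUES
        (1, 'alpha'),
        (2, 'beta'),
        (3, 'gamma');

    -- For larger volumes, prefer a bulk load from a stage:
    -- COPY INTO events FROM @my_stage/events/;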

Data applications that follow the pattern of defining one schema per customer may trigger many separate data loads in a given time period, which can result in high cloud services consumption. This pattern also creates far more metadata for Snowflake to maintain, and metadata operations consume cloud services resources. To address this challenge, we recommend using a shared schema, clustering tables on customer_id, and using secure views to isolate each customer’s data cost-efficiently; a sketch follows.
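A sketch of the shared-schema pattern; all names are hypothetical:

    -- One shared table for all customers, clustered on customer_id:
    CREATE TABLE shared.orders (
        customer_id INTEGER,
        order_id    INTEGER,
        amount      NUMBER(12, 2)
    ) CLUSTER BY (customer_id);

    -- A secure view exposes a single customer's rows:
    CREATE SECURE VIEW shared.orders_customer_42 AS
        SELECT order_id, amount
        FROM shared.orders
        WHERE customer_id = 42;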

6/ Inefficient SQL queries

Queries can consume significant cloud services compute if they include many joins or Cartesian products, use the IN operator with large lists, or are simply very large. These types of queries also typically have high compilation times.

We recommend reviewing such queries using the query profile, either from the UI or programmatically, to identify patterns such as large scans and Cartesian joins, and optimizing them accordingly; a sketch of the programmatic route follows.
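A sketch using the GET_QUERY_OPERATOR_STATS table function to inspect a query’s operators programmatically; the query ID is a placeholder:

    -- Returns one row per operator in the query plan, with statistics
    -- that expose patterns such as large scans or Cartesian joins.
    SELECT operator_id, operator_type, operator_statistics
    FROM TABLE(GET_QUERY_OPERATOR_STATS('<query_id>'));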
