An overview of compute primitives in Snowflake and best practices to right-size them

Snowflake’s Data Cloud, provided as Software-as-a-Service (SaaS), enables data storage, processing, analytic solutions, Machine Learning, and running apps & services in a performant, easy-to-use, and flexible manner. Snowflake’s pioneering architecture, which disaggregates compute from persistent storage, enables independent scaling of both. It also enables Snowflake to provide multiple compute primitives, each specialized for its use case. This post covers Snowflake’s compute primitives, which provide elastic, highly available, and fully managed mechanisms to run a variety of customer workloads on multiple cloud service providers. It also details best practices for selecting and sizing compute primitives for optimal price-performance based on the use case.

Compute primitives in Snowflake

Virtual Warehouses are the primary compute primitive in Snowflake: Massively Parallel Processing (MPP) compute clusters composed of multiple Virtual Machines (VMs) that are provisioned by Snowflake and run Snowflake-managed software. Virtual Warehouses provide elastic scaling, an instant-on experience, scale up & out, and suspend & resume automatically and nearly instantly. The VMs that comprise a warehouse cluster are interconnected to share data during job execution; compute clusters in a virtual warehouse do not share resources, which results in strong performance and resource isolation.

Disaggregation of compute and persistent storage enables independent scaling of both

Virtual Warehouses consume billable compute, measured in Snowflake credits at per-second granularity, as they execute jobs, load data, and perform other DML operations. A customer’s spend is a function of the size of the virtual warehouse and how long it runs. Virtual warehouses are billed only while they are started; when suspended, they are not billed.

In addition, Snowflake provides Snowpark Container Services (in private preview) to run containerized jobs and services within Snowflake’s security and governance boundaries. Resources for containerized jobs and services are provided by “compute pools”: collections of Snowflake-managed VMs that serve as container hosts with the software needed to run containers (OS, container runtime, orchestration agents, etc.). For each compute pool, customers can specify the type (standard, high-memory, GPU), size, and number (via min/max counts) of VMs, as well as specifications for individual services and the number of instances of each service. Snowflake handles auto-scaling of compute pool VMs (within the stipulated node counts), ensures that each service runs with the specified number of copies, and load-balances incoming traffic across services and their copies. Like virtual warehouses, compute pools are billed on a per-second basis for the duration they run, depending on the number, size, and type of VMs. Snowflake also provides an OCIv2-compliant image registry for storing container images in your Snowflake account. In the shared responsibility model for Snowpark Container Services, customers are responsible for keeping their containers patched and up-to-date, while Snowflake keeps the container hosts patched and up-to-date on an ongoing basis.

Cloud Services in Snowflake includes a collection of highly available services that tie together the different components of Snowflake and provide functionality for user authentication, query compilation, request caching, and miscellaneous control plane functions. Usage for cloud services is charged only if the daily consumption of cloud services exceeds 10% of the daily usage of virtual warehouses.

The Cloud Services layer includes a number of control plane functions

Furthermore, Snowflake provides serverless compute services that remove the need for capacity management. This includes serverless tasks and Snowpipe. Additional serverless services include Replication, Automatic Clustering, Materialized Views, Search Optimization Service, etc.

Best practices to select & size Snowflake compute

The best practices in this post index heavily toward virtual warehouses, which are the primary compute primitive and typically responsible for most of an account’s spend.

1/ Use virtual warehouse size and cluster counts to balance the demands of the latency & throughput needs of the workload

There are two warehouse types (standard and high-memory, branded as Snowpark-optimized) and multiple T-shirt sizes (XS, S, M, L, XL, … 6XL). The warehouse type determines the memory-to-CPU ratio of the virtual warehouse, while the size determines the total amount of CPU and memory available. You can modify a virtual warehouse’s size and type, even while the warehouse is executing queries.
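
As a minimal sketch (the warehouse name my_wh is hypothetical), size and type are adjusted with ALTER WAREHOUSE:

-- Scale up; applies to queries submitted after the resize, not those already running
ALTER WAREHOUSE my_wh SET WAREHOUSE_SIZE = 'LARGE';

-- Switch to the high-memory (Snowpark-optimized) type
ALTER WAREHOUSE my_wh SET WAREHOUSE_TYPE = 'SNOWPARK-OPTIMIZED';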

You’ll need to translate the criteria you actually reason about, such as the workload (a collection of queries), the desired completion time for that workload (a.k.a. the Service Level Objective), and the cost budget, into appropriate knob settings for warehouse size, cluster counts, scaling policy, and so on.

While increasing the warehouse size scales a warehouse “up” and improves single-query performance, using multiple clusters in the same warehouse helps with query concurrency. Each cluster is scheduled independently and can be spun up or down dynamically based on load.

With multi-cluster warehouses, Snowflake supports allocating, either statically or dynamically, additional clusters to make a larger pool of compute resources available to the same virtual warehouse to increase job concurrency without managing multiple distinct virtual warehouses. To use this capability, you can specify a minimum and maximum number of clusters for a given virtual warehouse.

When different values for maximum and minimum are specified, Snowflake automatically starts and stops additional clusters as needed to react to dynamic incoming load. As the number of concurrent user sessions and/or jobs increases, and jobs start to queue due to insufficient resources, additional clusters are automatically started up to the specified maximum. Similarly, if load decreases, Snowflake automatically shuts down clusters.

When the same value is specified for both the maximum and minimum number of clusters, Snowflake statically allocates those clusters to the virtual warehouse. This configuration is effective when the incoming load does not fluctuate significantly and cluster startup costs are not acceptable.
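
For example, here is a sketch of a multi-cluster warehouse that auto-scales between one and four clusters (the warehouse name and values are illustrative); setting MIN_CLUSTER_COUNT equal to MAX_CLUSTER_COUNT would instead allocate the clusters statically:

CREATE WAREHOUSE IF NOT EXISTS reporting_wh WITH
  WAREHOUSE_SIZE = 'MEDIUM'
  MIN_CLUSTER_COUNT = 1      -- start with a single cluster
  MAX_CLUSTER_COUNT = 4      -- scale out to at most four clusters under load
  SCALING_POLICY = 'STANDARD'
  AUTO_SUSPEND = 300
  AUTO_RESUME = TRUE;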

For example, in the case of interleaved workloads, when a warehouse is scaled up by one size, queries can potentially run in half the time: it is the same amount of work, but run on twice the compute capacity. Similarly, when a warehouse is scaled out by adding a cluster, each workload runs on its own cluster, which can likewise halve the time taken to complete the jobs.

Virtual warehouse controls such as size and cluster count provide levers to balance latency, throughput, and cost.

Although the amount of work is the same, using scale-up or scale-out, total run-time is significantly reduced. Since you pay per-VM on a per-second basis, the cost is the same.

The following query lists warehouses and days that could have benefited from multi-cluster warehouses (MCW).

-- LIST OF WAREHOUSES AND DAYS WHERE MCW COULD HAVE HELPED
SELECT TO_DATE(START_TIME) as DATE
,WAREHOUSE_NAME
,SUM(AVG_RUNNING) AS SUM_RUNNING
,SUM(AVG_QUEUED_LOAD) AS SUM_QUEUED
FROM "SNOWFLAKE"."ACCOUNT_USAGE"."WAREHOUSE_LOAD_HISTORY"
WHERE TO_DATE(START_TIME) >= DATEADD(month,-1,CURRENT_TIMESTAMP())
GROUP BY 1,2
HAVING SUM(AVG_QUEUED_LOAD) >0
;

2/ Use Warehouse load and Warehouse utilization to inform sizing of virtual warehouse capacity

Metrics exposed by Snowflake such as warehouse load metrics and warehouse utilization metrics (in private preview) can inform and guide optimizations to “right size” compute capacity.

Warehouse job load metrics measure the average number of jobs that were running or queued within a specific interval. The load is computed by dividing the execution time (in seconds) of all jobs in an interval by the total time (in seconds) of the interval. A job load chart shows current and historical usage patterns, including the total job load in intervals of 5 minutes to 1 hour (depending on the range selected) and the individual load for each job status (running, queued) within the interval.
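
For example, here is a sketch of an hourly view of running versus queued load per warehouse over the last week, built on the same ACCOUNT_USAGE view used elsewhere in this post:

SELECT DATE_TRUNC('hour', START_TIME) AS HOUR
,WAREHOUSE_NAME
,AVG(AVG_RUNNING) AS AVG_RUNNING_LOAD
,AVG(AVG_QUEUED_LOAD) AS AVG_QUEUED_LOAD
FROM "SNOWFLAKE"."ACCOUNT_USAGE"."WAREHOUSE_LOAD_HISTORY"
WHERE START_TIME >= DATEADD(day,-7,CURRENT_TIMESTAMP())
GROUP BY 1,2
ORDER BY 1,2
;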

Warehouse utilization metrics show the utilization of resources as a percentage at the per-cluster and per-warehouse levels.

Utilization data for each cluster of the virtual warehouse as a percentage (private preview) can help identify idle capacity and inform rightsizing decisions, when used with warehouse load metrics.

Warehouse load and utilization metrics can be used in conjunction to inform capacity allocation decisions. Some high-level postulates follow (a configuration sketch appears after the list):

If your workload has adequate throughput and/or latency performance AND (queued query load is low OR total query load <1 for prolonged periods) AND utilization is low (e.g., less than 50%):

  • Consider downsizing the warehouse or reducing the number of clusters. Additionally, consider starting a separate warehouse and moving queued jobs to that warehouse

If your workload is running slower than desired (based on throughput and/or latency measurements) AND running query load is low AND utilization is high (e.g., greater than 75%):

  • Consider upsizing the warehouse or adding clusters

If there are recurring usage spikes (based on warehouse utilization history):

  • Consider moving queries that represent the spikes to a new warehouse or adding clusters. Also, consider running the remaining workload on a smaller warehouse

If a workload has considerably higher than normal load:

  • Investigate which jobs are contributing to the higher load.

If the warehouse runs during recurring time periods, but the total job load is < 1 for substantial periods of time:

  • Consider decreasing the warehouse size and/or reduce the number of clusters.
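
As a configuration sketch for acting on these postulates (the warehouse name my_wh and the specific sizes and counts are illustrative, not prescriptive), the adjustments typically amount to ALTER WAREHOUSE statements such as:

-- If load and utilization are persistently low: downsize and/or trim clusters
ALTER WAREHOUSE my_wh SET WAREHOUSE_SIZE = 'SMALL';
ALTER WAREHOUSE my_wh SET MAX_CLUSTER_COUNT = 2;

-- If queries queue and utilization stays high: upsize and/or add clusters instead
-- ALTER WAREHOUSE my_wh SET WAREHOUSE_SIZE = 'LARGE';
-- ALTER WAREHOUSE my_wh SET MAX_CLUSTER_COUNT = 4;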

3/ Use bytes scanned to inform virtual warehouse sizing

The amount of data scanned is one of multiple signals that can help inform warehouse sizing based on the guidance below (for standard warehouses):

Rough guide showing directional guidance on sizing standard warehouses based on volume of data scanned

The following query shows the bytes scanned per warehouse over the last month, which can help inform virtual warehouse sizing:

SELECT WAREHOUSE_NAME
,SUM(BYTES_SCANNED) AS BYTES_SCANNED
FROM "SNOWFLAKE"."ACCOUNT_USAGE"."QUERY_HISTORY"
WHERE START_TIME >= dateadd(month,-1,current_timestamp())
GROUP BY 1
ORDER BY 2 DESC
;

4/ Use an appropriate auto-scaling policy for multi-cluster warehouses based on tradeoffs between cost and responsiveness

To help control the usage footprint of auto-scaling multi-cluster warehouses, scaling policy options (economy, standard) can be used to control the relative rate at which clusters automatically start or shut down.

The Standard policy is the default and minimizes queuing by favoring starting additional clusters over conserving credits. With the standard policy, the first cluster starts immediately when either a query is queued or the system detects that there’s one more query than the currently running clusters can execute. Each successive cluster waits to start 20 seconds after the prior one has started. For example, if your warehouse is configured with 10 max clusters, it can take ~200 seconds to start all 10 clusters. Clusters shut down after 2 to 3 consecutive successful checks (performed at 1-minute intervals), which determine whether the load on the least-loaded cluster could be redistributed to the other clusters without spinning up the cluster again.

The Economy policy conserves credits by favoring keeping running clusters fully loaded rather than starting additional clusters, which may result in queries being queued and taking longer to complete. With the economy policy, a new cluster is started only if the system estimates there’s enough query load to keep the cluster busy for at least 6 minutes. Clusters shut down after 5 to 6 consecutive successful checks (performed at 1 minute intervals), which determine whether the load on the least-loaded cluster could be redistributed to the other clusters without spinning up the cluster again.
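
Switching between policies is a one-line change; here is a sketch (the warehouse name is illustrative):

-- Favor conserving credits over immediately starting additional clusters
ALTER WAREHOUSE my_wh SET SCALING_POLICY = 'ECONOMY';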

5/ Use memory spilling to inform the use of Snowpark-optimized warehouses

As noted earlier, the warehouse type determines the memory-to-CPU ratio of the virtual warehouse, while the size determines the total amount of CPU and memory available. The Snowpark-optimized warehouse type (which can help unlock ML training and memory-intensive analytics use cases) provides 16x more memory and 10x more local SSD cache per VM compared to standard warehouses. The larger memory speeds up computations, while the larger local storage provides a speedup when cached intermediate results and artifacts such as Python packages and JARs are reused on subsequent runs.

When operators in Snowflake write intermediate data during execution, main memory on the virtual warehouse VMs is used first; if memory is exhausted, data is “spilled” to local disk/SSD on the virtual warehouse VMs; when local disk is also full, data spills to remote persistent storage (object storage such as Amazon S3). This scheme removes the need to handle out-of-memory or out-of-disk errors.

For performance-critical workloads, you can choose larger warehouse sizes to ensure that intermediate data fits in memory, or at least on local disk, and does not spill to remote object storage (e.g., S3 on AWS).

The following query lists queries and warehouses that could benefit from a larger warehouse size or from changing the warehouse type to Snowpark-optimized.

SELECT QUERY_ID
,USER_NAME
,WAREHOUSE_NAME
,WAREHOUSE_SIZE
,BYTES_SCANNED
,BYTES_SPILLED_TO_REMOTE_STORAGE
,BYTES_SPILLED_TO_REMOTE_STORAGE / BYTES_SCANNED AS SPILLING_READ_RATIO
FROM "SNOWFLAKE"."ACCOUNT_USAGE"."QUERY_HISTORY"
WHERE BYTES_SPILLED_TO_REMOTE_STORAGE > BYTES_SCANNED * 5 -- Each byte read was spilled 5x on average
AND BYTES_SCANNED > 0 -- guard against division by zero in SPILLING_READ_RATIO
ORDER BY SPILLING_READ_RATIO DESC
;

Metrics exposed under the QUERY_HISTORY view, such as BYTES_SPILLED_TO_LOCAL_STORAGE and BYTES_SPILLED_TO_REMOTE_STORAGE, indicate the extent of memory pressure, which in many cases can be addressed cost-efficiently by moving to a Snowpark-optimized warehouse of the same size.
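
Here is a sketch of creating a Snowpark-optimized warehouse of the same size (the warehouse name is illustrative):

CREATE WAREHOUSE IF NOT EXISTS ml_training_wh WITH
  WAREHOUSE_SIZE = 'MEDIUM'
  WAREHOUSE_TYPE = 'SNOWPARK-OPTIMIZED'
  AUTO_SUSPEND = 300
  AUTO_RESUME = TRUE;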

The following query identifies the top 10 worst offending queries in terms of bytes spilled to local and remote storage.

SELECT query_id, SUBSTR(query_text, 1, 50) partial_query_text, user_name, warehouse_name,
bytes_spilled_to_local_storage, bytes_spilled_to_remote_storage
FROM snowflake.account_usage.query_history
WHERE (bytes_spilled_to_local_storage > 0
OR bytes_spilled_to_remote_storage > 0 )
AND start_time::date > dateadd('days', -45, current_date)
ORDER BY bytes_spilled_to_remote_storage DESC, bytes_spilled_to_local_storage DESC
LIMIT 10;

6/ Use auto-suspend and auto-resume with appropriate timeouts to avoid paying for idle time

A warehouse can be set to automatically resume or suspend based on activity, so that it consumes resources only when actually in use. By default, Snowflake automatically suspends a warehouse if it is inactive for a specified period of time, and automatically resumes it when any job arrives at the warehouse. Auto-suspend and auto-resume behaviors apply to the entire warehouse and not to the individual clusters in the warehouse.

The following query identifies all warehouses that do not have auto-suspend enabled. Enabling auto-suspend ensures that warehouses suspend after a specific amount of inactive time in order to prevent runaway costs.

SHOW WAREHOUSES
;
SELECT "name" AS WAREHOUSE_NAME
,"size" AS WAREHOUSE_SIZE
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE IFNULL("auto_suspend",0) = 0
;

The following query identifies all warehouses that do not have auto-resume enabled. Enabling auto-resume automatically resumes a warehouse when queries are submitted against it.

SHOW WAREHOUSES
;
SELECT "name" AS WAREHOUSE_NAME
,"size" AS WAREHOUSE_SIZE
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE "auto_resume" = 'false'
;

We recommend setting the auto-suspend duration based on the ability of the workload to take advantage of warehouse caches. This involves finding the sweet spot for cost efficiency by balancing the compute cost savings of suspending quickly against the price-performance benefits of Snowflake’s sophisticated caching. In general, our high-level directional guidance is as follows (a configuration sketch follows the list):

· For tasks, loading, and ETL/ELT use cases, immediate suspension of virtual warehouses is likely to be the optimal choice.

· For BI and SELECT query use cases, warehouses are likely to be cost-optimal with an auto-suspend of ~10 minutes to keep data caches warm for end users.

· For DevOps, DataOps, and Data Science use cases, warehouses are usually cost-optimal at ~5 minutes for suspension as a warm cache is not as important for ad-hoc & highly unique queries.
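
As a configuration sketch applying this guidance (warehouse names and exact values are illustrative):

-- Loading/ETL warehouse: suspend almost immediately after work finishes
ALTER WAREHOUSE etl_wh SET AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;

-- BI warehouse: keep data caches warm for ~10 minutes
ALTER WAREHOUSE bi_wh SET AUTO_SUSPEND = 600 AUTO_RESUME = TRUE;

-- Data science / ad-hoc warehouse: ~5 minutes
ALTER WAREHOUSE ds_wh SET AUTO_SUSPEND = 300 AUTO_RESUME = TRUE;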

The following query flags potential optimization opportunities by identifying virtual warehouses with an egregiously long duration for automatic suspension after a period of no activity on that warehouse.

SHOW WAREHOUSES
;
SELECT "name" AS WAREHOUSE_NAME
,"size" AS WAREHOUSE_SIZE
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE "auto_suspend" >= 3600 // 3600 seconds = 1 hour
;

The STATEMENT_TIMEOUT_IN_SECONDS parameter provides additional control over how long a query can run before it is canceled. This helps ensure that queries that hang for extended periods of time do not consume excessive credits and do not prevent the virtual warehouse from suspending. The parameter is set at the account level by default and can also optionally be set at both the warehouse and user levels; the statements below show the current values at each level.

SHOW PARAMETERS LIKE 'STATEMENT_TIMEOUT_IN_SECONDS' IN ACCOUNT;
SHOW PARAMETERS LIKE 'STATEMENT_TIMEOUT_IN_SECONDS' IN WAREHOUSE <warehouse-name>;
SHOW PARAMETERS LIKE 'STATEMENT_TIMEOUT_IN_SECONDS' IN USER <username>;
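
To tighten the timeout at the warehouse level, here is a sketch (the warehouse name and the 2-hour value are illustrative):

ALTER WAREHOUSE my_wh SET STATEMENT_TIMEOUT_IN_SECONDS = 7200; -- cancel statements running longer than 2 hours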

The following query aggregates the percentage of data scanned from the ephemeral storage layer (cache) across all queries broken out by warehouse. Warehouses running querying/reporting workloads with a low percentage of data returned from caches indicate optimization opportunities (because warehouses may be suspending too quickly).

SELECT WAREHOUSE_NAME
,COUNT(*) AS QUERY_COUNT
,SUM(BYTES_SCANNED) AS BYTES_SCANNED
,SUM(BYTES_SCANNED*PERCENTAGE_SCANNED_FROM_CACHE) AS BYTES_SCANNED_FROM_CACHE
,SUM(BYTES_SCANNED*PERCENTAGE_SCANNED_FROM_CACHE) / SUM(BYTES_SCANNED) AS PERCENT_SCANNED_FROM_CACHE
FROM "SNOWFLAKE"."ACCOUNT_USAGE"."QUERY_HISTORY"
WHERE START_TIME >= dateadd(month,-1,current_timestamp())
AND BYTES_SCANNED > 0
GROUP BY 1
ORDER BY 5
;

7/ Use Resource Monitors to control costs and avoid unexpected usage

Resource Monitors provide alerting and hard limits to prevent overspend via credit quotas for individual virtual warehouses during a specific time interval or date range. When a warehouse reaches a limit, a notification can be sent and/or the warehouse can be suspended. Resource monitors can also be set up on a schedule to track and control credit usage by virtual warehouses.

To help control costs and avoid unexpected usage, we recommend using resource monitors, which consist of limits for a specified frequency interval and a set of warehouses to monitor. When limits are reached and/or are approaching, the resource monitor can trigger various actions, such as sending alert notifications and/or suspending warehouses.
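
Here is a sketch of a monthly resource monitor attached to a warehouse (the quota, thresholds, and names are illustrative; creating resource monitors requires appropriate privileges, e.g., ACCOUNTADMIN):

CREATE OR REPLACE RESOURCE MONITOR monthly_wh_budget WITH
  CREDIT_QUOTA = 100
  FREQUENCY = MONTHLY
  START_TIMESTAMP = IMMEDIATELY
  TRIGGERS
    ON 75 PERCENT DO NOTIFY
    ON 100 PERCENT DO SUSPEND
    ON 110 PERCENT DO SUSPEND_IMMEDIATE;

ALTER WAREHOUSE my_wh SET RESOURCE_MONITOR = monthly_wh_budget;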

The following query identifies all warehouses without resource monitors which have a greater risk of unintentionally consuming more credits than typically expected.

SHOW WAREHOUSES
;
SELECT "name" AS WAREHOUSE_NAME
,"size" AS WAREHOUSE_SIZE
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE "resource_monitor" = 'null'
;

8/ Use Query Acceleration Service for on-demand bursts from short-running queries

Query Acceleration Service (QAS) enables another form of vertical scaling to accelerate query performance. It automatically adds VMs, beyond the inter-networked VMs in the warehouse cluster, on an on-demand basis to offload and accelerate portions of SQL queries. QAS is particularly useful for large table scans.

Additionally, when Snowflake detects a massive query that will scan gigabytes of data, query acceleration can free up resources on the virtual warehouse cluster to execute other short-running queries from other users. This is usually less expensive than scaling up to a larger warehouse and leads to more efficient use of resources. Effectively, the Query Acceleration Service acts like a powerful additional cluster that is temporarily deployed alongside an existing virtual warehouse when needed.

QAS compute resources are billed per second. The SYSTEM$ESTIMATE_QUERY_ACCELERATION function and the QUERY_ACCELERATION_ELIGIBLE view help identify queries that might benefit from QAS.
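
For a specific query, the estimate function can be inspected directly; here is a sketch (the query ID is a placeholder):

-- Returns a JSON estimate of how much of the query is eligible for acceleration
SELECT PARSE_JSON(SYSTEM$ESTIMATE_QUERY_ACCELERATION('<query_id>'));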

The following query identifies the queries that might benefit the most from the service by the amount of query execution time that is eligible for acceleration:

SELECT query_id, eligible_query_acceleration_time
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_ACCELERATION_ELIGIBLE
ORDER BY eligible_query_acceleration_time DESC;

The following query identifies the queries that might benefit the most from the service in a specific warehouse mywh:

SELECT query_id, eligible_query_acceleration_time
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_ACCELERATION_ELIGIBLE
WHERE warehouse_name = 'mywh'
ORDER BY eligible_query_acceleration_time DESC;

The following query identifies the warehouses with the most queries eligible in a given period of time for the query acceleration service:

SELECT warehouse_name, COUNT(query_id) AS num_eligible_queries
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_ACCELERATION_ELIGIBLE
WHERE start_time > 'Mon, 29 May 2023 00:00:00'::timestamp
AND end_time < 'Tue, 30 May 2023 00:00:00'::timestamp
GROUP BY warehouse_name
ORDER BY num_eligible_queries DESC;

The following query identifies the warehouses with the most eligible time for the query acceleration service:

SELECT warehouse_name, SUM(eligible_query_acceleration_time) AS total_eligible_time
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_ACCELERATION_ELIGIBLE
GROUP BY warehouse_name
ORDER BY total_eligible_time DESC;

Query Acceleration scale factor sets an upper bound on the amount of compute resources a warehouse can lease for acceleration. The scale factor is a multiplier based on warehouse size and cost. For example, if the scale factor is 5 for a Medium-sized warehouse (4 credits/hour), the warehouse can lease compute up to 5 times its size (i.e., 4 x 5 = 20 credits/hour). By default, the scale factor is set to 8 when Query Acceleration Service is used.
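
Enabling QAS and capping the scale factor are per-warehouse settings; here is a sketch (the warehouse name and scale factor are illustrative):

ALTER WAREHOUSE my_wh SET
  ENABLE_QUERY_ACCELERATION = TRUE
  QUERY_ACCELERATION_MAX_SCALE_FACTOR = 5; -- lease at most 5x the warehouse's compute for acceleration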

The following query identifies the upper limit for scale factor for a given warehouse with QAS enabled:

SELECT MAX(upper_limit_scale_factor)
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_ACCELERATION_ELIGIBLE
WHERE warehouse_name = 'mywh';

The following query identifies the distribution of scale factors for a given warehouse with QAS enabled:

SELECT upper_limit_scale_factor, COUNT(upper_limit_scale_factor)
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_ACCELERATION_ELIGIBLE
WHERE warehouse_name = '<warehouse_name>'
GROUP BY 1 ORDER BY 1;

9/ Use Serverless tasks to sidestep capacity management when running stable workloads that require schedule adherence

A task in Snowflake can execute any one of the following types of SQL code: 1/ Single SQL statement, 2/ Call to a stored procedure, 3/ Procedural logic using Snowflake Scripting. Tasks can be combined with table streams for continuous ELT workflows to process recently changed table rows. Tasks can also be used independently to generate periodic reports by inserting or merging rows into a report table or performing other periodic work.

Unlike user-managed tasks, where you manage compute capacity for individual tasks by specifying an existing virtual warehouse (sized appropriately for the SQL actions the task executes), with serverless tasks Snowflake manages compute capacity as required for each workload, based on statistics from previous runs of the same task. Also, multiple workloads in your account share these resources. Serverless tasks are billed as a function of the actual resources used.

Serverless tasks are recommended when you cannot fully utilize a warehouse because too few tasks run concurrently or they run to completion quickly (< 1 minute). Since the size of resources chosen is based on the history of previous runs, tasks with relatively stable runs are good candidates for serverless tasks. Additionally, serverless tasks are recommended when schedule adherence is important.

User-managed tasks are recommended when you can fully utilize a single warehouse by scheduling multiple concurrent tasks to take advantage of available compute resources. User-managed tasks are recommended for spiky or unpredictable loads on compute resources; multi-cluster warehouses with auto-suspend and auto-resume enabled could help moderate your credit consumption.
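
Here is a sketch of a serverless task (the task name, schedule, and SQL body are illustrative; omitting the WAREHOUSE parameter is what makes the task serverless, and USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE only seeds the size Snowflake uses for the first runs):

CREATE TASK refresh_daily_report
  SCHEDULE = 'USING CRON 0 2 * * * UTC'
  USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = 'XSMALL'
AS
  INSERT INTO daily_report
  SELECT CURRENT_DATE, COUNT(*) FROM raw_events WHERE event_date = CURRENT_DATE - 1;

ALTER TASK refresh_daily_report RESUME; -- tasks are created suspended and must be resumed to run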

10/ Adjust the size of the Snowpark sandbox based on the use case

Unlike SQL, which has a limited language surface, code written in Java/Python/Scala for User Defined Functions (UDFs) and stored procedures presents a larger security attack surface. For performance reasons, this code runs on the same virtual warehouse VMs that execute the rest of the job. So, in addition to using VMs that are not reused across accounts for multi-tenant compute isolation, Snowflake runs the code in a secure sandbox built from Linux kernel primitives such as cgroups, namespaces, seccomp, eBPF, and chroot. The sandbox prevents code from accessing information outside its environment (scoped to the current job) or affecting the operation of other parts of Snowflake; other virtual warehouse resources, such as its network and namespaces, are also isolated from the contents of the sandbox. Each Java/Python/Scala job gets a new sandbox that includes “just enough” read-only software to run the code, while a chroot directory tree provides a few required writable directories and a temporary one for scratch space. The sandbox runs in a cgroup that limits the executing code’s memory, CPU, and PID usage. With Snowpark-optimized warehouses, spawning threads is supported to enable multi-processor use cases.

Stored procedures on Snowflake run on only a single node of a virtual warehouse and are not parallelized across multiple nodes of the cluster. We recommend using this insight to inform warehouse sizing: when using stored procedures, prefer single-node warehouse configurations (especially for use cases such as single-node ML training). This can be achieved by setting WAREHOUSE_SIZE = MEDIUM and MAX_CONCURRENCY_LEVEL = 1, which ensures that the Snowpark-optimized warehouse consists of a single Snowpark-optimized node, with the Snowpark sandbox configured (via cgroup changes behind the scenes) to have the maximum memory (~240 GiB) and CPU possible.
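
Here is a sketch of such a single-node configuration (the warehouse name is illustrative):

CREATE OR REPLACE WAREHOUSE single_node_ml_wh WITH
  WAREHOUSE_SIZE = 'MEDIUM'
  WAREHOUSE_TYPE = 'SNOWPARK-OPTIMIZED'
  MAX_CONCURRENCY_LEVEL = 1  -- one job at a time, so the stored procedure gets the node's full memory and CPU
  AUTO_SUSPEND = 300
  AUTO_RESUME = TRUE;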

We recommend using multi-cluster Snowpark-optimized warehouses to support multiple concurrently running stored procedures. Additionally, we recommend using a separate warehouse for executing nested queries issued from a stored procedure; the session.use_warehouse() API can be used to select the warehouse for a query inside the stored procedure.

Unlike stored procedures, with UDFs, Snowflake attempts to use the full power of your warehouse by parallelizing computation. As a result, for UDFs, we recommend using warehouses with multiple nodes (such as a Snowpark-optimized warehouse of size L or larger).

To the extent possible, we also recommend avoiding LIMIT clauses and heavily skewed GROUP BY, PARTITION BY, or JOIN operations in your query, since these hinder Snowflake’s ability to parallelize UDFs. Instead, we recommend using the batch API when using Snowpark Python libraries such as xgboost or PyTorch to operate efficiently on batches of rows.

11/ Use Snowpark Container Services to run containers on compute infrastructure of your choice

If you need to run OCI containers, control the VM SKU (CPU/RAM/GPU settings), use GPUs for acceleration, use custom libraries, pre-compiled binaries, or languages not supported by Snowpark, or run long-running services, we recommend using Snowpark Container Services (currently in private preview).

Workloads that store and retrieve vector data efficiently, such as Retrieval-Augmented Generation (RAG) for LLMs, recommendation systems, and computer vision, are well suited to GPUs, because GPUs can dramatically accelerate the mathematical operations that power them compared to CPUs. Additionally, the rise of LLMs has placed a focus on vector indices, which are computationally intensive to build and involve tradeoffs between latency and recall. In such applications, a large number of vector distance computations are performed to minimize the distance of vectors to their assigned clusters; this can become a bottleneck at scale as the number of vectors or vector dimensions increases. The most common indexing techniques, such as IVF (InVerted File index) and PQ (Product Quantization), which divide vectors into clusters and use the KMeans algorithm to find cluster centroids, can benefit from GPUs since KMeans scales quadratically.

In Snowpark Container Services, Snowflake automatically selects VMs based on the stipulated resource requirements (such as memory and GPU). When multiple service instances run on a compute pool, select VM capacities based on the needs of the services you plan to run in order to achieve specific placement behaviors. For example, if each VM in a compute pool provides 16 GiB of memory and you run two instances of a service that each require 9 GiB, each instance is scheduled on a separate compute pool VM, since both instances do not fit concurrently on a single VM; with 32 GiB VMs, however, both service instances are likely to run on a single VM. Snowflake also launches the specified minimum number of compute pool VMs and automatically adds VMs, up to the allowed maximum, when the running VMs do not have the capacity to support additional workloads.
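
Here is a sketch of creating a compute pool (the pool name is illustrative, and instance family identifiers such as GPU_NV_S are assumptions that may differ in the preview):

CREATE COMPUTE POOL my_gpu_pool
  MIN_NODES = 1
  MAX_NODES = 3
  INSTANCE_FAMILY = GPU_NV_S; -- autoscales between 1 and 3 GPU VMs based on load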

When a compute pool is suspended, all services are suspended but jobs run to completion; after this, compute pool VMs are released. A suspended compute pool can be resumed either automatically or manually. Unlike virtual warehouses, compute pool VMs do not have a local disk that realizes an ephemeral storage layer for caching intermediate results and persistent files. As a result, decisions about when to suspend do not have to account for the benefits of caching the way virtual warehouse decisions do; they only need to account for the startup lag of a suspended pool.

Conclusion

Snowflake’s pioneering architecture that disaggregates compute from persistent storage makes it easy to support multiple compute primitives that are specialized for each use case. We recommend experimenting with the best practices applicable to your workloads to ensure that compute capacity is right-sized and well-utilized, thereby helping you achieve superior price-performance.
