Snowflake SnowPro Core Preparation Part 12 — Query Acceleration, Search Optimization & Caching

Ganapathy Subramanian.N
7 min read · Jun 4, 2024


In this blog, we are going to discuss Snowflake’s Query Acceleration Service, Search Optimization Service, and Caching.

Query Acceleration Service:

● The query acceleration service (QAS) can accelerate parts of the query workload in a warehouse.

● When it is enabled for a warehouse, it can improve overall warehouse performance by reducing the impact of outlier queries, which are queries that use more resources than the typical query.

● The query acceleration service does this by offloading portions of the query processing work to shared compute resources that are provided by the service.

● Types of workloads that might benefit from QAS, by performing more work in parallel and reducing the time spent scanning and filtering:

> Ad hoc analytics.

> Workloads with unpredictable data volume per query.

> Queries with large scans and selective filters.

● To identify the queries or warehouses that might benefit from the query acceleration service, you can query the QUERY_ACCELERATION_ELIGIBLE view. You can also use the SYSTEM$ESTIMATE_QUERY_ACCELERATION function to assess whether a specific query is eligible for acceleration.

Eligibility Criteria:

> Large scans with an aggregation or selective filter.

> Large scans that insert many new rows.

● Snowflake doesn’t have a specific cutoff for what constitutes a “large enough” scan to be eligible.

● Snowflake only marks a query as eligible if there is high confidence that the query would be accelerated if QAS were enabled.

Reasons for ineligibility:

> Not enough partitions in the scan.

> Even if a query has a filter, the filters may not be selective enough.

> The query includes a LIMIT clause but does not have an ORDER BY clause.

> The query includes functions that return nondeterministic results (for example, SEQ or RANDOM).

SYSTEM$ESTIMATE_QUERY_ACCELERATION:

● The SYSTEM$ESTIMATE_QUERY_ACCELERATION function can help determine if a previously executed query might benefit from the query acceleration service.
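
A minimal sketch of calling it (the query ID below is a placeholder; substitute one from your own query history):

    SELECT SYSTEM$ESTIMATE_QUERY_ACCELERATION('01b1234c-0000-8a2b-0000-00000001a2b3');

The function returns a JSON summary indicating whether the query is eligible and, for eligible queries, estimated execution times at various scale factors.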

QUERY_ACCELERATION_ELIGIBLE View:

● Query the QUERY_ACCELERATION_ELIGIBLE View to identify the queries and warehouses that might benefit the most from the query acceleration service. For each query, the view includes the amount of query execution time that is eligible for the query acceleration service.

> Queries that might benefit the most from the service.

> Warehouses with the most eligible time.

> Queries that might benefit the most from the service in a specific warehouse.
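
For example, a sketch of finding the top candidates account-wide (this assumes access to the SNOWFLAKE.ACCOUNT_USAGE schema):

    -- Top 10 queries by execution time eligible for acceleration (last 7 days)
    SELECT query_id,
           warehouse_name,
           eligible_query_acceleration_time
    FROM snowflake.account_usage.query_acceleration_eligible
    WHERE start_time > DATEADD('day', -7, CURRENT_TIMESTAMP())
    ORDER BY eligible_query_acceleration_time DESC
    LIMIT 10;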

Supported SQL Commands By QAS:

> SELECT

> INSERT

> CREATE TABLE AS SELECT (CTAS)

Enabling QAS:
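
QAS is enabled per warehouse. A minimal sketch (my_wh is a placeholder warehouse name; the scale factor caps how much extra compute the service can lease, and 8 is the default):

    -- Enable QAS and cap the leased compute at 8x the warehouse size
    ALTER WAREHOUSE my_wh SET
      ENABLE_QUERY_ACCELERATION = TRUE
      QUERY_ACCELERATION_MAX_SCALE_FACTOR = 8;

    -- Verify the settings
    SHOW WAREHOUSES LIKE 'my_wh';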

QAS in Query Profile:
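
When QAS accelerates part of a query, the Query Profile reflects it; as I understand it, the scan statistics include figures such as the number of partitions scanned by the service, which shows how much of the work was offloaded.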

Search Optimization Service:

● The search optimization service can significantly improve the performance of certain types of lookup and analytical queries.

● Search optimization service creates and maintains a persistent data structure called a search access path.

● The search access path keeps track of which values of the table’s columns might be found in each of its micro-partitions, allowing some micro-partitions to be skipped when scanning the table.

● The search optimization service aims to significantly improve the performance of certain types of queries on tables, including the following (illustrative examples appear after this list):

> Selective point lookup queries on tables. A point lookup query returns only one or a small number of distinct rows.

> Substring and regular expression searches (e.g., [ NOT ] LIKE, [ NOT ] ILIKE, [ NOT ] RLIKE, etc.).

> Queries on fields in VARIANT, OBJECT, and ARRAY (semi-structured) columns that use the following types of predicates:

Equality predicates.

IN predicates.

Predicates that use ARRAY_CONTAINS.

Predicates that use ARRAYS_OVERLAP.

Substring and regular expression predicates.

Predicates that check for NULL values.

> Queries that use selected geospatial functions with GEOGRAPHY values.

● Once you identify the queries that can benefit from the search optimization service, you can enable search optimization for the columns and tables used in those queries.

● Search access path maintenance is transparent.

● There is a storage and compute cost for building and maintaining search access paths.
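
To make these query shapes concrete, a few illustrative examples (the orders table and its columns are hypothetical):

    -- Selective point lookup (equality predicate on a high-cardinality column)
    SELECT * FROM orders WHERE order_id = 1234567;

    -- Substring search
    SELECT * FROM orders WHERE customer_name LIKE '%subramanian%';

    -- Predicate on a field in a VARIANT column
    SELECT * FROM orders WHERE payload:status::string = 'SHIPPED';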

Enabling Search Optimization Service:
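
Search optimization is enabled per table, either for the whole table or for specific methods on specific columns. A minimal sketch, reusing the hypothetical orders table:

    -- Enable for the entire table
    ALTER TABLE orders ADD SEARCH OPTIMIZATION;

    -- Or enable only specific methods on specific columns (more cost-effective)
    ALTER TABLE orders ADD SEARCH OPTIMIZATION
      ON EQUALITY(order_id), SUBSTRING(customer_name);

    -- Inspect what is enabled
    DESCRIBE SEARCH OPTIMIZATION ON orders;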

Performance Optimization Comparison:
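
At a high level, as I understand the documentation: the search optimization service is best suited to highly selective point lookups that return a handful of rows, while the query acceleration service is best suited to large scans with selective filters where work can be offloaded and parallelized. The two address different workloads and can be used together.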

All the details about QAS & Search Optimization are available in this YouTube video:

Caching:

● Caching is the process of storing the results of a request in a temporary location, separate from the original, so that the same operations do not have to be redone.

● A cache is temporary storage for files, and retrieving data from the cache is faster than recomputing or re-fetching it.

● When a query is submitted, Snowflake checks whether that query has been previously run and, if so, whether the results are still cached.

● Snowflake will use the cached result set if it is still available rather than executing the query you just submitted.

Query Results Cache:

● The fastest way to retrieve data from Snowflake is by using the query result cache.

● The results of a Snowflake query are cached, or persisted, for 24 hours and then purged.

● Even though the result cache only persists for 24 hours, the clock is reset each time the query is re-executed, up to a maximum of 31 days from the date and time when the query was first executed.

● After 31 days, or sooner if the underlying data changes, a new result is generated and cached when the query is submitted again.

● The result cache is fully managed by Snowflake’s cloud services layer.

● Query results returned to one user are also available to any user who has the necessary access privileges and who executes the same query.

● Any user can run a query against the result cache with no running virtual warehouse needed, assuming the query is cached, and the underlying data has not changed.

● The result cache is the only cache that can be disabled by a parameter: USE_CACHED_RESULT.

● Disabling the result cache is necessary before performing A/B performance testing, and it is important to re-enable query result caching once the testing is complete, as sketched below.
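
A sketch of that workflow using the USE_CACHED_RESULT session parameter:

    -- Disable the result cache so timings reflect real execution
    ALTER SESSION SET USE_CACHED_RESULT = FALSE;

    -- ... run the queries being compared ...

    -- Re-enable the cache once testing is complete
    ALTER SESSION SET USE_CACHED_RESULT = TRUE;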

Metadata Cache:

● Snowflake collects and manages metadata about tables, micro-partitions, and even clustering.

● Snowflake’s metadata cache stores the following information:

> For tables, Snowflake stores row count, table size in bytes, file references, and table versions.

> The range of column values (MIN and MAX).

> NULL count.

> Number of distinct values.

> Total number of micro-partitions.

> Depth of overlapping micro-partitions.

● The metadata cache is fully managed by Snowflake’s cloud services layer.

● The information stored in the metadata cache is used to build the query execution plan.

● Several kinds of queries can be answered from the metadata cache alone, as shown below.
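
A few illustrative examples (the orders table is hypothetical, and whether MIN/MAX can be served purely from metadata depends on the data type):

    SELECT COUNT(*) FROM orders;                          -- row count from table metadata
    SELECT MIN(order_date), MAX(order_date) FROM orders;  -- value ranges from partition metadata
    SHOW TABLES;                                          -- object metadata
    DESCRIBE TABLE orders;                                -- column metadata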

Warehouse Cache:

● The traditional Snowflake data cache is specific to the virtual warehouse used to process the query.

● Running virtual warehouses use SSD storage to store the micro-partitions that are pulled from the centralized database storage layer when a query is processed.

● The size of the virtual warehouse’s SSD cache is determined by the size of the virtual warehouse’s compute resources.

● Whenever a virtual warehouse receives a query to execute, that warehouse will scan the SSD cache first before accessing the Snowflake remote disk storage.

● Reading from SSD is faster than from the database storage layer but still requires the use of a running virtual warehouse.

● This cache is referred to as the raw data cache, the SSD cache, or the data cache.

● This cache is dropped once the virtual warehouse is suspended.

Finding Data Scanned From Cache:
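
One way to check is to look at PERCENTAGE_SCANNED_FROM_CACHE in the ACCOUNT_USAGE query history; a sketch:

    -- Share of scanned data served from the warehouse cache, per query
    SELECT query_id,
           warehouse_name,
           percentage_scanned_from_cache
    FROM snowflake.account_usage.query_history
    WHERE start_time > DATEADD('hour', -24, CURRENT_TIMESTAMP())
    ORDER BY percentage_scanned_from_cache DESC
    LIMIT 20;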

All the details about Snowflake’s Caching are available in this YouTube video:

Photo Courtesy:

https://docs.snowflake.com

Book: Snowflake: The Definitive Guide

Reference Links:

Query Acceleration:

https://docs.snowflake.com/en/user-guide/query-acceleration-service

Search Optimization:

https://docs.snowflake.com/en/user-guide/search-optimization-service

Blog — Caching in snowflake:

https://medium.com/@ExpertLogix/caching-the-snowflake-way-a-performance-game-changer-eade0cddab93

Catch you in the next blog — Snowflake SnowPro Core Preparation Part 13 — Data Loading & Copy Into, Snowpipe, Data Unloading & Copy Into — https://medium.com/@trustngs/snowflake-snowpro-core-preparation-part-13-data-loading-copy-into-snowpipe-data-unloading-15535a976f3c
