Caching the Snowflake Way: A Performance Game-Changer

TheExpertLogix
7 min readOct 18, 2023

--

Introduction

Are you tired of waiting endlessly for your query results to load, especially when you’re accessing the same data over and over again?

Is your data warehouse performance taking a hit?

Curious about the buzz surrounding Snowflake’s caching magic but unsure about what it entails?

If so, you’re in the right place!

In this blog, we’ll explore the world of caching — storing frequently used data in a ready-to-access state — and show you how caching strategies can swiftly increase the speed of accessing data.

Join us on this journey to enhance your query performance and uncover the hidden potential of Snowflake’s caching mechanisms that makes data retrieval faster and more efficient!

Caching in Snowflake

When it comes to optimizing data query performance, caching plays a pivotal role within the Snowflake ecosystem.

Caching is all about storing frequently accessed data in a readily available state, reducing the need to repeatedly fetch information from the underlying storage.

This simple approach dramatically enhances query response times and overall system efficiency.

Snowflake’s unique caching technique makes data retrieval faster and more efficient by strategically placing caches across different layers of its architecture.

Let’s explore the various types of cache and their connections to different layers within Snowflake.

Snowflake Caching Architecture

Let’s break it down. Snowflake has three primary types of caches:

  • Metadata Cache (Cloud Layer): Holds critical information about your data’s structure and statistics.
  • Query Result Cache (Cloud Layer): Stores results from previous queries, ready for instant retrieval when you rerun the same or similar queries.
  • Local Disk Cache/Virtual Warehouse Cache (Compute Layer): Resides at the compute layer, holding temporary data blocks to speed up query processing.

Additionally, while not strictly a cache, the Remote Disk resides in the Storage Layer. It’s responsible for long-term centralized storage, ensuring data resilience even in the event of data center failures.

Exploring Metadata Cache

The Metadata Cache in Snowflake retains metadata related to all objects within a Snowflake account, preserving crucial insights about your data.

It includes the following essential details:

  • Table Statistics: Quick access to row counts, min-max values, number of distinct values, and number of NULL values in each column.

E.g., retrieve row count quickly.

  • Table Versions: Keeps track of different table versions, helping to manage changes and historical data.

E.g., track schema changes.

  • File References: Holds references to physical files where the actual data resides, allowing for efficient data retrieval.

E.g., determine which files to read for a query.

Certain queries can be entirely satisfied by the metadata itself. These queries don’t require the use of a virtual warehouse. In fact, you don’t even need to enable or resume a virtual warehouse for below types of queries:

  • All “SHOW” commands (e.g., SHOW TABLES)
  • “DESCRIBE TABLE <table_name>”
  • Counting, Min, Max, and More: These aggregate functions leverage metadata to quickly compute results from micro-partitions, enhancing query performance.

Metadata Cache Expiration

By default, the Metadata Cache is always active and doesn’t require manual configuration. There is no explicit option to enable or disable it.

It remains available as long as your Snowflake account is active.

Metadata Cache In Action

The screenshot below shows the ‘SHOW TABLES’ command using metadata to quickly display all tables in the snowflake account.

In another screenshot, a query using aggregate functions allowed for fast results due to metadata utilization.

Exploring Query Result Cache

The Query Result Cache in Snowflake stores the results of previously executed queries.

This allows for instant retrieval when the same or similar queries are run again, which can significantly improve query performance.

The Query Result Cache is located in the cloud layer and is shared across all users and accounts.

When a query is executed, the Snowflake engine first checks the Query Result Cache to see if the results are already stored.

If they are, the engine simply returns the cached results without having to re-execute the query.

Query Result Cache Reuse Conditions

  • The query should match the syntax of a previously executed query.
  • The underlying data must remain unchanged since the last execution of the query.
  • The query should not contain any functions that need to be evaluated during execution , such as CURRENT_TIMESTAMP(), CURRENT_DATE()), user-defined functions (UDFs) or external functions.
  • Ensure that the Query Result Cache has not expired
  • Result Cache should not be disabled.

Query Result Cache Expiration

By default, the results of a query are cached for a period of 24 hours.

After 24 hours, the cached results are purged from the cache. This is to ensure that the cache does not become too large and that the most recent results are always available.

How to Disable the Snowflake Result Cache

If, for any reason, you wish to disable the Query Result Cache for a specific session, you can do so with the following SQL command:

ALTER SESSION SET USE_CACHED_RESULT = FALSE;

Querying the Snowflake Result Cache

The results of the Snowflake Result Cache can be queried using the RESULT_SCAN() function.

This function takes the ID of the last query as a parameter and returns a table containing the results of the query.

SELECT * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));

Query Result Caching In Action

Imagine we want to view customers’ email addresses:

  1. Initial Query Execution: We executed the initial query to retrieve the necessary data.
  2. Query Profile View: We examined the query profile for performance analysis.
  3. Suspend the Warehouse: The warehouse was suspended.
  4. Rerunning the Query: We reran the query, and the warehouse remained suspended.
  5. Query Profile After Rerun: Upon review, the query efficiently retrieved the entire result from the Query Result Cache without resuming the warehouse.

This indicates that the Query Result Cache is a cloud-layer feature, independent of the warehouse being turned on.

Exploring Local Disk Cache/Virtual Warehouse Cache

Local Disk Cache, often referred to as the Virtual Warehouse Cache, stores temporary data blocks that can be accessed quickly by queries running on the same virtual warehouse.

This cache is unique to each virtual warehouse, boosting its own performance without affecting the others.

When your virtual warehouse grabs data from a table, it stores that data locally, this stored data can be reused in subsequent queries instead of fetching it again from the remote table in the cloud storage.

The Local Disk Cache, residing at the Compute Layer of Snowflake’s architecture, it’s distinct from the Query Result Cache, focusing on holding temporary data blocks rather than query results.

Local Disk Cache/Virtual Warehouse Cache Expiration

The Local Disk Cache is cleared when the virtual warehouse is suspended and the data stored in the cache is cleared.

This ensures that the cache remains up-to-date and does not contain stale data.

How to Disable the Local Disk Cache/Virtual Warehouse Cache

There is no way to disable the Local Disk Cache/Virtual Warehouse Cache directly.

However, you can disable it indirectly by suspending the virtual warehouse.

ALTER WAREHOUSE <warehouse_name> SUSPEND;

Local Disk Cache/Virtual Warehouse Cache In Action:

  1. Suspend the WH: We begin by suspending the warehouse for a fresh start.
  2. Disable Query Result Cache: To emphasize the Local Disk Cache’s role, we disable the Query Result Cache.
  3. Run the Query: We execute a query to analyze customer details within the birth year range of 1980 to 1990.
  4. Query Profile View: In the query profile, we observe that there’s no cache involvement, with 0% scanned from the cache. The majority of data retrieval relies on Remote Disk I/O (68.9%).
  5. Run the Modified Query: Next, we modify the query filter to analyze customer details within the birth year range of 1985 to 1995.
  6. Query Profile View After Running Modified Query: In the query profile for the modified query, we notice a significant shift. The Percentage Scanned from Cache increases to 93.76%, indicating that most of the data is retrieved from the cache. This leads to substantial performance improvement and also reduces the reliance on Remote Disk I/O (14.6%) while increasing the utilization of Local Disk I/O (1.4%).

Maximizing Snowflake Cache Performance: Best Practices

  • Understanding Data Usage Patterns: Analyze how your data is accessed to determine what should be cached for optimal performance.
  • Using Metadata Cache Smartly: Use the metadata cache smartly for rapid access to critical data information, optimizing queries and functions.
  • Caching Reusable Query Results: Enable query result caching for recurrent or similar queries to expedite response times.
  • Virtual Warehouse Cache: Utilize local disk cache (virtual warehouse cache) to accelerate query performance within a warehouse.
  • Wisely Suspend and Resume Warehouses: Optimize performance by making informed decisions when suspending and resuming warehouses.
  • Scaling Responsibly for Cache Efficiency: Adjust the size and resources of your virtual warehouse thoughtfully, considering cache efficiency. Scaling up during peak usage optimizes cache effectiveness and performance, while scaling down during off-peak hours reduces costs while managing cache usage.
  • Experiment and Evolve: Continuously experiment and adapt caching strategies as system usage and requirements evolve.

For more insights on scaling strategies, refer to our previous blog:

Snowflake’s Scaling Strategies: Scale Up vs. Scale Out

Conclusion

Snowflake’s Caching magic increases the query performance and efficiency.
By strategically storing frequently accessed data in caches like the Metadata Cache, Query Result Cache, and Local Disk Cache, Snowflake makes data retrieval faster and more efficient.

By capitalizing on the right combination of these techniques, you can get the most out of Snowflake caching capabilities and maximize your performance.

With the right caching strategy and performance tuning practices, you can gain better control of Snowflake’s performance.

--

--

TheExpertLogix

Thought Leaders and domain experts in Cloud and Data Engineering