Deep dive on caching in Snowflake

How much relevant this topic is with respect to SnowPro Core exam?

This topic is very important, and you may expect 1–2 question from this topic in SnowPro exam.

In this blog we are going to see how Snowflake help boost performance by providing cache at a different layer of its architecture.

In Snowflake, we don't see much option for performance tuning like in any known RDBMS. Snowflake do provide few of very good performance feature through its unique Multi Cluster Shared data architecture. One of the feature is caching which is nothing new in database world as we had seen this in lot of renowned RDBMS like Oracle etc. Difference lies in the way how it was implemented in three different layers and how we can utilize same. Cache can leverage to save money and speed result. Snowflake is smart enough to validate if required data is available in cache, and it only digs deeper starting from Result Cache> Warehouse Cache and ultimately from Remote Disk if it doesn’t found any. Cache get invalidated the moment underline data has changed.

Snowflake caches data at 3 layers:

(Source: Snowflake)

1. Metadata Cache layer:

As the name suggest this layer is powered by the metadata layer of Snowflake. Snowflake stored the metadata of all objects in this layer like

  • Row Count
  • Table Size in Bytes
  • File references and table versions

It also stores metadata related to micro partition and clustering. Some of this metadata information can be leverage to the user without spinning any Virtual warehouse.

2. Results Caching Layer:

  • This layer holds the query result for the past 24 hours.
  • The Lowest latency for cached query result.
  • Query results are available across Virtual Warehouse. The Query result of one user can be reused by the other user provided both query are identical and underline data is not changed.
  • User can be different, but role must be same to take benefit of result cache.
  • An inbuilt function like current_time() can cause the result to be invalidated.

3. Virtual Warehouse SSD Cache/ Local Disk Caching layer:

  • Whenever we query in snowflake, data get retrieved from the Remote Disk storage, and cached in SSD and memory of the Virtual Warehouse.
  • Data is in raw form in this layer.
  • Data in this layer remains active until the warehouse is active.
  • Better latency but cache get cleared along with warehouse suspension.
  • If data already exists in warehouse layer it reframe to go to remote layer and fulfill user query from warehouse cache.

How to disable the cache in Snowflake if one don’t want to use the same?

Since this cache are maintained at independent layer, so there is no single source of control.

To disable Result cache you can use below:

alter session set use_cached_result = false;

To disable Warehouse or Local Disk cache either you can shut down virtual warehouse from Web UI or you can use below:

alter warehouse COMPUTE_WH suspend;

You can’t disable the metadata cache as this is part of Snowflake metadata layer.

Demo on Snowflake Caching :

Hope this blog help you to get insight on Snowflake Caching. Feel free to ask a question in the comment section if you have any doubts regarding this. Give a clap if you like the blog. Stay connected to see many more such cool stuff. Thanks for your support.

You Can Find me:

Follow me on Medium: https://rajivgupta780184.medium.com/

Follow me on Twitter: https://twitter.com/RAJIVGUPTA780

Connect with me in LinkedIn: https://www.linkedin.com/in/rajiv-gupta-618b0228/

Subscribe to my YouTube Channel: https://www.youtube.com/channel/UC8Fwkdf2d6-hnNvcrzovktg

#Keep learning #Keep Sharing #Everyday Learning.

References:-

--

--