Slalom First Look \\\ Snowflake Materialized Views
Leverage the newest releases of Snowflake for better query performance
In the world of data warehousing, the number one issue that consistently comes up is query performance, and the performance of the views that are being queried. Many times, an analytical data model (or star schema) is made up of tables, and then views are created on top of those tables to surface specific data slices to the reporting front end. A view is just a canned query that will be executed against the base tables every time someone accesses it. However, if the query that defines the view becomes too complex, the performance of the view can suffer quite a bit. Enter: materialized views.
What is a Materialized View?
Wikipedia defines a materialized view as “a database object that contains the results of a query”. While that is not particularly helpful (a table can be loaded with the results of a query), it does provide us with a foundation. A view is simply a saved query that can be executed against base tables. A materialized view is the permanently cached version of that view, allowing for the results to be returned not from the base table, but from the cache.
Materialized View in Snowflake
In Snowflake, views have been available since the solution was launched. But, since it is an Massively Parallel Processing (MPP) solution, the base tables that are being used to create the views do not have any indexing, so joins can become very expensive very quickly. In Q1 of 2019, Snowflake introduced materialized views, which caches the view results for faster querying on the front end. Not only will it cache the results, Snowflake will automatically update the cache if the data in the underlying tables is changed, keeping the view updated as if it was a live view.
Why Do I Care?
There are a number of different problem statements that are solved with the introduction of materialized views. One key use case is if you have critical reports that rely on a view in Snowflake, the performance of that view is incredibly important, and many cycles of development are wasted constantly optimizing the query on the back end, before the decision is finally made to just turn it into an aggregated table and have jobs that constantly update the table. With a materialized view, you can cut your development time significantly, and provide consistent performance of the view.
The other key use case is with semi-structured data. Snowflake can ingest semi-structured data as is, and Snowflake will automatically read what’s in that message and allow for easy parsing of the data in that message. That allows you to easily create a structured shape off of the semi-structured data point. Many people will create views on top of semi-structured table, and then join that view on other tables to provide the analysis of the data. This can cause some performance issues since the view has to parse out the original VARIANT column that holds the semi-structured message, and can drag down the performance of the queries up-stream. If you start to see these performance issues cause your analysts some trouble, you should consider a materialized view. Rather than create a structured table on top of the semi-structured table and have to constantly update the table if it’s full of streaming data points, a materialized view will hold the cache and constantly update with the latest data from the stream.
Materialized Views are currently only available for enterprise level customers or above. The views do incur additional storage costs since the data is being cached in the storage layer, as well as some additional compute charges for the service that keeps the view current. While these costs are certainly important to consider, the performance benefits of materialized views can absolutely outweigh those costs, and allow your organization to scale your data warehouse in Snowflake more effectively, and at a much faster rate.