Snowflake Supports Interactive Analytics at Scale

Facet Engineering · Jul 28, 2021

by Andrew Fisher

With a proliferation of massively parallel processing (MPP) database technologies, like Apache Pinot, Apache Druid, and ClickHouse, there is no shortage of blog posts on the Internet explaining how these technologies are the only ones capable of supporting interactive analytics on large data volumes. That is not the case.

At Facet, we performed benchmark tests on Snowflake’s platform with wide, denormalized datasets and concurrent query access patterns. As a result of these tests, we know that Snowflake offers reasonably fast query performance on large datasets when queried in an iterative, ad-hoc fashion.

Our test

We wanted to understand how Snowflake performs on datasets as the monthly event volume increases. Four datasets were used in total, with row counts ranging from 100 million up to 1.2 billion rows. The same test was exercised over each dataset three times, each time with a different Snowflake warehouse size, from medium to x-large. All tests had five concurrent users executing “slice and dice” type queries; by that we mean “aggregate over time” and “topN” queries where one can see a key business metric and then home in on the dimensions affecting that metric. We chose five concurrent users because the Facet product is built for data analytics teams that answer to the business.

Our findings

Our tests show that Snowflake provides response times suitable for exploring high volume, high dimensionality, time series data in an interactive fashion with five concurrent users running “slice and dice” type queries. The figure below shows the median and 99th percentile response times for each test we conducted.

  • As the dataset size increases, the query response times increase.
  • As more compute power is applied, the query response times improve.
  • The median response time for all tests we ran was three seconds or less (with the exception of the 1.2B row dataset on the medium warehouse).
  • In some cases, queries did return in under one second.
  • The 99th percentile is used extensively in performance testing to provide a measure of worst case performance. In over half the tests we ran, the worst case was approximately five seconds or better.
  • No queueing of individual queries or warehouse cluster scaling was observed during the tests, so there is room for additional concurrent usage.

Figure 1: Results of 12 benchmark tests conducted on one month of the Star Schema Benchmark dataset where the number of table rows was 100 million, 300 million, 600 million, and 1.2 billion and the Snowflake warehouse size varied with each test.

Configuring Snowflake for optimal query performance was actually quite easy. We wanted to avoid scanning entire tables in Snowflake for a better user experience at a lower cost, so we clustered the tables by date. Clustering is available out of the box with Snowflake; once a clustering key is defined on a table, Snowflake maintains the clustering automatically. For these tests, every query filtered on a timestamp, so clustering provided a clear benefit. There is still room for improvement, such as adding one to three of the most commonly used columns to the clustering key to further reduce the amount of data scanned. Snowflake also offers the recently released Search Optimization Service, which may be another way to achieve optimal performance. Besides clustering and the Search Optimization Service, rolling up the data (i.e., aggregating all metric columns grouped by all dimension columns) may reduce your dataset size and improve performance.
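
As a rough sketch, this is what the clustering and search optimization setup can look like in Snowflake SQL. The table name (ssb_flat), date column (lo_orderdate), and additional clustering column are illustrative assumptions, not our exact configuration:

    -- Cluster the flattened table by the date column every query filters on
    -- (ssb_flat and lo_orderdate are placeholder names).
    ALTER TABLE ssb_flat CLUSTER BY (lo_orderdate);

    -- Optionally include one or more frequently filtered dimensions in the key.
    -- ALTER TABLE ssb_flat CLUSTER BY (lo_orderdate, c_region);

    -- Enable the Search Optimization Service on the table (additional cost applies).
    ALTER TABLE ssb_flat ADD SEARCH OPTIMIZATION;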

In real-world applications, you might even observe better performance for frequently run queries due to Snowflake’s result cache in its service layer and the local disk cache in its compute layer. Our tests minimized the effect of caching; the small number of cached queries exercised in our tests were removed from the results. For more information, refer to Caching in Snowflake Data Warehouse.
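
If you want to keep the result cache from flattering your own benchmark numbers, Snowflake lets you disable it per session; a minimal sketch:

    -- Turn off Snowflake's query result cache for this session so repeated
    -- queries are executed by the warehouse instead of served from cache.
    ALTER SESSION SET USE_CACHED_RESULT = FALSE;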

Test methodology

Our benchmark tests used the Star Schema Benchmark dataset, which has been in use for over a decade to assess data warehouse performance. We created four datasets with different row counts, from 100 million rows to 1.2 billion rows. The datasets were generated with an appropriate scale factor to reach the target data volume over a one-month time period and loaded into Snowflake. In Snowflake, the lineorder, customer, supplier, and part tables were denormalized into one flattened table. Interactive analytics is typically done on denormalized, wide datasets with all facts and dimensions residing in one table.
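
The flattening step might look roughly like the following, assuming standard Star Schema Benchmark table and column names; the exact set of columns we kept is not shown here:

    -- Flatten the Star Schema Benchmark tables into one wide table.
    -- Column choices are illustrative; the actual flattened schema may differ.
    CREATE TABLE ssb_flat AS
    SELECT
        lo.lo_orderdate,
        lo.lo_quantity,
        lo.lo_extendedprice,
        lo.lo_discount,
        lo.lo_revenue,
        lo.lo_supplycost,
        c.c_city,
        c.c_nation,
        c.c_region,
        s.s_city,
        s.s_nation,
        s.s_region,
        p.p_brand1,
        p.p_category,
        p.p_mfgr
    FROM lineorder lo
    JOIN customer c ON lo.lo_custkey = c.c_custkey
    JOIN supplier s ON lo.lo_suppkey = s.s_suppkey
    JOIN part     p ON lo.lo_partkey = p.p_partkey;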

After loading the data, we simulated up to five users simultaneously asking “aggregate over time” and “topN” queries over a one-week timeframe for 12 minutes. To minimize the effects of Snowflake’s caching and to exercise the wide datasets, we randomly varied the columns selected by each query along with the time ranges filtered on. This mimics the common user behavior in the Facet application, where you look at a few metrics and a few dimensions, add some filters, watch the data update, add more filters, update the data, and repeat until you have your answer.

Here is an example of the “aggregate over time” query:
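
A minimal illustrative sketch of such a query, assuming the flattened table and columns from above (ssb_flat, lo_orderdate, lo_revenue) and an arbitrary one-week window:

    -- Illustrative "aggregate over time" query: one metric aggregated per day
    -- over a one-week window. Table, column, and date values are placeholders.
    SELECT
        DATE_TRUNC('day', lo_orderdate) AS day,
        SUM(lo_revenue)                 AS total_revenue
    FROM ssb_flat
    WHERE lo_orderdate >= '1997-06-01'
      AND lo_orderdate <  '1997-06-08'
    GROUP BY 1
    ORDER BY 1;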

This is an example “topN” query:
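
And a corresponding sketch of a “topN” query under the same assumptions:

    -- Illustrative "topN" query: the top 10 values of one dimension by a metric
    -- over the same one-week window. Names and dates are placeholders.
    SELECT
        c_city,
        SUM(lo_revenue) AS total_revenue
    FROM ssb_flat
    WHERE lo_orderdate >= '1997-06-01'
      AND lo_orderdate <  '1997-06-08'
    GROUP BY c_city
    ORDER BY total_revenue DESC
    LIMIT 10;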

We ran 12 tests in total, three for each dataset, each with a different Snowflake warehouse size (medium, large, and x-large). Snowflake also offers both smaller and larger warehouse sizes. We recommend starting with a smaller warehouse and increasing the warehouse size based on your data volume, expected usage, and budget.
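
Creating and resizing a warehouse is a one-line operation, which makes this kind of tuning cheap to experiment with. A sketch with an assumed warehouse name:

    -- Create a warehouse at a modest size, then resize it as data volume and
    -- concurrency grow. The warehouse name is illustrative.
    CREATE WAREHOUSE IF NOT EXISTS analytics_wh
      WITH WAREHOUSE_SIZE = 'MEDIUM'
           AUTO_SUSPEND   = 60
           AUTO_RESUME    = TRUE;

    ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'XLARGE';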

The query response times presented are direct from Snowflake and exclude a small number of cached queries for each run.

Additional findings

With the troves of data made available through digital transformation efforts, a dataset with 1.2 billion rows in a given month is not that large. We took Snowflake for a test drive with two other massive, denormalized datasets over a one-month timeframe. These two datasets are “rolled up,” meaning metrics are aggregated in hourly time slices when all dimension columns are exactly the same. The roll-up is important to note because a 50–100x reduction in the number of rows is not uncommon when going from log-level (raw) data to hourly aggregated data.
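
A roll-up of this kind is just a GROUP BY over the hour bucket and every dimension column; a sketch with placeholder table and column names:

    -- Illustrative hourly roll-up: sum the metric columns grouped by the hour
    -- and every dimension column. All names here are placeholders.
    CREATE TABLE events_hourly AS
    SELECT
        DATE_TRUNC('hour', event_ts) AS event_hour,
        dimension_1,
        dimension_2,
        dimension_3,
        SUM(metric_1) AS metric_1,
        SUM(metric_2) AS metric_2
    FROM events_raw
    GROUP BY 1, 2, 3, 4;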

Similar tests as described above were run but with up to 20 concurrent users to assess worst-case performance. The table below shows summary statistics about the two datasets and the median and 99th percentile response times we saw during our benchmarks:

Table 1: Results of 2 benchmark tests conducted on one month of two massive, denormalized datasets using a 4X-Large Snowflake warehouse.

What’s next?

Based on these results, we are very excited about Snowflake and how well it can work for this use case, especially given its extremely low barrier to entry and how little customization it required (we did very minimal tuning). We were quite surprised to see that the median response time was only ~2s for running aggregations on large datasets, showing that we can deliver a seamless user experience for exploratory analytics on Snowflake, even at scale.

We conducted similar benchmarks on Google BigQuery. We are also evaluating Apache Pinot, a newer technology in this space, and have seen promising early results. We look forward to sharing the results of the BigQuery and Pinot evaluations. Stay tuned.

In the meantime, check us out at www.facetdata.com to see a demo of our exploratory analytics tool.
