Teradata Partition Elimination vs. Snowflake Micro-Partitions

Roland Wenzlofsky
Feb 27, 2023


Minimizing the number of I/Os is an important goal of every database system.

Among other optimization techniques, this goal can be achieved in Teradata by defining a good primary index or by using secondary indexes. Secondary indexes always require additional storage space and cause maintenance overhead for DML statements. I deliberately leave join indexes aside here, since they are essentially a kind of materialized view.

The Teradata approach

Another way to minimize the number of I/Os in Teradata is row partitioning. No separate index structure is needed; instead, the physical rows are arranged in the file system so that full table scans can be avoided and only the necessary partitions have to be read.

However, when creating a row-partitioned table in Teradata, a specific partition key must be defined that matches the expected WHERE conditions (or join conditions) as closely as possible. Row partitioning is therefore an important decision when designing the physical data model.
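
A minimal sketch of what such a definition could look like (the sales table and all column names are hypothetical, chosen only for illustration):

```sql
-- Illustrative Teradata DDL (all names are hypothetical):
-- a sales table row-partitioned by day on sale_date.
CREATE TABLE sales
(
    sale_id   INTEGER NOT NULL,
    sale_date DATE    NOT NULL,
    store_id  INTEGER NOT NULL,
    amount    DECIMAL(12,2)
)
PRIMARY INDEX (sale_id)
PARTITION BY RANGE_N(
    sale_date BETWEEN DATE '2020-01-01' AND DATE '2024-12-31'
              EACH INTERVAL '1' DAY
);
```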

Snowflake takes a different approach

Each Snowflake table is divided into micro-partitions. For each micro-partition, Snowflake creates metadata containing essential information, including the minimum and maximum value per column. (Unlike Teradata, Snowflake is a column store. Even though Teradata offers a columnar storage format, it cannot really be called a columnar database, since its SQL engine has to reconstruct rows during plan execution for further processing of the data.)

In Snowflake, no partition key needs to be defined in advance to enable partition elimination. All the information needed is already available in the metadata of each micro-partition (the minimum and maximum value per column). Partition elimination can therefore be performed automatically on any column of any micro-partition.
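
As a minimal sketch (the sales table and its columns are again hypothetical), an ordinary filter is all it takes; the query profile then reports how many micro-partitions were actually scanned:

```sql
-- Illustrative Snowflake query; no partition key was declared anywhere,
-- yet Snowflake prunes micro-partitions whose min/max metadata
-- excludes this date.
SELECT store_id, SUM(amount)
FROM   sales
WHERE  sale_date = DATE '2023-02-01'
GROUP BY store_id;
-- The query profile shows "Partitions scanned" vs. "Partitions total",
-- which reveals how effective the pruning was.
```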

This is an advantage over Teradata row partitioning: in Teradata, the WHERE condition must "hit" the partition key for partition elimination to take place.

Take, for example, a table that is partitioned by a date column in Teradata. Partition elimination is possible only if a WHERE condition exists on this date column.
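
Assuming the illustrative sales table defined above, the difference could look like this:

```sql
-- Partition elimination possible: the filter hits the partition column.
SELECT *
FROM   sales
WHERE  sale_date = DATE '2023-02-01';

-- No partition elimination: the partition column is not restricted,
-- so every partition has to be read.
SELECT *
FROM   sales
WHERE  store_id = 42;
```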

In Snowflake, you don't have to do anything. Only those micro-partitions are scanned whose metadata indicates that the date from the WHERE condition can be present.

Assume we add another column filter to the WHERE condition. Snowflake can automatically use this information to restrict the scanned micro-partitions even further. Teradata cannot, since the partition key covers only the date column. To achieve the same result, the Teradata table would need to be re-partitioned with a multi-level partition key on the date column and the additional column used for filtering in the WHERE condition, as sketched below.
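
A hypothetical sketch of such a multi-level definition, reusing the illustrative sales table from above (choosing store_id as the second level is purely an assumption for demonstration):

```sql
-- Hypothetical multi-level re-partitioning: level 1 on the date,
-- level 2 on the additional filter column (store_id is assumed).
CREATE TABLE sales_mlpp
(
    sale_id   INTEGER NOT NULL,
    sale_date DATE    NOT NULL,
    store_id  INTEGER NOT NULL,
    amount    DECIMAL(12,2)
)
PRIMARY INDEX (sale_id)
PARTITION BY (
    RANGE_N(sale_date BETWEEN DATE '2020-01-01' AND DATE '2024-12-31'
                      EACH INTERVAL '1' DAY),
    RANGE_N(store_id  BETWEEN 1 AND 1000 EACH 10)
);
```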

However, there is a restriction in Snowflake:

In Teradata, the definition of the partition key guarantees that the rows for the date selected in the WHERE condition are stored perfectly sorted next to each other in the file system and can be copied efficiently from mass storage into main memory (via cylinder reads). In Snowflake, this depends on how the data was initially loaded. If, for example, a daily batch load appends one day after the other, Snowflake will also achieve good clustering and thus good partition elimination. If this is not the case, the efficiency of partition elimination in Snowflake decreases.

For large Snowflake tables, it makes sense to use the clustering feature, which arranges the data in the micro-partitions according to a chosen cluster key. This happens automatically in the background. Keep in mind, however, that this clustering incurs operational costs (credits). A large Snowflake table with a cluster key should therefore be modified by DML statements as little as possible.
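
A minimal sketch of defining a cluster key and checking its effect (table and column names are again illustrative):

```sql
-- Defining a cluster key on an existing table (names illustrative);
-- Snowflake's Automatic Clustering then reorders micro-partitions
-- in the background, consuming credits as it runs.
ALTER TABLE sales CLUSTER BY (sale_date);

-- Checking how well the table is clustered on that key.
SELECT SYSTEM$CLUSTERING_INFORMATION('sales', '(sale_date)');
```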

