From Compression to Query Pruning: Exploring Snowflake Micro-Partitions

Saisuman Singamsetty
4 min read · Jul 6, 2024


A micro-partition is a file stored in the Snowflake storage service.

Snowflake Micro-partitions

Micro-partitions use a proprietary, closed-source file format created by Snowflake.

Each micro-partition contains a header of metadata describing the data it stores; the actual data is grouped by column and stored in a compressed format.

Micro-Partition Anatomy

Groups of rows in a table are mapped to individual micro-partitions and organized in columnar fashion: within each micro-partition, data is stored and compressed column by column rather than row by row.

Data organization in micro-partitions

This lets Snowflake scan the values of a column without touching the rest of the data in each row. The size and structure of micro-partitions make query processing both efficient and easy to optimize. Micro-partitioning is performed automatically on all Snowflake tables.

Columnar storage in Snowflake
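A small sketch can show why grouping values by column pays off. The layout below is purely illustrative (it is not Snowflake's actual file format): because a column often holds many similar or repeated values, storing it contiguously produces long repetitive runs that a general-purpose compressor handles far better than interleaved row data.

```python
import zlib

# Illustrative rows (id, country, status): the non-id columns have very low
# cardinality, so a column-wise layout yields highly repetitive byte runs.
rows = [(i, "US" if i % 2 else "DE", "active") for i in range(10_000)]

# Row-oriented layout: values interleaved record by record.
row_bytes = "|".join(f"{i},{c},{s}" for i, c, s in rows).encode()

# Column-oriented layout (as in a micro-partition): each column contiguous.
cols = list(zip(*rows))
col_bytes = "|".join(",".join(map(str, col)) for col in cols).encode()

print(len(zlib.compress(row_bytes)), len(zlib.compress(col_bytes)))
```

On data like this, the columnar layout compresses to a fraction of the row layout's size, which is one reason micro-partitions store data by column.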

A single micro-partition can contain up to 16 megabytes of compressed data, which typically corresponds to between 50 and 500 megabytes uncompressed. There is no limit to the number of micro-partitions a table can have.

Compressed and Uncompressed data storage in Snowflake

Small tables with less than 500 megabytes of uncompressed data may consist of a single micro-partition, while for large tables Snowflake can create thousands or even millions of micro-partitions. Because micro-partitions are immutable, newly arriving data never modifies existing micro-partitions; instead, Snowflake adds new micro-partitions as data arrives.
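The append-only behavior described above can be sketched in a few lines. The class and method names here are hypothetical, not Snowflake APIs; the point is only that an insert creates new immutable partitions rather than rewriting old ones.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)  # frozen mirrors the immutability of a written partition
class MicroPartition:
    rows: tuple

@dataclass
class Table:
    partitions: list = field(default_factory=list)

    def insert(self, rows, max_rows_per_partition=3):
        # Chunk incoming rows into brand-new partitions; never touch old ones.
        for i in range(0, len(rows), max_rows_per_partition):
            chunk = tuple(rows[i:i + max_rows_per_partition])
            self.partitions.append(MicroPartition(chunk))

t = Table()
t.insert([1, 2, 3, 4, 5])        # creates two partitions
before = list(t.partitions)
t.insert([6, 7])                 # appends a third; the first two are untouched
assert t.partitions[:2] == before
```

This append-only design is also what makes features like time travel cheap: old partitions remain valid snapshots until they age out.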

Unlike tables in traditional data warehouses or databases, which typically have a relatively small number of partitions, a table in Snowflake can easily have millions of partitions.

Traditional databases vs Snowflake

The micro-partition header stores the byte ranges of each column within the micro-partition, allowing Snowflake to retrieve only the columns relevant to a query using a byte-range get. This is why queries run faster when you reduce the number of columns selected.

Micro-partition header
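A toy file format can make the byte-range idea concrete. The layout below is an assumption for illustration (a length-prefixed JSON header followed by column blobs), not Snowflake's real format: the header records each column's offset and length, so a reader can fetch just the bytes of the columns a query selects.

```python
import json
import struct

# Hypothetical layout: [4-byte header length][JSON header][column data blobs].
def write_partition(columns: dict) -> bytes:
    body, ranges, offset = b"", {}, 0
    for name, data in columns.items():
        ranges[name] = (offset, len(data))  # per-column byte range
        body += data
        offset += len(data)
    header = json.dumps(ranges).encode()
    return struct.pack(">I", len(header)) + header + body

def read_column(buf: bytes, name: str) -> bytes:
    # Read the header, then fetch only the requested column's byte range.
    hlen = struct.unpack(">I", buf[:4])[0]
    ranges = json.loads(buf[4:4 + hlen])
    off, length = ranges[name]
    start = 4 + hlen + off
    return buf[start:start + length]

part = write_partition({"id": b"123", "name": b"ab,cd", "city": b"NY,LA"})
print(read_column(part, "name"))  # touches only "name"'s byte range
```

With the partition in object storage, `read_column` would translate into a single ranged GET request, skipping every unselected column entirely.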

In simpler terms, if a query specifies a filter predicate that matches only 30% of the values in a range, Snowflake should ideally scan only the micro-partitions containing that data, in this case roughly 30% of the values.

Query pruning in Snowflake

Micro-partitions are derived automatically as data is ingested into Snowflake; users never need to define them upfront or maintain them.

Snowflake stores metadata about each column of every micro-partition in a metadata cache in the cloud services layer.

Column metadata stored in Cloud services layer

This metadata is used to return results for basic analytical queries extremely quickly. Micro-partitions also allow users to perform highly efficient DML and fine-grained pruning on very large tables, which can be composed of millions or even hundreds of millions of micro-partitions.

Efficient DML and fine-grained pruning on very large tables

Crucially, the min and max values recorded for each column are what give Snowflake the ability to perform pruning.

min and max values used by Snowflake to perform pruning
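The pruning check itself is simple. This sketch (with a hypothetical in-memory structure standing in for the metadata cache) compares a filter's range against each partition's stored min/max bounds and skips any partition that provably cannot contain a match, so only overlapping partitions are actually scanned.

```python
# Each entry pairs a partition's rows with its per-column (min, max) metadata.
partitions = [
    {"rows": [(1, "a"), (5, "b")],   "meta": {"id": (1, 5)}},
    {"rows": [(6, "c"), (9, "d")],   "meta": {"id": (6, 9)}},
    {"rows": [(10, "e"), (20, "f")], "meta": {"id": (10, 20)}},
]

def scan(parts, column, lo, hi):
    scanned, hits = 0, []
    for p in parts:
        cmin, cmax = p["meta"][column]
        if cmax < lo or cmin > hi:   # bounds prove no row can match: prune
            continue
        scanned += 1                  # only overlapping partitions are read
        hits += [r for r in p["rows"] if lo <= r[0] <= hi]
    return scanned, hits

scanned, hits = scan(partitions, "id", 7, 9)
print(scanned, hits)  # only 1 of 3 partitions is scanned
```

Because the min/max check runs against cached metadata in the cloud services layer, partitions are eliminated before any data files are ever read.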
