Iceberg Tables in Snowflake

Wheat field seen on Seaton Hiking Trail, March 30, 2024

Iceberg Tables in Snowflake have been in public preview since they were first announced at Snowflake Summit 2022. In this post we look at some details of creating Iceberg tables on S3 in Snowflake.

1 Iceberg open table format

To address big data challenges, a metadata layer is needed to abstract away the details of physical data files, giving applications and users an easy-to-use, fast-to-query, reliable storage access interface. Table formats are designed at this layer to provide relational-database-like access via SQL; Hive tables, built on top of Hadoop, are one example. (See how Iceberg arose in this YouTube link.)

An Iceberg table is similar to a Hive table, but it is built on top of the Parquet file format and targets cloud storage (AWS, Azure, GCP). Differences between Iceberg tables and Hive tables are described in this blog.

An Iceberg table uses a catalog to keep table metadata that points to the underlying metadata structure and, further down, the data files (similar to the Hive metastore).

Link: https://iceberg.apache.org/spec/#overview
  • metadata file: The catalog points to the current metadata file, which is usually a JSON file on storage. It includes the table schema, the path to the manifest-list file, data snapshots at different times, and so on. A new current metadata file is created whenever data changes. This metadata file points to a manifest-list file.
  • manifest-list file: It includes the paths to all manifest files, which in turn point to all the data files of this Iceberg table. A new manifest-list file is created whenever data changes.
  • manifest files: Each includes paths to data files along with properties such as each column’s lower and upper bound values. Newly arriving data files cause a new manifest file to be created.

Querying data in an Iceberg table starts from the catalog and ends with returning data from the data files. The metadata layer can

  • narrow down a query by pushing filter conditions (for example, where month=’March’) down so that only relevant data files are scanned (so-called query pushdown),
  • redirect a query to a specific data snapshot for time travel,
  • be modified independently, without touching data files, in case of schema evolution.
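For instance, snapshot-based time travel can be exercised with Snowflake’s standard AT/BEFORE clauses. A hedged sketch (the table name iris_iceberg, the offset, and the query ID placeholder are illustrative):

```sql
-- Query the table as of 5 minutes ago; Snowflake resolves this
-- to the snapshot that was current at that time.
SELECT *
FROM iris_iceberg AT(OFFSET => -60*5);

-- Or query the state before a specific statement ran
-- (replace the placeholder with a real query ID from your history).
SELECT *
FROM iris_iceberg BEFORE(STATEMENT => '<query_id>');
```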

More benefits can be found in the Iceberg table docs.

2 Iceberg table in Snowflake

Specifically, Snowflake supports accessing Parquet files on S3 via an Iceberg table created in the Snowflake Data Cloud. In this case, Snowflake client code or Snowsight SQL retrieves metadata about the data files from a catalog hosted in Snowflake (or outside of Snowflake).
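When the catalog lives outside Snowflake, a catalog integration is created first. A hedged sketch for an AWS Glue catalog (the integration name, namespace, role ARN, and catalog ID are placeholders):

```sql
-- Catalog integration pointing at an external AWS Glue catalog;
-- Snowflake then reads Iceberg metadata through Glue instead of
-- hosting the catalog itself.
CREATE CATALOG INTEGRATION glue_cat_int
  CATALOG_SOURCE = GLUE
  CATALOG_NAMESPACE = 'my_glue_db'
  TABLE_FORMAT = ICEBERG
  GLUE_AWS_ROLE_ARN = 'arn:aws:iam::xxxx:role/glue_catalog_role'
  GLUE_CATALOG_ID = '123456789012'
  ENABLED = TRUE;
```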

The following Snowflake SQL creates an Iceberg table object in Snowflake, with the catalog hosted in Snowflake as well. A full step-by-step guide can be found in this quickstart.

-- Create external volume pointing to S3 which will be
-- used by Iceberg table for metadata and data storage
CREATE OR REPLACE EXTERNAL VOLUME exvol_s3
  STORAGE_LOCATIONS =
  (
    (
      NAME = 'my-s3-2-us-east-1'
      STORAGE_PROVIDER = 'S3'
      STORAGE_BASE_URL = 's3://feng-public-bucket/parquet'
      STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::xxxx:role/snowflake_role'
    )
  );
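Before creating the table, it may be worth confirming that the volume resolves (a hedged check; the output fields vary by account):

```sql
-- Show the volume's storage locations and the AWS IAM user that
-- Snowflake uses, which must be trusted by the role above.
DESC EXTERNAL VOLUME exvol_s3;
```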

-- Create Iceberg table object in Snowflake pointing to external S3
CREATE OR REPLACE ICEBERG TABLE iris_iceberg (
id INTEGER,
c_width float,
c_length float,
p_width float,
p_length float,
species string
)
CATALOG='SNOWFLAKE'
EXTERNAL_VOLUME='exvol_s3'
BASE_LOCATION='';

Note, I already have one iris.parquet file in the S3 location, but the SQL above does NOT load data from that file into the Iceberg table! I get an empty table in Snowflake. This is different from creating an external table, which is more convenient in that respect.

So next, we have to load this table manually, for example using the COPY INTO command.

CREATE OR REPLACE FILE FORMAT parquet_ff TYPE = PARQUET;

-- Map the parquet columns to the table columns in order:
-- id, c_width, c_length, p_width, p_length, species
COPY INTO iris_iceberg
FROM (SELECT $1:Id::int,
             $1:SepalWidthCm::float,
             $1:SepalLengthCm::float,
             $1:PetalWidthCm::float,
             $1:PetalLengthCm::float,
             $1:Species::string
      FROM @aws_s3_stage/iris.parquet (FILE_FORMAT => 'parquet_ff'));

We can query the table in Snowflake like any other Snowflake table; the real data actually lives on S3, not in Snowflake storage.
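To find where the table’s current metadata file lives on S3, Snowflake exposes a system function. A hedged sketch:

```sql
-- Returns a JSON string including metadataLocation, the S3 path
-- of the table's current Iceberg metadata file.
SELECT SYSTEM$GET_ICEBERG_TABLE_INFORMATION('iris_iceberg');
```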

3 Data file structure on S3

Let’s take a look at the Iceberg table metadata and data files on S3. From the previous section, we know there should be a metadata file, a manifest-list file, manifest files, and data files.

We see that a directory “iris_ice_1027” was created by the CREATE ICEBERG TABLE SQL.

iris_ice_1027 is the directory for the Iceberg table metadata and data files

It contains two subdirectories, metadata and data.


3.1 Metadata files under metadata subdirectory

1> metadata file: v1712146270382000000.metadata.json

The JSON files are the metadata files pointed to by the Snowflake catalog. Their names carry a UNIX epoch timestamp, indicating which is the current metadata file of this Iceberg table; the catalog points to the current/latest one.

A new metadata file, with the latest timestamp in its name, is created whenever data changes.

{
  "format-version" : 2,
  "table-uuid" : "a2431341-e310-437d-ae91-a0ec93f6c0c8",
  "location" : "s3://feng-public-bucket/parquet/iris_ice_1027",
  "last-sequence-number" : 1,
  "last-updated-ms" : 1712146270382,
  "last-column-id" : 6,
  "current-schema-id" : 0,
  "schemas" : [ {
    "type" : "struct",
    "schema-id" : 0,
    "fields" : [ {
      "id" : 1,
      "name" : "ID",
      "required" : false,
      "type" : "int"
    }, {
      "id" : 2,
      "name" : "C_WIDTH",
      ...
      "id" : 6,
      "name" : "SPECIES",
      "required" : false,
      "type" : "string"
    } ]
  } ],
  "default-spec-id" : 0,
  "partition-specs" : [ {
    "spec-id" : 0,
    "fields" : [ ]
  } ],
  "last-partition-id" : 999,
  "default-sort-order-id" : 0,
  "sort-orders" : [ {
    "order-id" : 0,
    "fields" : [ ]
  } ],
  "properties" : {
    "format-version" : "2"
  },
  "current-snapshot-id" : 1712146270382000000,
  "snapshots" : [ {
    "sequence-number" : 1,
    "snapshot-id" : 1712146270382000000,
    "timestamp-ms" : 1712146270382,
    "manifest-list" : "s3://feng-public-bucket/parquet/iris_ice_1027/metadata/snap-1712146270382000000.avro",
    "schema-id" : 0,
    "summary" : {
      "manifests-replaced" : "0",
      "added-data-files" : "1",
      "added-records" : "150",
      "total-files-size" : "2560",
      "operation" : "append",
      "manifests-kept" : "0",
      "added-files-size" : "2560",
      "total-records" : "150",
      "manifests-created" : "1",
      "total-data-files" : "1"
    }
  } ]
}

2> manifest-list file: snap-1712146270382000000.avro

It’s an Avro file, but we can read the file headers. The manifest-list file includes paths to all manifest files of this Iceberg table.

...
{"type":"record","name":"manifest_file","fields":[
{"name":"manifest_path","type":"string","doc":"Location URI with FS scheme","field-id":500},
{"name":"manifest_length","type":"long","doc":"Total file size in bytes","field-id":501},
{"name":"partition_spec_id","type":"int","doc":"Spec ID used to write",...

3> manifest file: 1712146270382000000-Ojptbydcru4GivfPQqRsKQ.avro

The manifest-list file points to one or more manifest files, from which each data file’s location can be found.

...
{"type":"record","name":"manifest_entry","fields":[{"name":"status","type":"int","field-id":0},{"name":"snapshot_id","type":["null","long"],"default":null,"field-id":1},{"name":"sequence_number","type":["null","long"],"default":null,"field-id":3},{"name":"file_sequence_number","type":["null","long"],"default":null,"field-id":4},{"name":"data_file","type":{"type":"record","name":"r2","fields":[{"name":"content","type":"int","doc":"Contents of the file: 0=data, 1=position deletes, 2=equality deletes","field-id":134},{"name":"file_path","type":"string","doc":"Location URI with FS scheme","field-id":100},{"name":"file_format","type":"string","doc":"File format name: avro, orc, or parquet","field-id":101},{"name":"partition","type":{"type":"record","name":"r102","fields":[]},"doc":"Partition data tuple, schema based on the partition spec","field-id":102},{"name":"record_count","type":"long","doc":"Number of records in the file","field-id":103},{"name":"file_size_in_bytes","type":"long","doc":"Total file size in bytes","field-id":104},{"name":"column_sizes","type":["null",{"type":"array","items":{"type":"record","name":"k117_v118","fields":[{"name":"key","type":"int","field-id":117},{"name":"value","type":"long","field-id":118}]},"logicalType":"map"}],"doc":"Map of column id to total size on disk","default":null,"field-id":108},{"name":"value_counts","type":["null",{"type":"array","items":{"type":"record","name":"k119_v120","fields":[{"name":"key","type":"int","field-id":119},{"name":"value","type":"long","field-id":120}]},"logicalType":"map"}],"doc":"Map of column id to total count, including null and NaN","default":null,"field-id":109},{"name":"null_value_counts","type":["null",{"type":"array","items":{"type":"record","name":"k121_v122","fields":[{"name":"key","type":"int","field-id":121},{"name":"value","type":"long","field-id":122}]},"logicalType":"map"}],"doc":"Map of column id to null value 
count","default":null,"field-id":110},{"name":"nan_value_counts","type":["null",{"type":"array","items":{"type":"record","name":"k138_v139","fields":[{"name":"key","type":"int","field-id":138},{"name":"value","type":"long","field-id":139}]},"logicalType":"map"}],"doc":"Map of column id to number of NaN values in the column","default":null,"field-id":137},{"name":"lower_bounds","type":["null",{"type":"array","items":{"type":"record","name":"k126_v127","fields":[{"name":"key","type":"int","field-id":126},{"name":"value","type":"bytes","field-id":127}]},"logicalType":"map"}],"doc":"Map of column id to lower bound","default":null,"field-id":125},{"name":"upper_bounds","type":["null",{"type":"array","items":{"type":"record","name":"k129_v130","fields":[{"name":"key","type":"int","field-id":129},{"name":"value","type":"bytes","field-id":130}]},"logicalType":"map"}],"doc":"Map of column id to upper bound","default":null,"field-id":128},{"name":"key_metadata","type":["null","bytes"],"doc":"Encryption key metadata blob","default":null,"field-id":131},{"name":"split_offsets","type":["null",{"type":"array","items":"long","element-id":133}],"doc":"Splittable offsets","default":null,"field-id":132},{"name":"equality_ids","type":["null",{"type":"array","items":"int","element-id":136}],"doc":"Equality comparison field IDs","default":null,"field-id":135},{"name":"sort_order_id","type":["null","int"],"doc":"Sort order ID","default":null,"field-id":140}]},"field-id":2}]}avro.codecdeflateformat-version2"partition-spec-id0iceberg.schemaÊ+{"type":"struct","schema-id":0,"fields":[{"id":0,"name":"status","required":true,"type":"int"},{"id":1,"name":"snapshot_id","required":false,"type":"long"},{"id":3,"name":"sequence_number","required":false,"type":"long"},{"id":4,"name":"file_sequence_number","required":false,"type":"long"},
{"id":2,"name":"data_file","required":true,"type":{"type":"struct","fields":[{"id":134,"name":"content","required":true,"type":"int","doc":"Contents of the file: 0=data, 1=position deletes, 2=equality deletes"},
{"id":100,"name":"file_path","required":true,"type":"string","doc":"Location URI with FS scheme"},{"id":101,"name":"file_format","required":true,"type":"string","doc":"File format name: avro, orc, or parquet"},{"id":102,"name":"partition","required":true,"type":{"type":"struct","fields":[]},"doc":"Partition data tuple, schema based on the partition spec"},{"id":103,"name":"record_count","required":true,"type":"long","doc":"Number of records in the file"},{"id":104,"name":"file_size_in_bytes","required":true,"type":"long","doc":"Total file size in bytes"},{"id":108,"name":"column_sizes","required":false,"type":{"type":"map","key-id":117,"key":"int","value-id":118,"value":"long","value-required":true},"doc":"Map of column id to total size on disk"},{"id":109,"name":"value_counts","required":false,"type":{"type":"map","key-id":119,"key":"int","value-id":120,"value":"long","value-required":true},"doc":"Map of column id to total count, including null and NaN"},{"id":110,"name":"null_value_counts","required":false,"type":{"type":"map","key-id":121,"key":"int","value-id":122,"value":"long","value-required":true},"doc":"Map of column id to null value count"},{"id":137,"name":"nan_value_counts","required":false,"type":{"type":"map","key-id":138,"key":"int","value-id":139,"value":"long","value-required":true},"doc":"Map of column id to number of NaN values in the column"},
{"id":125,"name":"lower_bounds","required":false,"type":{"type":"map","key-id":126,"key":"int","value-id":127,"value":"binary","value-required":true},"doc":"Map of column id to lower bound"},
{"id":128,"name":"upper_bounds","required":false,"type":{"type":"map"...

When loading more data into the Iceberg table, we can see more metadata files, manifest-list files, and manifest files created. The Snowflake catalog points only to the current metadata file, but since Snowflake keeps all the snapshots, time travel can easily be supported.
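As a quick experiment (a hedged sketch; the row values are illustrative), appending a row creates a new snapshot, visible as new files under the metadata subdirectory:

```sql
-- Each DML statement produces a new Iceberg snapshot: a new
-- metadata JSON, manifest-list file, and manifest file appear on S3.
INSERT INTO iris_iceberg
VALUES (151, 3.5, 5.1, 0.2, 1.4, 'Iris-setosa');
```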

3.2 Data files under data subdirectory

The data files are referenced by the manifest files in the metadata directory. Currently, Snowflake Iceberg tables only support the Parquet file format.

Note, data files uploaded manually to this S3 location won’t be recognized by the Snowflake Iceberg table.

Happy Reading!

--

Feng Li
Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

Software Engineer, playing with Snowflake, AWS and Azure. Snowflake Data Superhero. Jogger, Hiker.