Implementation of Iceberg table on Snowflake

Rohan Modak
4 min readJan 15, 2024

--

Photo by Annie Spratt on Unsplash

This write up is all about the implementation of Iceberg table on Snowflake. There is enough material already available about the Iceberg table and hence I wont deep dive into the whereabouts of Iceberg tables.

This blog is implementation of Iceberg table on Snowflake . This feature was introduced in 2023 and is now available for public . There are some pre requisites before creating the Iceberg table in Snowflake

  1. Creation of external volume in Snowflake :- This is a account level Snowflake object that needs to be created in Snowflake for implementation of Iceberg table .

Right now, only the ACCOUNTADMIN role has access to create the External volume . The ACCOUNTADMIN has to provide the USAGE privileges to the corresponding role .

Below are the details that need to be filled out for creation of External volume. I have demonstarted below for Amazon. The paramters gets changed a bit for Google and Azure.

CREATE OR REPLACE EXTERNAL VOLUME <NAME>
STORAGE_LOCATIONS =
(
(
NAME = <REGION_NAME>
STORAGE_PROVIDER =’S3'
STORAGE_BASE_URL = <BASE_LOCATION>
STORAGE_AWS_ROLE_ARN = <This could be obtained with AWS administrator>. Usually if you have created a Storage integration on the same base location. You can directly get it from Storage integration details
)
);

Once the external volume is created , the new STORAGE_AWS_EXTERNAL_ID is created by Snowflake . This needs to updated in the Policy at the AWS end with the help of AWS administrator . The new STORAGE_AWS_EXTERNAL_ID can be obtained by giving following commands.

DESC EXTERNAL VOLUME <NAME> ;

2. Creation of catalog integration in Snowflake:- This is only required if we need to created a Unmanaged Iceberg table i.e. either managed by AWS Glue or External Storage .

Right now only the ACCOUNTADMIN has the privileges to create the catalog integration in Snowflake and once done , USAGE privileges needs to be granted to the desired role. Below is sample example of creation of catalog integration for External Storage

CREATE OR REPLACE CATALOG INTEGRATION <NAME OF OBJECT>
CATALOG_SOURCE=OBJECT_STORE
TABLE_FORMAT=ICEBERG
ENABLED=TRUE/FALSE :- “TRUE”enables whether the same integration can be used to create another ICEBERG tables too .

Once this step is completed we can proceed for creation of Iceberg table. There are two ways to create a Iceberg table

  1. Snowflake Managed Iceberg table:- This is a Iceberg table which is managed by Snowflake . This is very similar to Snowflake native table and offers all the basic functionality of native Snowflake table .We are only billed for Cloud services charges i.e both compute and Storage while using this type of table . Below is the syntax for creation of Iceberg table with Snowflake managed one

CREATE OR REPLACE ICEBERG TABLE <TABLE_NAME>(<COLUMN NAMES>)
WITH EXTERNAL_VOLUME=<NAME OF EXTERNAL VOLUME> CATALOG=’SNOWFLAKE’
BASE_LOCATION=<LOCATION ON S3 from where data is supposed to be referenced>

This will create a empty Iceberg table on Snowflake. We can use any of the Snowflake recommended approach to insert the data into this table i.e. COPY INTO or Snowpipe. We can also create a Iceberg table on top of External tables if we need to leverage the benefits of Iceberg table. This will create a table with data on Snowflake . Below is the Syntax for the same.

CREATE OR REPLACE ICEBERG TABLE <TABLE_NAME>
WITH EXTERNAL_VOLUME=<NAME OF EXTERNAL VOLUME> CATALOG=’SNOWFLAKE’
BASE_LOCATION=<LOCATION ON S3 from where data is supposed to be referenced> AS SELECT <COLUMN_NAME> FROM <EXTERNAL_TABLE> .

Since this is Snowflake managed table , it supports all the DML operations and data is reflected straightaway while accessing the table. It doesn’t not support any auto refresh capability if underlying external table is refreshed and we need to manually give commands to reflect the same in Iceberg tables.

Also, Snowflake generates the metadata & data on external location which could be found by giving following commands.

SELECT SYSTEM$GET_ICEBERG_TABLE_INFORMATION(<TABLE_NAME>);

2. Unmanaged Iceberg table:- This is a read only Iceberg table that references the data that is provided either on AWS Glue or External Object . For data on External storage the data has to be in Iceberg format for it to be able to access the same . Below is the commands that will create a Iceberg table on Snowflake which is managed by Catalog Integration

CREATE OR REPLACE ICEBERG TABLE <TABLE_NAME>
EXTERNAL_VOLUME=<NAME OF OBJECT>
CATALOG=<NAME OF CATALOG>
METADATA_FILE_PATH =<EXTERNAL STORGAE LOCATION> — This needs to be after the External volume path details . So if your metdata is at S3://data/data.json and external volume base location is S3://data then we only need to provide data.json as the METADATA_FILE_PATH.

This is a read only Iceberg table and needs to be refreshed for every metadata update. Below is the command to refresh the Iceberg table

ALTER ICEBERG TABLE <TABLE_NAME> REFRESH <METADATA_FILE_PATH>;

This is great feature provided by Snowflake and usually its great add on as compared to external table if performance is key aspect. Right now it only supports if the data is in parquet format .

Happy Learning!!!!

--

--