Iceberg Tables in Snowflake

Sushantpattanaik
May 10, 2024

Iceberg tables are a type of Snowflake table whose data is stored outside Snowflake's own storage. Instead, the data resides in a public cloud object storage location that you manage, such as Amazon S3, Google Cloud Storage, or Azure Storage. These tables use the open Apache Iceberg table format.

What Are Iceberg Tables?

How Iceberg Tables Work:

Data Storage:

  • Iceberg tables store their data and metadata files in the external cloud storage location. Snowflake does not provide fail-safe storage for Iceberg tables; you are responsible for managing the external storage, including data protection and recovery.
  • Snowflake connects to your storage location using an external volume, which is a named, account-level Snowflake object. The external volume stores an identity and access management (IAM) entity for your cloud storage. A single external volume can support one or more Iceberg tables.
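The following is a minimal sketch of an external volume for Amazon S3. The volume name, bucket, and IAM role ARN are hypothetical placeholders, and the role is assumed to already grant access to the bucket. DESC EXTERNAL VOLUME then shows the Snowflake IAM user and external ID that you add to that role's trust policy.

```sql
-- Hypothetical placeholders: my_external_volume, my-bucket, and the role ARN.
CREATE OR REPLACE EXTERNAL VOLUME my_external_volume
  STORAGE_LOCATIONS = (
    (
      NAME = 'my-s3-location'
      STORAGE_PROVIDER = 'S3'
      STORAGE_BASE_URL = 's3://my-bucket/'
      STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/my-snowflake-role'
    )
  );

-- Shows the IAM user ARN and external ID that Snowflake uses;
-- add these to the IAM role's trust policy in AWS.
DESC EXTERNAL VOLUME my_external_volume;
```

The same volume can then be referenced by any number of Iceberg tables.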

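Iceberg Catalog:

  • An Iceberg table needs a catalog to track its metadata. This can be Snowflake itself or an external catalog, as described in the catalog section below.
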
Cross-Cloud/Cross-Region Support:

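  • The external volume for an Iceberg table can be with a different cloud provider or in a different region from your Snowflake account. Accessing storage across regions or clouds can add data transfer (egress) costs from your cloud provider.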

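Billing:

  • Snowflake bills your account for the compute (virtual warehouse) and cloud services you use to work with Iceberg tables. Storage is billed directly by your cloud storage provider, not by Snowflake.
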
Creating and Using Iceberg Tables:

Create an Iceberg Table:

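  • Define an Iceberg table in Snowflake, naming the external volume and a base location (a path relative to the external volume) for the table's data and metadata files. Here, my_external_volume is a placeholder for an external volume you have already created.
  • Example (SQL):

CREATE ICEBERG TABLE my_iceberg_table (column1 STRING)
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'my_external_volume'
  BASE_LOCATION = 'my-path';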

Querying Iceberg Tables:

  • Query Iceberg tables just like regular Snowflake tables.
  • Example (SQL):

SELECT COUNT(*) FROM my_iceberg_table WHERE column1 = 'value';
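Ordinary DML also works, assuming my_iceberg_table uses Snowflake as its catalog so that Snowflake can write to it. A short sketch, with the values as placeholders:

```sql
-- Assumes my_iceberg_table uses CATALOG = 'SNOWFLAKE' and has a STRING column column1.
INSERT INTO my_iceberg_table (column1) VALUES ('value'), ('other value');

-- Updates rewrite data files in your external storage and commit a new table version.
UPDATE my_iceberg_table SET column1 = 'value' WHERE column1 = 'other value';

-- Counts the matching rows after the update.
SELECT COUNT(*) FROM my_iceberg_table WHERE column1 = 'value';
```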

Should Iceberg Be Used for Time Travel?

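Iceberg tables do keep history, but their data protection differs from that of regular Snowflake tables. The Iceberg format records each committed change as a snapshot, and engines such as Apache Spark can query earlier snapshots. Snowflake also supports Time Travel for Iceberg tables, but snapshot retention depends on how the table is managed, and Iceberg tables have no Fail-safe.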

  • If you need the full set of Snowflake data protection features, including Fail-safe, it's recommended to use regular Snowflake tables.
  • Regular (non-Iceberg) Snowflake tables give you Time Travel for historical queries and recovery, followed by Fail-safe.
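On a regular Snowflake table, Time Travel and recovery look like the sketch below. It assumes a hypothetical existing table my_regular_table with at least five minutes of history; <query_id> is a placeholder for a real query ID, and a retention period above 1 day requires Enterprise Edition or higher.

```sql
-- my_regular_table is a hypothetical regular (non-Iceberg) table.
ALTER TABLE my_regular_table SET DATA_RETENTION_TIME_IN_DAYS = 1;

-- Query the table as it was 5 minutes (300 seconds) ago.
SELECT COUNT(*) FROM my_regular_table AT(OFFSET => -60*5) WHERE column1 = 'value';

-- Query the table as it was just before a given statement ran.
SELECT * FROM my_regular_table BEFORE(STATEMENT => '<query_id>');

-- Recover the table after an accidental drop, within the retention period.
DROP TABLE my_regular_table;
UNDROP TABLE my_regular_table;
```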

What Makes Iceberg Catalogs So Special in Snowflake?

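What Is an Iceberg Catalog?

An Iceberg catalog is the service that engines use to find Iceberg tables. For each table it stores a pointer to the current metadata file, and a write is committed by atomically switching that pointer to a new metadata file.
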
Why Are Iceberg Catalogs Special?

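  • External Storage Integration: Because table data and metadata follow the open Iceberg specification and live in your own storage, other engines such as Apache Spark can work with the same tables that Snowflake manages.
  • Cost Efficiency: Storage is billed by your cloud provider rather than by Snowflake, and a single copy of the data can serve several engines without duplicating it into Snowflake's native storage.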
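To see how the catalog ties Snowflake to your storage, you can ask Snowflake where a table's current metadata file lives. A sketch, assuming my_iceberg_table is an Iceberg table in the current schema:

```sql
-- Lists the Iceberg tables visible in the current context.
SHOW ICEBERG TABLES;

-- Returns JSON that includes the location of the table's current
-- metadata file in your external storage.
SELECT SYSTEM$GET_ICEBERG_TABLE_INFORMATION('my_iceberg_table');
```

Another engine that reads the metadata file at that location sees the same table state that Snowflake does.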

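Data Management Features:

  • The Iceberg format supports ACID transactions, schema evolution, and hidden partitioning, so tables can change shape and grow without rewriting how queries are written.
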
Using Iceberg Catalogs in Snowflake:

  • When creating an Iceberg table in Snowflake, you have two options for the catalog:

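1. Snowflake as the Iceberg Catalog: Snowflake manages the table's metadata, and you can read and write the table from Snowflake. External engines such as Apache Spark can read these tables through the Snowflake Catalog SDK.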

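2. External Catalog Integration: Another catalog, such as AWS Glue or Iceberg metadata files in object storage, manages the table. Snowflake connects to it through a catalog integration object and reads the table, while writes are made by the engine that owns the catalog.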
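The two catalog options differ mainly in the CATALOG parameter. The sketch below shows each: for option 1, defaults set on a hypothetical database my_db so that new Iceberg tables there use Snowflake as the catalog; for option 2, a catalog integration for AWS Glue and a Snowflake table that points at an existing Glue-managed Iceberg table. All names, the AWS account ID, and the role ARN are placeholders.

```sql
-- Option 1: Snowflake as the catalog. Set defaults on a hypothetical database
-- so new Iceberg tables in it inherit the catalog and external volume.
ALTER DATABASE my_db SET CATALOG = 'SNOWFLAKE';
ALTER DATABASE my_db SET EXTERNAL_VOLUME = 'my_external_volume';

-- With those defaults, only the columns and base location are needed.
CREATE ICEBERG TABLE my_db.my_schema.my_default_table (column1 STRING)
  BASE_LOCATION = 'my_default_table';

-- Option 2: external catalog. Register an AWS Glue Data Catalog database.
CREATE CATALOG INTEGRATION my_glue_catalog_int
  CATALOG_SOURCE = GLUE
  CATALOG_NAMESPACE = 'my_glue_database'
  TABLE_FORMAT = ICEBERG
  GLUE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/my-glue-role'
  GLUE_CATALOG_ID = '123456789012'
  ENABLED = TRUE;

-- Point a Snowflake table at an existing Glue-managed Iceberg table.
CREATE ICEBERG TABLE my_glue_iceberg_table
  EXTERNAL_VOLUME = 'my_external_volume'
  CATALOG = 'my_glue_catalog_int'
  CATALOG_TABLE_NAME = 'my_glue_table';

-- Pick up changes committed by other engines through the Glue catalog.
ALTER ICEBERG TABLE my_glue_iceberg_table REFRESH;
```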

Example: Using Snowflake as the Iceberg Catalog:

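  • To query these Iceberg tables from Apache Spark, start your Spark session with the Iceberg Spark runtime and the Snowflake JDBC driver, and configure a catalog that uses Snowflake's catalog implementation. Authenticate with either jdbc.password or jdbc.private_key_file, and remove the line you don't need: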

spark-shell --packages org.apache.iceberg:iceberg-spark-runtime-3.3_2.13:1.2.0,net.snowflake:snowflake-jdbc:3.13.28 \
  --conf spark.sql.catalog.snowflake_catalog=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.snowflake_catalog.catalog-impl=org.apache.iceberg.snowflake.SnowflakeCatalog \
  --conf spark.sql.catalog.snowflake_catalog.uri='jdbc:snowflake://<account_identifier>.snowflakecomputing.com' \
  --conf spark.sql.catalog.snowflake_catalog.jdbc.user=<user_name> \
  --conf spark.sql.catalog.snowflake_catalog.jdbc.password=<password> \
  --conf spark.sql.catalog.snowflake_catalog.jdbc.private_key_file=<location of the private key>

  • After configuration, you can list the available namespaces (Snowflake databases and schemas) and tables:

spark.sessionState.catalogManager.setCurrentCatalog("snowflake_catalog")
spark.sql("SHOW NAMESPACES").show()
spark.sql("USE <database_name>.<schema_name>")
spark.sql("SHOW TABLES").show()
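Once the catalog is configured, a Snowflake-managed Iceberg table can be read from Spark SQL using a catalog.database.schema.table name. A sketch, assuming a spark-sql session started with the same --packages and --conf options; my_database, my_schema, and my_iceberg_table are hypothetical names, and access through this catalog is read-only.

```sql
-- Run in spark-sql (or wrap in spark.sql("...") inside spark-shell).
-- The Snowflake catalog exposes Snowflake databases and schemas as Iceberg namespaces.
SELECT COUNT(*)
FROM snowflake_catalog.my_database.my_schema.my_iceberg_table
WHERE column1 = 'value';
```

Spark reads the data files directly from your external storage; Snowflake only supplies the location of the table's current metadata.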

In summary, Iceberg catalogs let Snowflake manage Iceberg tables whose data lives in your own cloud storage, combining Snowflake's data management features with storage that you control and pay for directly.
