Ollion’s Notes from Snowflake Summit Part 4: Embracing Open Standards With Iceberg

Greg Marsh
4 min read · Jun 11, 2024

The recent Snowflake Summit brought a significant development in data management and storage: the general availability (GA) of Iceberg tables. This feature allows Snowflake users to store data externally while still using Snowflake’s performance and feature set. It is particularly beneficial for reducing storage costs and for giving teams the flexibility to process and analyze the same data with various technologies, without migrating it.

In many ways, this release represents Snowflake recognizing the reality that customers (1) want to leverage multiple compute engines depending on use cases and (2) desire zero lock-in when it comes to their data ecosystem!

This service can lessen the “gravity” that Snowflake maintains on many customers, but it also opens huge opportunities for Snowflake to win in the workloads that it is best at!

All pictures are from Snowflake

Integration of Apache Iceberg

Snowflake has integrated Apache Iceberg tables as a native format within its platform. This move is complemented by the introduction of Polaris, a specialized catalog for Iceberg tables. Polaris enhances data processing by facilitating joins across different tables and simplifying the retrieval of previously hard-to-access information. Additionally, it manages permissions to ensure secure data access.

Polaris can be accessed by any data processing engine compatible with the Iceberg format, such as Spark, Dremio, and Snowflake itself. This flexibility underscores Snowflake’s commitment to providing open standards and avoiding vendor lock-in, a critical concern for many organizations.
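To make the engine-agnostic access concrete, here is a minimal sketch of the Spark settings that would point a Spark session at an Iceberg catalog. It assumes the catalog is reached through Iceberg’s REST catalog protocol; the catalog name, endpoint URL, warehouse name, and credential shown are hypothetical placeholders, not values from Snowflake or Polaris documentation.

```python
from typing import Dict


def iceberg_rest_catalog_conf(
    catalog_name: str,
    uri: str,
    warehouse: str,
    credential: str,
) -> Dict[str, str]:
    """Build Spark SQL settings that register an Iceberg REST catalog."""
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # Enables Iceberg's SQL extensions in Spark.
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # Registers the catalog under the chosen name.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Tells Iceberg to talk to the catalog over its REST protocol.
        f"{prefix}.type": "rest",
        f"{prefix}.uri": uri,
        f"{prefix}.warehouse": warehouse,
        f"{prefix}.credential": credential,
    }


# All four argument values below are hypothetical placeholders.
conf = iceberg_rest_catalog_conf(
    catalog_name="polaris",
    uri="https://example.invalid/api/catalog",
    warehouse="analytics_catalog",
    credential="<client_id>:<client_secret>",
)

for key, value in sorted(conf.items()):
    print(f"{key}={value}")
```

Each key/value pair could be passed to Spark as a `--conf` option or through a session builder. Because the settings describe only the catalog, any other Iceberg-compatible engine could read the same tables once it is configured in its own way.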

Competitive Landscape and Industry Trends

Snowflake’s competitor, Databricks, recently acquired Tabular, a provider of an Iceberg distribution. This acquisition highlights the growing industry shift towards open-source solutions for data lakes and lakehouses. Apache Iceberg’s rising popularity marks a clear move towards open standards, allowing organizations to use a single storage layer across multiple compute engines and frameworks, including Snowflake, Spark, Trino, and Flink.

By adopting Iceberg, Snowflake allows customers to avoid the proprietary “Flocon De Neige” (FDN) file format, offering flexibility and reducing costs associated with data storage and redundancy. While Snowflake may relinquish some storage-related revenue, it positions itself to capture a larger share of the analytics compute market, including AI and data applications.

Snowflake and Open Standards

Snowflake’s adoption of Iceberg aligns with the broader industry demand for open design, cost efficiency, and interoperability. By enabling multiple tools to work with data seamlessly, Snowflake meets these needs head-on.

The Parquet Direct and Delta Direct features further exemplify Snowflake’s commitment to open standards. These features allow users to onboard Parquet and Delta Lake data that already sits in their data lakes as Iceberg tables in Snowflake, without rewriting the files. Parquet Direct treats a folder location as a table and infers its schema, while Delta Direct reads the Delta Lake transaction log for schema and file information. This approach saves cost by maintaining a single copy of the data and enhances interoperability with platforms like Databricks and Fabric.
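To illustrate why a transaction log is enough to recover “schema and file information,” here is a minimal sketch that replays the JSON commit files of a Delta Lake table. It is not Snowflake’s implementation. It follows the open Delta log layout (a `_delta_log` folder of ordered JSON commits containing `metaData`, `add`, and `remove` actions) and, as a simplification, ignores checkpoint files. The function name and the tiny demo table are made up for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Optional, Set, Tuple


def replay_delta_log(table_dir: str) -> Tuple[Optional[Dict], Set[str]]:
    """Return (schema, live data files) by replaying JSON commits in order."""
    log_dir = Path(table_dir) / "_delta_log"
    schema: Optional[Dict] = None
    live_files: Set[str] = set()
    # Commit files are zero-padded version numbers, so name order is commit order.
    for commit in sorted(log_dir.glob("*.json")):
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "metaData" in action:
                # The schema is stored as a JSON string inside the action.
                schema = json.loads(action["metaData"]["schemaString"])
            elif "add" in action:
                live_files.add(action["add"]["path"])
            elif "remove" in action:
                live_files.discard(action["remove"]["path"])
    return schema, live_files


# Demo on a made-up table: commit 0 adds one file, commit 1 replaces it.
demo_schema = {
    "type": "struct",
    "fields": [{"name": "id", "type": "long", "nullable": True, "metadata": {}}],
}
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "_delta_log"
    log.mkdir()
    (log / "00000000000000000000.json").write_text("\n".join([
        json.dumps({"metaData": {"schemaString": json.dumps(demo_schema)}}),
        json.dumps({"add": {"path": "part-0000.parquet"}}),
    ]))
    (log / "00000000000000000001.json").write_text("\n".join([
        json.dumps({"remove": {"path": "part-0000.parquet"}}),
        json.dumps({"add": {"path": "part-0001.parquet"}}),
    ]))
    schema, files = replay_delta_log(tmp)

print([field["name"] for field in schema["fields"]], sorted(files))
```

No data file is opened or copied at any point: the log alone says which columns exist and which Parquet files currently make up the table, which is what makes the single-copy approach possible.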

Future Developments and Use Cases

Looking ahead, Snowflake is developing zero-copy shares from Fabric, promising further enhancements in data sharing and collaboration. Iceberg tables present numerous use cases, from zero-ingest scenarios where data remains in place to building lakehouses on open standards.

A particularly compelling use case is storing raw data in open-source formats while keeping curated “Silver and Gold” data layers close to the compute resources that use them (i.e., Snowflake). This strategy maximizes the benefits of open standards while leveraging Snowflake’s powerful analytics capabilities.

In conclusion, Snowflake’s embrace of Iceberg tables and open standards marks a significant shift in data management. By providing flexibility, reducing costs, and ensuring interoperability, Snowflake is well-positioned to meet the evolving needs of modern data-driven organizations.

This has been Part 4 of my series on the Snowflake Summit. Check out Part 3 to read about everything showcased at the Builder Keynote address.

Keep reading about one of my favorite data management services on Snowflake, Dynamic Tables!

About Ollion

At Ollion, we have been a proud Snowflake Service Partner for almost a decade. Our mission is to connect companies and capabilities worldwide, helping ambitious organizations achieve game-changing breakthroughs without losing sight of the people impacted. We offer a unique point of view as an independent, straightforward partner backed by a global team of client partners, sales, engineering, delivery, and more.

Let me know if you attended and want to talk more about Snowflake Summit 2024!

Greg Marsh

MBA from Georgetown University; Principal at Ollion (formerly Aptitive/2nd Watch), a global analytics consulting firm.