Delta, Hudi, Iceberg — Which is most popular?

Kyle Weller
Aug 26, 2023


Delta Lake, Apache Hudi, and Apache Iceberg are the popular open source projects leading the way for the emerging lakehouse architecture pattern. Their origin stories show that each project started between 2017 and 2019: Hudi at Uber, Iceberg at Netflix, and Delta Lake at Databricks.

As the data lakehouse gains popularity and attention, interest in comparing these three open source projects is rising. Comparing the communities is just as important as comparing features. A community can make or break a project's development momentum, its ecosystem adoption, and the objectivity of the platform. Ironically, one of the most popular questions I hear is: which project is the most popular?

To measure the community around an open source project, there are a few good places to find publicly available information. GitHub, Slack, and Twitter are the most popular places where OSS communities form and collaborate. Some websites also compile interesting statistics, for example https://ossinsight.io/
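GitHub figures such as stars, forks, and contributor counts can also be pulled directly from the public GitHub REST API. Below is a minimal sketch using only the Python standard library. The repository slugs (`delta-io/delta`, `apache/hudi`, `apache/iceberg`) and the helper names are my own assumptions for illustration. Unauthenticated requests are rate-limited, and the contributor count here includes anonymous (email-only) contributors, so the numbers may not match what each repository page shows.

```python
import json
import re
import urllib.request

# Assumed GitHub repository slugs for the three projects.
REPOS = {
    "Delta Lake": "delta-io/delta",
    "Apache Hudi": "apache/hudi",
    "Apache Iceberg": "apache/iceberg",
}

API = "https://api.github.com/repos/"


def fetch_json(url):
    """GET a GitHub REST API URL and return (parsed JSON body, response headers)."""
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp), resp.headers


def last_page_from_link(link_header):
    """Return the page number tagged rel="last" in a GitHub Link header, or None."""
    if not link_header:
        return None
    for part in link_header.split(","):
        match = re.search(r'[?&]page=(\d+)>;\s*rel="last"', part)
        if match:
            return int(match.group(1))
    return None


def repo_stats(slug):
    """Hypothetical helper: stars, forks, and an approximate contributor count."""
    repo, _ = fetch_json(API + slug)
    # With per_page=1, the last page number equals the number of contributors.
    contributors, headers = fetch_json(API + slug + "/contributors?per_page=1&anon=true")
    count = last_page_from_link(headers.get("Link"))
    if count is None:  # no pagination means everything fit on one page
        count = len(contributors)
    return {
        "stars": repo["stargazers_count"],
        "forks": repo["forks_count"],
        "contributors": count,
    }


if __name__ == "__main__":
    for name, slug in REPOS.items():
        print(name, repo_stats(slug))
```

Requesting one contributor per page and reading the `rel="last"` page number from the `Link` header avoids downloading the full contributor list for each project.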

Looking at Delta, Hudi, and Iceberg, all three projects are growing rapidly and have strong momentum behind them. The measurable data around their communities reveals a few interesting trends.

Social Stats as of Aug 2023:

As of August 2023, Delta Lake draws the most public attention across these social channels. With a multi-billion-dollar company, Databricks, behind it, Delta Lake naturally has the most awareness and the most users.

Hudi and Iceberg are catching up, and both communities are growing exponentially. On social media, Iceberg in particular is benefiting from a recent wave of marketing after Snowflake announced support for the project.

GitHub Developer Community:

When you dive into GitHub statistics on the developers contributing to each project, a different story emerges. While Hudi and Iceberg are neck and neck on GitHub stars, Hudi has more contributors than both Iceberg and Delta.

Community Diversity:

The last thing to discuss is community diversity. A diverse community is a good indicator of a project's grassroots strength and its long-term staying power.

Delta Lake is primarily driven by Databricks, which can be seen as either a pro or a con. In contrast, Hudi and Iceberg each have a diverse set of large organizations that regularly contribute to the project's development.

Delta:

Hudi:

Iceberg:

In summary:

When diving into an open source project, it is fun to join a community. Developers collaborate from all around the globe and share ideas. Open projects foster faster innovation and allow organizations to build together in diverse ecosystems.

#DeltaLake, #ApacheHudi, and #ApacheIceberg all have fast-growing #DataLakehouse communities behind them. Delta has the most awareness, while Hudi has the most developers contributing to the project. Go find each on GitHub, Twitter, and Slack to engage! If you want to learn more, check out this recent video where I walk through the strengths, benchmarks, and more for each project: https://www.linkedin.com/events/deepdive-hudi-iceberg-anddeltal7095484265877950465/comments/
