Hot Take — Apache Hudi, Delta Lake, Apache Iceberg are Divergent

Kyle Weller
3 min read · Aug 24, 2023


Above is a screenshot from some of the earliest commits for Apache Hudi, Apache Iceberg, and Delta Lake. I think it’s interesting to study the origin stories of these projects because each started with different roots and focus areas, and those differences still show up in how the communities are driven today.

Having personally engaged for several years in more than one of these open source communities, I have helped hundreds of users build planet-scale data lakehouse architectures on Azure, AWS, and GCP. I compiled my research and experience into this comprehensive comparison article: https://www.onehouse.ai/blog/apache-hudi-vs-delta-lake-vs-apache-iceberg-lakehouse-feature-comparison

Let’s talk about origin stories…

Apache Hudi =

  • Hudi was open sourced by Uber in 2017
  • Hudi was built when Uber needed a petabyte-scale, near-real-time platform to process all of its trip data, run fraud detection, and more
  • Before the term “Lakehouse” was popular, the Hudi community called this a “transactional data lake”
  • Hudi focused from the beginning on transforming batch processing into incremental processing

Apache Iceberg =

  • Apache Iceberg was open sourced in 2018 by Netflix, where they were struggling with S3 scale limitations
  • They wanted to replace the Hive table format with a new portable metadata layer

Delta Lake =

  • Delta was open sourced in 2019, and some of the initial goals were to replace warehouse-like functionality with Spark and to make Spark the gold-standard data processing engine across the industry

The common narrative I hear is that these projects are so similar that it’s just a matter of time before each one builds and covers the same feature set. In my personal opinion, they are actually divergent and are building towards slightly different goals. Delta Lake exists to further the success of Databricks. Apache Iceberg aims to create a portable format layer, with a spec and catalog, as a replacement for Hive. Apache Hudi is building comprehensive database platform services on the lake.

So while most of the technical fundamentals are common, I believe these three projects are working towards divergent goals. If you want to hear more discussion on this topic and learn about the strengths of these projects in depth, check out this recent webinar: https://www.linkedin.com/events/deepdive-hudi-iceberg-anddeltal7095484265877950465/comments/

#apachehudi #apacheiceberg #deltalake #datalakehouse
