Spark ETL Chapter 10 with Lakehouse

Kalpan Shah
Plumbers Of Data Science
10 min read · Apr 4, 2023

Spark ETL with Delta Lake, Apache Iceberg, and Apache Hudi

Previous blog/Context:

In an earlier post, we discussed Spark ETL with a lakehouse table format (Apache Iceberg). Please refer to that post for more details.

Introduction:

Today, we will cover the following points:

  • Spark ETL with the popular lakehouse formats (Delta Lake, Apache Iceberg, and Apache Hudi).
  • What each of these lakehouse formats offers.
  • Pros and cons of each lakehouse format.
  • A comparison of these lakehouse formats.
  • Loading data from the source system (we will use MySQL as our source system), transforming it, and loading it into each of these lakehouse formats.
  • Reading data back from each of these lakehouse formats.
  • Comparing these formats in terms of how they create data and metadata.
  • References and learning materials.
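
To make the load, transform, and read steps from the list above more concrete, here is a minimal PySpark sketch. It is not the exact code from this series: the host, database, table, column names, and paths are hypothetical placeholders, and it assumes a SparkSession that has already been configured with the MySQL JDBC driver and the Delta Lake, Iceberg, and Hudi packages (including an Iceberg catalog).

```python
from typing import Dict


def mysql_jdbc_options(host: str, database: str, table: str,
                       user: str, password: str, port: int = 3306) -> Dict[str, str]:
    """Build the option map for Spark's built-in JDBC data source."""
    return {
        "url": f"jdbc:mysql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "com.mysql.cj.jdbc.Driver",
    }


def hudi_write_options(table_name: str, record_key: str,
                       precombine_field: str) -> Dict[str, str]:
    """Minimal Hudi write options; the key and precombine columns are assumptions."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
    }


def run_etl(spark, jdbc_opts: Dict[str, str], base_path: str,
            iceberg_table: str, hudi_opts: Dict[str, str]):
    """Extract from MySQL, apply a trivial transform, load into all three formats.

    `spark` is an existing SparkSession with the required packages on its classpath.
    """
    # Extract: read the source table over JDBC.
    source_df = spark.read.format("jdbc").options(**jdbc_opts).load()

    # Transform: a placeholder step that only drops exact duplicate rows.
    transformed_df = source_df.dropDuplicates()

    # Load: write the same DataFrame to each lakehouse format.
    transformed_df.write.format("delta").mode("overwrite").save(f"{base_path}/delta")
    transformed_df.writeTo(iceberg_table).using("iceberg").createOrReplace()
    (transformed_df.write.format("hudi")
        .options(**hudi_opts)
        .mode("overwrite")
        .save(f"{base_path}/hudi"))

    # Read back from each format.
    return {
        "delta": spark.read.format("delta").load(f"{base_path}/delta"),
        "iceberg": spark.table(iceberg_table),
        "hudi": spark.read.format("hudi").load(f"{base_path}/hudi"),
    }
```

A call might look like `run_etl(spark, mysql_jdbc_options("localhost", "shop", "orders", "etl", "secret"), "/tmp/lakehouse", "local.db.orders", hudi_write_options("orders_hudi", "order_id", "updated_at"))`, where every argument is illustrative. Keeping the option builders as plain functions makes them easy to check without starting a Spark cluster.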

Delta Lake, Apache Iceberg, and Apache Hudi
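
As a small preview of how these formats differ in the metadata they create, the sketch below guesses a table's format from the metadata directory under its root: Delta Lake keeps a `_delta_log` directory, Hudi keeps `.hoodie`, and Iceberg keeps `metadata` next to its data files. The function name and the local-filesystem assumption are mine; tables on object storage would need a different listing call.

```python
from pathlib import Path
from typing import Optional

# Metadata directory each format creates under the table root.
# Iceberg's "metadata" is the most generic name, so it is checked last.
FORMAT_MARKERS = {
    "delta": "_delta_log",
    "hudi": ".hoodie",
    "iceberg": "metadata",
}


def detect_table_format(table_path: str) -> Optional[str]:
    """Return "delta", "hudi", or "iceberg" based on the metadata directory found."""
    root = Path(table_path)
    for fmt, marker in FORMAT_MARKERS.items():
        if (root / marker).is_dir():
            return fmt
    return None  # no known metadata directory at this path
```

This is only a heuristic for exploring local output folders, not a substitute for reading the table through its own connector.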
