What Is Apache Hudi?

Karim Faiz
3 min read · Dec 13, 2023

Apache Hudi (Hadoop Upserts Deletes and Incrementals) is an open-source data lake storage framework that provides efficient upserts, deletes, and incremental processing on top of data lake storage. Developed at Uber and later contributed to the Apache Software Foundation, Hudi is designed to manage large-scale datasets and enable near-real-time data processing. Here’s an overview of Apache Hudi and how to get started with it.

Key Features of Apache Hudi

Upserts and Deletes: Hudi supports upserts (updating a record if its key already exists, and inserting it otherwise) and deletes, allowing for efficient record-level data mutation and maintenance.

Incremental Data Processing: Hudi lets a pipeline process only the records that are new or updated since a given commit, instead of rescanning the whole table, which makes data pipelines more efficient.

Fast Read and Write Operations: Because changes are applied at the record level instead of rewriting entire tables or partitions, Hudi provides quicker data access and modification than traditional batch processing methods.

Snapshot Isolation: Ensures that readers can consistently query the data without being affected by ongoing writes.

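Scalable Indexing: Hudi maintains indexes that map record keys to the files containing them, so updates and deletes on large datasets can locate the affected records without scanning the entire table.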
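To make the upsert, delete, and incremental features above more concrete, here is a minimal PySpark-oriented sketch. It assumes the option keys documented in the Hudi Spark quick start for 0.x releases, so check them against the version you run. The helper names (hudi_write_options, hudi_incremental_read_options, write_to_hudi, read_incremental) and the sample table and column names are hypothetical, chosen only for illustration. The two Spark helpers need a Spark session started with the Hudi bundle on the classpath.

```python
from typing import Dict

# Operations covered in this article; Hudi supports others as well
# (for example bulk_insert), which this sketch deliberately leaves out.
SUPPORTED_OPERATIONS = {"upsert", "insert", "delete"}


def hudi_write_options(table_name: str, record_key: str,
                       precombine_field: str, partition_field: str,
                       operation: str = "upsert") -> Dict[str, str]:
    """Build Spark datasource options for a Hudi write (hypothetical helper)."""
    if operation not in SUPPORTED_OPERATIONS:
        raise ValueError(f"unsupported operation: {operation}")
    return {
        "hoodie.table.name": table_name,
        # Column that uniquely identifies a record; upserts match on it.
        "hoodie.datasource.write.recordkey.field": record_key,
        # When several incoming rows share a key, the row with the
        # largest value in this column wins.
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.datasource.write.operation": operation,
    }


def hudi_incremental_read_options(begin_instant: str) -> Dict[str, str]:
    """Options for reading only records committed after begin_instant."""
    return {
        "hoodie.datasource.query.type": "incremental",
        "hoodie.datasource.read.begin.instanttime": begin_instant,
    }


def write_to_hudi(df, base_path: str, options: Dict[str, str]) -> None:
    """Apply a Spark DataFrame to a Hudi table using the given options."""
    # mode("append") adds a new commit to an existing table.
    df.write.format("hudi").options(**options).mode("append").save(base_path)


def read_incremental(spark, base_path: str, begin_instant: str):
    """Return a DataFrame of records changed after begin_instant."""
    options = hudi_incremental_read_options(begin_instant)
    return spark.read.format("hudi").options(**options).load(base_path)
```

With a running Spark session, an upsert would then look like write_to_hudi(df, base_path, hudi_write_options("trips", "uuid", "ts", "city")), and passing operation="delete" with a DataFrame of the keys to remove would issue a delete instead. Keeping the options in plain dictionaries makes them easy to inspect and test without starting Spark.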
