What is Delta Lake?
Delta Lake is essentially the data layer of the Lakehouse pattern. It brings ACID transactions, scalable metadata handling, schema evolution, data versioning, and updates and deletes to data lakes. Table data is stored as Apache Parquet files alongside a transaction log (the _delta_log directory). Users can enforce a schema on a table and, when necessary, change it with relative ease.
Getting Started:
In this post I am going to share some examples of working with Delta Lake from Spark.
Add the following library dependency:
// For SBT (build.sbt)
libraryDependencies += "io.delta" %% "delta-core" % "0.4.0"
//For Maven
<dependency>
<groupId>io.delta</groupId>
<artifactId>delta-core_2.12</artifactId>
<version>0.4.0</version>
</dependency>
Create table in Delta Lake
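The original method was shared as an embedded gist that did not survive. As a rough sketch of the idea, creating a Delta table amounts to writing a DataFrame with format("delta"). The helper name createDeltaTable, the toy columns and the /tmp path are my own assumptions for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

object DeltaCreate {
  // Hypothetical helper: writes a DataFrame to `path` as a new Delta table.
  // SaveMode.ErrorIfExists makes the call fail if a table already exists there,
  // so this is only meant for the initial create.
  def createDeltaTable(df: DataFrame, path: String): Unit =
    df.write
      .format("delta")               // provided by the io.delta delta-core dependency
      .mode(SaveMode.ErrorIfExists)
      .save(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("delta-create")
      .master("local[*]")            // assumption: local mode for the demo
      .getOrCreate()
    import spark.implicits._

    // Toy rows with hypothetical columns, standing in for real data.
    val df = Seq((1, "North", 100.0), (2, "South", 250.5)).toDF("id", "region", "amount")
    createDeltaTable(df, "/tmp/delta/sales")   // assumed output path
    spark.stop()
  }
}
```

After the write, the target directory holds Parquet data files plus a _delta_log folder with the table's transaction log.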
Read table from Delta Lake
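Reading works the same way in reverse: point spark.read at the table directory with format("delta"), and Spark loads the latest committed snapshot. The helper name readDeltaTable and the path below are assumptions, not the original gist.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object DeltaRead {
  // Hypothetical helper: loads the latest version of the Delta table at `path`.
  def readDeltaTable(spark: SparkSession, path: String): DataFrame =
    spark.read.format("delta").load(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("delta-read")
      .master("local[*]")            // assumption: local mode for the demo
      .getOrCreate()

    // Assumed path of a table written earlier with format("delta").
    val df = readDeltaTable(spark, "/tmp/delta/sales")
    df.printSchema()
    df.show()
    spark.stop()
  }
}
```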
Creating and reading a table from Delta Lake
I used a sales_test.csv file for this demo, creating and then reading the table with the methods above.
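Here is a sketch of that round trip with the CSV file, written against the Spark API directly so it runs on its own. The actual columns of sales_test.csv are not shown in this post, so I assume it has a header row and let Spark infer the column types; both file paths are placeholders.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object SalesDemo {
  // Reads a CSV file, stores it as a Delta table, reads it back and
  // returns the number of rows found in the Delta table.
  def csvToDelta(spark: SparkSession, csvPath: String, deltaPath: String): Long = {
    val sales = spark.read
      .option("header", "true")       // assumption: first line holds column names
      .option("inferSchema", "true")  // let Spark guess column types
      .csv(csvPath)

    sales.write.format("delta").mode(SaveMode.Overwrite).save(deltaPath)

    val fromDelta = spark.read.format("delta").load(deltaPath)
    fromDelta.show(5)
    fromDelta.count()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sales-demo")
      .master("local[*]")
      .getOrCreate()
    // Both paths are assumptions; point them at your own files.
    val rows = csvToDelta(spark, "data/sales_test.csv", "/tmp/delta/sales_test")
    println(s"Rows in Delta table: $rows")
    spark.stop()
  }
}
```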
Updating table data in Delta Lake
You can either append to or overwrite the existing table data by choosing the SaveMode.
How to use the updateDeltaTable method
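The original updateDeltaTable gist is missing, so below is a minimal sketch of what such a method could look like, assuming it simply takes a DataFrame, a table path and a SaveMode. The usage in main shows the difference between the two modes on a toy table at an assumed path.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

object DeltaUpdate {
  // Hypothetical helper: SaveMode.Append adds rows to the table,
  // SaveMode.Overwrite replaces its contents. Each call commits a new table version.
  def updateDeltaTable(df: DataFrame, path: String, mode: SaveMode): Unit =
    df.write.format("delta").mode(mode).save(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("delta-update")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._
    val path = "/tmp/delta/update_demo"   // assumed path

    updateDeltaTable(Seq(1, 2).toDF("id"), path, SaveMode.Overwrite) // table holds ids 1, 2
    updateDeltaTable(Seq(3).toDF("id"), path, SaveMode.Append)       // adds id 3
    println(spark.read.format("delta").load(path).count())

    updateDeltaTable(Seq(9).toDF("id"), path, SaveMode.Overwrite)    // only id 9 remains
    println(spark.read.format("delta").load(path).count())
    spark.stop()
  }
}
```

Because Delta records every write in its transaction log, an overwrite does not physically delete the old files right away; earlier versions stay in the log.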
Adding a new column to an existing table
How to use the add column method
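The add column method itself was also an embedded gist. One way to sketch it, under my own assumptions about its shape: read the current table, add the column with a default value via withColumn and lit, then overwrite the table with the mergeSchema option so Delta's schema enforcement accepts the extra column instead of rejecting the write. The name addColumn, the column names and the path are hypothetical.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.functions.lit

object DeltaAddColumn {
  // Hypothetical helper: rewrites the table at `path` with one extra column whose
  // value is `default` in every existing row. mergeSchema lets the write extend
  // the table schema; without it Delta fails the write with a schema mismatch.
  def addColumn(spark: SparkSession, path: String, name: String, default: Any): Unit = {
    val current = spark.read.format("delta").load(path)
    current
      .withColumn(name, lit(default))
      .write
      .format("delta")
      .mode(SaveMode.Overwrite)
      .option("mergeSchema", "true")
      .save(path)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("delta-add-column")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._
    val path = "/tmp/delta/add_column_demo"   // assumed path

    // Start with a two-column table, then add a "currency" column.
    Seq((1, 10.0), (2, 20.0)).toDF("id", "amount")
      .write.format("delta").mode(SaveMode.Overwrite).save(path)
    addColumn(spark, path, "currency", "USD")

    spark.read.format("delta").load(path).show()
    spark.stop()
  }
}
```

Reading a table and overwriting the same path works here because Delta reads from a committed snapshot. If you only need the new column on newly written rows, appending them with mergeSchema set to true also extends the schema, and older rows read back as null in that column.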
Delta Lake offers more functionality, such as data versioning (time travel), conditional updates, conditional deletes and upserts. In upcoming posts in this series, I will explain those topics with examples.
Part two of this Delta Lake series: Time travel
Thanks for reading!!!!
See you soon :)