Introduction to Delta Lake: Part 1

Aravinth
2 min read · Nov 6, 2019


Delta Lake is an open-source storage layer that brings ACID
transactions to Apache Spark™ and big data workloads.

What is Delta Lake?

Delta Lake is essentially the data layer of the lakehouse pattern. It brings ACID transactions, scalable metadata handling, schema evolution, data versioning, and updates and deletes to data lakes. All data is stored in the Apache Parquet format, and users can enforce schemas (and change them with relative ease if necessary).


Getting Started:

In this post I will share a few examples of working with Delta Lake.

Add the following library dependency:

// For SBT
libraryDependencies += "io.delta" %% "delta-core" % "0.4.0"

// For Maven
<dependency>
  <groupId>io.delta</groupId>
  <artifactId>delta-core_2.12</artifactId>
  <version>0.4.0</version>
</dependency>

Create table in Delta Lake

Method to Create Table in Delta Lake
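A minimal sketch of what such a method could look like, assuming the data is already loaded into a Spark DataFrame. The names `DeltaTableWriter`, `createDeltaTable`, and `tablePath` are hypothetical and chosen for illustration; only the `format("delta")` write is the standard Delta Lake entry point.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

object DeltaTableWriter {
  // Hypothetical helper: writes `df` as a new Delta table at `tablePath`.
  // SaveMode.ErrorIfExists makes the write fail if data already exists
  // at that location, so an existing table is never replaced by accident.
  def createDeltaTable(df: DataFrame, tablePath: String): Unit =
    df.write
      .format("delta")
      .mode(SaveMode.ErrorIfExists)
      .save(tablePath)
}
```

The table is stored as Parquet data files plus a `_delta_log` directory that holds the transaction log.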

Read table from Delta Lake

Method to Read Table from Delta Lake
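Reading works the same way in reverse. The sketch below assumes the table was written to a path on a file system Spark can reach; `DeltaTableReader`, `readDeltaTable`, and `tablePath` are illustrative names, not part of the Delta Lake API.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object DeltaTableReader {
  // Hypothetical helper: loads the latest version of the Delta table
  // stored at `tablePath` as a DataFrame.
  def readDeltaTable(spark: SparkSession, tablePath: String): DataFrame =
    spark.read
      .format("delta")
      .load(tablePath)
}
```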

Creating and reading a table from Delta Lake

I used a sales_test.csv file for my demo, creating the table and reading it back with the methods above.
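An end-to-end run might look roughly like the following. This is a sketch under several assumptions: the CSV has a header row, it lives at `data/sales_test.csv`, and the table is written to `/tmp/delta/sales`. Both paths and the object name are placeholders. The write uses `SaveMode.Overwrite` only so that the demo can be re-run.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object CreateAndReadDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("delta-lake-demo")
      .master("local[*]") // local mode, for the demo only
      .getOrCreate()

    val csvPath   = "data/sales_test.csv" // assumed location of the demo file
    val tablePath = "/tmp/delta/sales"    // assumed location of the Delta table

    // Load the CSV; header and schema inference are assumptions about the file.
    val salesDf = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(csvPath)

    // Create (or replace) the Delta table from the CSV data.
    salesDf.write.format("delta").mode(SaveMode.Overwrite).save(tablePath)

    // Read the table back and inspect it.
    val table = spark.read.format("delta").load(tablePath)
    table.printSchema()
    table.show(5)

    spark.stop()
  }
}
```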

Updating table data in Delta Lake

You can either append to or overwrite the existing table data by choosing the appropriate SaveMode.

How to use updateDeltaTable method
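One way an `updateDeltaTable` method could be written and called is sketched below. The signature is an assumption, as are the demo path `/tmp/delta/update_demo` and the three-column sample data, which stand in for the sales file so that the example is self-contained.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

object UpdateDeltaTableDemo {
  // Assumed signature: writes `df` to the Delta table at `tablePath`,
  // appending or overwriting depending on `mode`.
  def updateDeltaTable(df: DataFrame, tablePath: String, mode: SaveMode): Unit =
    df.write.format("delta").mode(mode).save(tablePath)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("delta-update-demo")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val tablePath = "/tmp/delta/update_demo" // placeholder path

    // Start from a known state with two rows.
    val initial = Seq((1, "pen", 2.5), (2, "book", 12.0)).toDF("id", "item", "amount")
    updateDeltaTable(initial, tablePath, SaveMode.Overwrite)

    // Append one more row: the table now holds three rows.
    val extra = Seq((3, "bag", 30.0)).toDF("id", "item", "amount")
    updateDeltaTable(extra, tablePath, SaveMode.Append)
    println(spark.read.format("delta").load(tablePath).count()) // 3

    // Overwrite: the table now holds only the rows of `extra`.
    updateDeltaTable(extra, tablePath, SaveMode.Overwrite)
    println(spark.read.format("delta").load(tablePath).count()) // 1

    spark.stop()
  }
}
```

Because every write is a transaction recorded in the Delta log, an overwrite does not physically delete the earlier files right away; the previous versions remain available to the data versioning features mentioned later.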

Appending and overwriting existing data in table

Adding a new column to an existing table

How to use the add-column method
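A possible shape for such a method is sketched here; it is not necessarily how the original helper was written. It reads the table, adds the column with `withColumn`, and overwrites the table with the `mergeSchema` option so that Delta Lake accepts the wider schema. The names `addColumn`, `currency`, and the path `/tmp/delta/add_column_demo` are illustrative assumptions.

```scala
import org.apache.spark.sql.{Column, SaveMode, SparkSession}
import org.apache.spark.sql.functions.lit

object AddColumnDemo {
  // Hypothetical helper: adds `name` to the Delta table at `tablePath`,
  // filling it with `value`, and writes the result back to the same path.
  def addColumn(spark: SparkSession, tablePath: String, name: String, value: Column): Unit =
    spark.read.format("delta").load(tablePath)
      .withColumn(name, value)
      .write
      .format("delta")
      .mode(SaveMode.Overwrite)
      .option("mergeSchema", "true") // allow the new column in the table schema
      .save(tablePath)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("delta-add-column-demo")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val tablePath = "/tmp/delta/add_column_demo" // placeholder path

    // Small sample table standing in for the sales data.
    Seq((1, "pen", 2.5), (2, "book", 12.0)).toDF("id", "item", "amount")
      .write.format("delta").mode(SaveMode.Overwrite).save(tablePath)

    // Add a constant-valued column and check the resulting schema.
    addColumn(spark, tablePath, "currency", lit("USD"))
    spark.read.format("delta").load(tablePath).printSchema()

    spark.stop()
  }
}
```

Without `mergeSchema`, Delta Lake's schema enforcement rejects a write whose columns do not match the table's schema.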

Delta Lake has more functionality, such as data versioning (time travel), conditional updates, conditional deletes, and upserts. In upcoming posts in this series, I will explain those topics with examples.

Part 2 of the Delta Lake series: Time travel

Thanks for reading!!!!

See you soon :)
