What is a Delta table in Databricks?

Pubudu Dewagama · Published in Tributary Data · Feb 13, 2023

Delta tables are the default table format in Databricks, built on the open-source Delta Lake storage layer. They provide a powerful and efficient way to work with big data, are optimized for fast, large-scale data processing, and are ideal for use cases such as data lakes.

A Delta table is essentially a versioned data lake table: its data is stored as a collection of Parquet files in a hierarchical file system, together with a transaction log (_delta_log) that records every change as a new table version. This allows for fast, incremental processing and enables you to keep track of changes to the data over time.

Delta tables also offer a number of other advantages over traditional data lake tables, such as:

  • ACID transactions: Delta tables support ACID transactions, which allow you to update, delete, or insert data in a safe and consistent manner, even when multiple users are accessing the same data simultaneously.
  • Data versioning: Delta tables automatically version each change to the data, so you can query or roll back to any previous version if necessary (see the short time-travel example after this list).
  • Data management: Delta tables provide a number of built-in data management features, such as data skipping and file compaction (OPTIMIZE), which help you keep your data lake organized and optimized for performance.
  • Efficient queries: Delta tables are optimized for fast querying, even over large datasets. They also support predicate pushdown, which allows you to filter data at the storage layer before it is loaded into memory, making your queries faster and more efficient.
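
To make the versioning point concrete, here is a minimal time-travel sketch in Spark SQL. It assumes a Delta table named landing.student (created later in this article) that already has at least one committed version:

%sql
-- Time travel: query the table as it existed at version 0
SELECT * FROM landing.student VERSION AS OF 0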

Overall, Delta tables provide a powerful and flexible way to store, manage, and process big data in Databricks, and are a great choice for data lake use cases.

In the next sections, I will walk through the key operations on Delta tables:

  1. Create a database in the data lake
  2. Create a Delta table using PySpark
  3. Create a Delta table using Spark SQL
  4. Create a Delta table from a DataFrame
  5. Display data from Delta tables
  6. Describe Delta tables
  7. Drop Delta tables
  8. List Delta tables
  9. Rename the columns of a Delta table
  10. Roll back a Delta table to a previous version

Now we will see how we can implement each and every step:

  1. Create a database in the data lake: In this step, you define a database in your data lake where you plan to organize and store your Delta tables. This ensures that your tables are logically grouped within a specific database.
spark.sql("CREATE DATABASE IF NOT EXISTS landing")

Method 1: Using PySpark

Using PySpark, you can create a Delta table, which is a versioned and transactional table. Delta tables enable easy management of data versions, ACID transactions, and efficient processing with features like schema evolution.

Create a Delta table in the data lake

from delta.tables import *

# Build and create a partitioned Delta table in the landing database.
# Note: create() raises an error if the table already exists.
DeltaTable.create(spark) \
    .tableName("landing.student") \
    .addColumn("studentId", "LONG") \
    .addColumn("StudentName", "STRING") \
    .addColumn("DeptNo", "INTEGER") \
    .addColumn("Salary", "DECIMAL(18,4)") \
    .property("description", "table for student") \
    .location("/mnt/landing/student") \
    .partitionedBy("DeptNo") \
    .execute()

Create Delta table If Not Exists

from delta.tables import *

# createIfNotExists() is a no-op if the table already exists
DeltaTable.createIfNotExists(spark) \
    .tableName("landing.studentIfnotExist") \
    .addColumn("studentId", "LONG") \
    .addColumn("StudentName", "STRING") \
    .addColumn("DeptNo", "INTEGER") \
    .addColumn("Salary", "DECIMAL(18,4)") \
    .property("description", "table for student") \
    .location("dbfs:/mnt/landing/studentIfnotExist") \
    .partitionedBy("DeptNo") \
    .execute()

Create or Replace Delta table

from delta.tables import *

# createOrReplace() replaces the table definition if it already exists
DeltaTable.createOrReplace(spark) \
    .tableName("landing.studentcreateorReplace") \
    .addColumn("studentId", "LONG") \
    .addColumn("StudentName", "STRING") \
    .addColumn("DeptNo", "INTEGER") \
    .addColumn("Salary", "DECIMAL(18,4)") \
    .property("description", "table for student") \
    .location("dbfs:/mnt/landing/studentcreateorReplace") \
    .partitionedBy("DeptNo") \
    .execute()

Method 2: Using Spark SQL

Alternatively, you can create a Delta table using Spark SQL. This approach involves executing SQL statements to define the schema and properties of your Delta table.

Create a Delta table in the data lake

%sql
CREATE TABLE landing.student_sparksql (
  studentId LONG,
  StudentName STRING,
  DeptNo INTEGER,
  Salary DECIMAL(18, 4)
)
USING DELTA
LOCATION '/mnt/landing/sparksql'

Create or Replace Delta table

%sql
CREATE OR REPLACE TABLE landing.student_CreateorReplace_sparksql (
  studentId LONG,
  StudentName STRING,
  DeptNo INTEGER,
  Salary DECIMAL(18, 4)
)
USING DELTA
LOCATION '/mnt/landing/student_CreateorReplace_sparksql'

Create Delta table If Not Exists

%sql
CREATE TABLE IF NOT EXISTS landing.student_IfNot_Exist_sparksql (
  studentId LONG,
  StudentName STRING,
  DeptNo INTEGER,
  Salary DECIMAL(18, 4)
)
USING DELTA
LOCATION '/mnt/landing/student_IfNot_Exist_sparksql'

Method 3: Using a DataFrame

PySpark allows you to create Delta tables using DataFrames. You can define the schema and properties using the DataFrame API and then write the DataFrame to a Delta table.

First, create a DataFrame with the required data set:

# Sample rows: (studentId, StudentName, DeptNo, Salary)
student_data = [
    (1, "Pubudu", 10, 10000),
    (2, "Suranga", 20, 20000),
    (3, "Dewagama", 30, 30000),
]

student_schema = ["studentId", "StudentName", "DeptNo", "Salary"]

df = spark.createDataFrame(data=student_data, schema=student_schema)

display(df)

Then, create the Delta table from the DataFrame we just built. Because no database is specified, the table is created in the default schema:

# Save the DataFrame as a managed Delta table
df.write.format("delta").saveAsTable("Student_DF_Table")

Display data of Delta Tables

Once you have data in your Delta table, you can use PySpark to display the contents. This helps you verify the data and check for any issues in your table.

Select all the records of the delta table — Spark SQL

%sql
select * from Student_DF_Table

Select all the records of the delta table — PySpark

df = spark.sql("select * from default.Student_DF_Table")
display(df)

Describe Delta Tables

Describing a Delta table retrieves metadata about its structure. DESCRIBE returns the column names and data types, while DESCRIBE DETAIL returns table-level details such as the format, location, size, and number of files.

Describe a Delta table

%sql
DESCRIBE Student_DF_Table

Describe a Delta table in full detail

%sql
DESCRIBE DETAIL Student_DF_Table
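
The same details are available programmatically from PySpark; a sketch using the DeltaTable utility class (detail() requires a reasonably recent Delta Lake release):

from delta.tables import DeltaTable

# Programmatic equivalent of DESCRIBE DETAIL
dt = DeltaTable.forName(spark, "Student_DF_Table")
display(dt.detail())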

Drop Delta Tables

Dropping a Delta table removes it from the metastore. For a managed table this also deletes the underlying data; for an external (path-based) table the data files remain in storage. This is useful when you no longer need the table or want to recreate it with different specifications.

%sql
DROP TABLE Student_DF_Table;

%sql
-- Safe variant: no error if the table does not exist
DROP TABLE IF EXISTS Student_DF_Table;

List Delta Tables

You can list all tables within a specified database (schema), which helps in managing and keeping track of the available Delta tables; a PySpark variant follows the SQL examples.

List all tables in the landing schema

%sql
SHOW TABLES from landing;

List all tables in the default schema

%sql
SHOW TABLES;
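
The same listing is available from PySpark via the catalog API; a minimal sketch:

# List tables in the landing schema programmatically
for t in spark.catalog.listTables("landing"):
    print(t.name, t.tableType)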

Rename the columns of the Delta Tables

If you need to modify the structure of your Delta table, you can rename columns by reading the table, renaming the columns with the DataFrame API, and overwriting the table with the new schema:

spark.read.table("default.Student_DF_Table")\
.withColumnRenamed("StudentName","Name")\
.withColumnRenamed("Salary","Monthly_Salary")\
.write.format("delta").mode("overwrite").option("overwriteSchema",True).saveAsTable("default.Student_DF_Table")
df = spark.sql("select * from default.Student_DF_Table")
display(df)
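
Rewriting the whole table works everywhere, but newer Delta Lake releases can also rename a column in place with ALTER TABLE, provided column mapping is enabled on the table. A hedged sketch (enabling column mapping upgrades the table protocol and cannot be undone, so try it on a test table first):

%sql
-- Enable column mapping by name (irreversibly upgrades the table protocol)
ALTER TABLE default.Student_DF_Table SET TBLPROPERTIES (
  'delta.columnMapping.mode' = 'name',
  'delta.minReaderVersion' = '2',
  'delta.minWriterVersion' = '5'
);

-- Rename a column in place, without rewriting any data files
ALTER TABLE default.Student_DF_Table RENAME COLUMN DeptNo TO DepartmentNo;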

Roll back delta table to previous version

Delta tables support versioning, allowing you to roll back to a specific version of the data. This is useful in scenarios where you need to revert to a previous state of the table.

The sequence below inspects the table history, deletes a row, and then restores the table to an earlier version.

%sql
-- Inspect the table history to see the available versions
DESCRIBE HISTORY default.Student_DF_Table

%sql
-- Make a change that we will later undo
DELETE FROM default.Student_DF_Table WHERE studentId = 1

%sql
-- The delete now appears as a new version in the history
DESCRIBE HISTORY default.Student_DF_Table

%sql
-- Restore the table to the version before the delete
-- (pick the correct version number from the history output)
RESTORE TABLE default.Student_DF_Table TO VERSION AS OF 1

df = spark.sql("select * from default.Student_DF_Table")
display(df)
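
The same restore can be done from PySpark; a sketch using the DeltaTable API (restoreToVersion is available in Delta Lake 1.2 and later):

from delta.tables import DeltaTable

# Programmatic equivalent of RESTORE TABLE ... TO VERSION AS OF 1
dt = DeltaTable.forName(spark, "default.Student_DF_Table")
dt.restoreToVersion(1)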

I’ve prepared a comprehensive YouTube video that walks you through each step with detailed examples. If you encounter any uncertainties, feel free to watch the video for clarification.

Thank you !!!
