Load a partitioned Delta file in PySpark

PrashantShukla
1 min read · Apr 25, 2023


To load a partitioned Delta file in PySpark, you can use the DeltaTable API provided by Delta Lake, or the standard DataFrame reader with the delta format. Here's an example code snippet:

from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("DeltaTableExample").getOrCreate()

# Load the Delta table and convert it to a DataFrame
delta_table = DeltaTable.forPath(spark, "/path/to/partitioned_delta_table")

# Read specific partitions of the table by filtering on the partition column
df = delta_table.toDF().where("date >= '2022-01-01' and date <= '2022-01-31'")

# You can also read the table directly with the DataFrame reader
# and apply the same partition filter with `where`
df = spark.read.format("delta") \
    .load("/path/to/partitioned_delta_table") \
    .where("date >= '2022-01-01' and date <= '2022-01-31'")

In the first example, we use the DeltaTable.forPath method to load the table as a DeltaTable object and call toDF to get a DataFrame. We can then use the where method to filter on the partition column; because date is a partition column, Spark prunes partitions and only reads the ones that match the filter.
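If you want to check that only the matching partitions will be scanned, you can look at the physical plan. This is a minimal sketch using the df from the snippet above; the exact plan output depends on your Spark and Delta versions:

# Print the physical plan; for a table partitioned by `date`, the file
# scan typically reports the date range as a partition filter, which
# means only the matching partition directories are read.
df.explain()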

In the second example, we use spark.read with the Delta format and apply the same filter with where after calling load. A Delta table stores its own schema and partitioning metadata, so you don't need to pass extra options for partitioned reads; filtering on the partition column is enough for Spark to prune the partitions.
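For context, here is how a Delta table partitioned by date might be created in the first place. This is a minimal sketch; the path, column names, and sample rows are assumptions for illustration, not part of the original article:

from pyspark.sql import Row

# Hypothetical sample data with a `date` column to partition on
sample = spark.createDataFrame([
    Row(id=1, date="2022-01-01", value=10.0),
    Row(id=2, date="2022-02-01", value=20.0),
])

# Write a Delta table partitioned by `date`; each distinct date value
# becomes a separate directory under the table path
sample.write.format("delta") \
    .partitionBy("date") \
    .mode("overwrite") \
    .save("/path/to/partitioned_delta_table")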
