Spark Series #6 : Data Ingestion From Files

5 min read · Nov 18, 2023

https://www.sothebys.com/en/buy/auction/2022/art-impressionniste-et-moderne-day-sale/le-cabinet-anthropomorphique

In Spark, data is transient: once the Spark application shuts down, everything held in memory is lost. Because Spark processes data in memory, that data is available only while the Spark session is active, so it has to be loaded from an external source each time. Whether you are building an ETL pipeline or running an analysis, data ingestion is the crucial first step.

Written by Aruna Das