Reading and Writing Data from/to MinIO using Spark

Dineshvarma Guduru
1 min read · Oct 3, 2021

MinIO is a high-performance, S3-compatible object storage system. It is native to Kubernetes and, according to the project, runs on every public cloud, every Kubernetes distribution, the private cloud and the edge. MinIO is software-defined and open source. In short, it behaves like S3 but can be hosted locally.

If you don’t have MinIO set up on your machine, follow this blog to set up MinIO on a Mac.

Let’s first add the library dependency for MinIO:

"io.minio" % "spark-select_2.11" % "2.1"

The above dependency lets us read CSV files through the minioSelectCSV data source, which uses S3 Select so that filtering can happen on the storage side. Files in other formats such as Parquet or Avro can be read without this dependency.
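A read through minioSelectCSV might look like the minimal sketch below. It assumes an already configured SparkSession is passed in. The bucket `my-bucket`, the object `people.csv` and the two-column schema are hypothetical placeholders. The explicit schema and the `s3://` scheme follow the examples in the spark-select README, so check them against the version you use.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object SelectCsvSketch {
  // Hypothetical schema for a two-column CSV file (name, age).
  val peopleSchema: StructType = StructType(Seq(
    StructField("name", StringType, nullable = true),
    StructField("age", IntegerType, nullable = true)
  ))

  // Reads a CSV object through the spark-select data source.
  // `spark` must already carry the MinIO endpoint and credentials.
  def readPeople(spark: SparkSession): DataFrame =
    spark.read
      .format("minioSelectCSV")
      .schema(peopleSchema)
      .load("s3://my-bucket/people.csv") // hypothetical bucket and object
}
```

A filter applied to the returned DataFrame, for example `.filter("age > 19")`, is the kind of predicate this connector is designed to push down to the server.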

Let’s initialise the Spark session and add the configuration to connect to MinIO.
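A session configured for MinIO could be sketched as follows. This uses the standard Hadoop S3A settings and assumes the `hadoop-aws` module matching your Hadoop version is on the classpath. The endpoint `http://127.0.0.1:9000` and the `minioadmin` credentials are the defaults of a fresh local MinIO install and are assumptions here, as are the object and application names.

```scala
import org.apache.spark.sql.SparkSession

object MinioSession {
  // Assumed local MinIO defaults; replace with your own endpoint and keys.
  val endpoint  = "http://127.0.0.1:9000"
  val accessKey = "minioadmin"
  val secretKey = "minioadmin"

  def build(): SparkSession =
    SparkSession.builder()
      .appName("minio-example")
      .master("local[*]")
      .config("spark.hadoop.fs.s3a.endpoint", endpoint)
      .config("spark.hadoop.fs.s3a.access.key", accessKey)
      .config("spark.hadoop.fs.s3a.secret.key", secretKey)
      // Address buckets as URL paths rather than as virtual-host subdomains.
      .config("spark.hadoop.fs.s3a.path.style.access", "true")
      // The endpoint above is plain HTTP, so TLS is switched off.
      .config("spark.hadoop.fs.s3a.connection.ssl.enabled", "false")
      .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
      .getOrCreate()
}
```

Path-style access matters because a local MinIO server is normally reached by host and port, without per-bucket DNS names.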

With the above configuration, we are ready to connect to MinIO. From here on it is just a regular Spark read and write.
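The read and write step can be sketched with two small helpers. They take a SparkSession that already has the S3A settings applied. The helper names and the choice of CSV input and Parquet output are illustrative assumptions.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

object MinioReadWrite {
  // Reads a CSV object that has a header row.
  def readCsv(spark: SparkSession, path: String): DataFrame =
    spark.read.option("header", "true").csv(path)

  // Writes the DataFrame as Parquet, replacing any earlier output at `path`.
  def writeParquet(df: DataFrame, path: String): Unit =
    df.write.mode(SaveMode.Overwrite).parquet(path)
}
```

With a hypothetical bucket, usage would be `MinioReadWrite.readCsv(spark, "s3a://my-bucket/input/people.csv")` followed by `MinioReadWrite.writeParquet(df, "s3a://my-bucket/output/people")`. The `s3a://` scheme is what routes the paths through the S3A connector.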

The complete code block will look as below.
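One way to assemble the pieces into a single program is sketched here. It is an illustration under assumptions rather than a definitive listing: the environment variable names `MINIO_ENDPOINT`, `MINIO_ACCESS_KEY` and `MINIO_SECRET_KEY`, the bucket `my-bucket` and the object paths are all hypothetical, and `hadoop-aws` is assumed to be on the classpath.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object MinioExample {
  def main(args: Array[String]): Unit = {
    // Hypothetical environment variable names, falling back to local MinIO defaults.
    val endpoint  = sys.env.getOrElse("MINIO_ENDPOINT", "http://127.0.0.1:9000")
    val accessKey = sys.env.getOrElse("MINIO_ACCESS_KEY", "minioadmin")
    val secretKey = sys.env.getOrElse("MINIO_SECRET_KEY", "minioadmin")

    val spark = SparkSession.builder()
      .appName("minio-example")
      .master("local[*]")
      .config("spark.hadoop.fs.s3a.endpoint", endpoint)
      .config("spark.hadoop.fs.s3a.access.key", accessKey)
      .config("spark.hadoop.fs.s3a.secret.key", secretKey)
      .config("spark.hadoop.fs.s3a.path.style.access", "true")
      // Enable TLS only when the endpoint URL is https.
      .config("spark.hadoop.fs.s3a.connection.ssl.enabled",
        endpoint.startsWith("https").toString)
      .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
      .getOrCreate()

    try {
      // Read a CSV object with a header row from the hypothetical bucket.
      val input = spark.read
        .option("header", "true")
        .csv("s3a://my-bucket/input/people.csv")
      input.show(5)

      // Write the same data back to MinIO as Parquet.
      input.write
        .mode(SaveMode.Overwrite)
        .parquet("s3a://my-bucket/output/people")
    } finally {
      spark.stop()
    }
  }
}
```

Reading the credentials from the environment keeps them out of source control. The bucket must already exist in MinIO before the job runs.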
