Package PySpark job dependencies for GCP Dataproc

Learn how to package dependencies for your PySpark job running on a GCP Dataproc cluster

Aman Ranjan Verma
Towards Data Engineering


Google Cloud Dataproc is a managed cloud service that makes it easy to run Apache Spark and other popular big data processing frameworks on Google Cloud Platform (GCP). With Dataproc, you can create and manage Spark clusters quickly and easily, without having to worry about the underlying infrastructure.
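Before diving in, here is a minimal sketch of the end-to-end flow: install your job's pure-Python dependencies into a local folder, zip them, stage the archive on Cloud Storage, and hand it to the job via --py-files. The bucket, cluster, script, and package names below are placeholders for illustration, not values from this article.

# Install pure-Python dependencies into a local folder
# ("requests" is a stand-in for your job's actual dependencies)
pip install -t deps/ requests

# Zip the folder contents so imports resolve from the archive root
cd deps && zip -r ../deps.zip . && cd ..

# Stage the archive and the job script on Cloud Storage
gsutil cp deps.zip gs://my-bucket/deps.zip
gsutil cp job.py gs://my-bucket/job.py

# Submit the job; --py-files puts the archive on the Python path
# of the driver and executors
gcloud dataproc jobs submit pyspark gs://my-bucket/job.py \
    --cluster=my-cluster \
    --region=us-central1 \
    --py-files=gs://my-bucket/deps.zip

Note that a zip archive passed through --py-files only works for pure-Python packages; dependencies with compiled extensions (NumPy, for example) need to be installed on the cluster itself, for instance through initialization actions or a custom image.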

