Spark on major Cloud Providers — Part 1

Google Cloud Platform — GCP

Creating the Cluster

Creating a cluster on GCP Dataproc
Cluster created
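
The same cluster can also be created from the command line instead of the console. The sketch below is an assumption, not the workflow shown in the screenshots: it only builds the argument list for `gcloud dataproc clusters create`, using the cluster name and region that appear in the submit step of this post. The helper name `build_create_cluster_command` and the single-node setting are hypothetical choices made for illustration.

```python
import shlex


def build_create_cluster_command(cluster_name, region, single_node=True):
    """Return the gcloud argv for creating a Dataproc cluster (hypothetical helper)."""
    cmd = [
        "gcloud", "dataproc", "clusters", "create", cluster_name,
        f"--region={region}",
    ]
    if single_node:
        # Assumption: a single-node cluster is enough for a small test job.
        cmd.append("--single-node")
    return cmd


# Cluster name and region are taken from the commands used later in this post.
create_cmd = build_create_cluster_command("my-awesome-cluster", "us-central1")

# Print the command so it can be reviewed before running it.
print(shlex.join(create_cmd))
```

To actually create the cluster, the list could be passed to `subprocess.run(create_cmd, check=True)` from a machine where `gcloud` is installed and authenticated, such as Cloud Shell.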

Submitting a Job to the Cluster

Click the terminal icon to open Cloud Shell
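from pyspark.sql import SparkSession

# Create (or reuse) the Spark session for this job
spark = SparkSession.builder.appName('my-test-app').getOrCreate()

# Build a small three-row DataFrame with the columns "ID" and "Text"
df = spark.createDataFrame([
    (1, "a"),
    (2, "b"),
    (3, "c"),
], ["ID", "Text"])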
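
Save the PySpark code to a file, then set the region and submit the job from Cloud Shell (YOUR_SCRIPT.py is a placeholder for that file's path):
gcloud config set dataproc/region us-central1
gcloud dataproc jobs submit pyspark YOUR_SCRIPT.py --cluster=my-awesome-cluster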
A job submitted and completed from the command line
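
The submit command takes the path to the PySpark script as a positional argument, followed by the cluster flag. As a rough sketch of how its parts fit together, the helper below assembles that argument list in Python. The function name `build_submit_command` and the file name `my_job.py` are hypothetical. Passing `--region` explicitly is shown as an alternative to setting `dataproc/region` in the gcloud config first.

```python
import shlex


def build_submit_command(script_path, cluster_name, region=None):
    """Return the gcloud argv for submitting a PySpark job (hypothetical helper)."""
    cmd = [
        "gcloud", "dataproc", "jobs", "submit", "pyspark",
        script_path,                     # positional: the PySpark file to run
        f"--cluster={cluster_name}",     # target Dataproc cluster
    ]
    if region is not None:
        # Optional when dataproc/region is already set in the gcloud config.
        cmd.append(f"--region={region}")
    return cmd


# "my_job.py" is an assumed file name for the script holding the PySpark code.
submit_cmd = build_submit_command(
    "my_job.py", "my-awesome-cluster", region="us-central1"
)

# Print the command so it can be reviewed before running it.
print(shlex.join(submit_cmd))
```

From Cloud Shell, `subprocess.run(submit_cmd, check=True)` would run the command and raise an error if the submission fails.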




Abhishek Singh

