Spark on major Cloud Providers — Part 1

Google Cloud Platform — GCP

Creating the Cluster

Photo by Nareeta Martin on Unsplash

Figure: Creating a cluster on GCP Dataproc

Figure: Cluster created
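The screenshots above show the cluster being created in the Dataproc console. As a rough programmatic equivalent, here is a minimal sketch that uses the `google-cloud-dataproc` Python client library. The post itself only uses the console, so this library is an assumption. The project ID, machine type, and worker count are hypothetical placeholders. The cluster name `my-awesome-cluster` and the region `us-central1` are taken from the job-submission commands in this post. Actually calling `create_cluster` requires the library (`pip install google-cloud-dataproc`) and configured Google Cloud credentials.

```python
"""Sketch: create a Dataproc cluster from Python instead of the console.

Assumes the google-cloud-dataproc client library and application-default
credentials. PROJECT_ID, the machine type and the worker count are
hypothetical placeholders, not values from the original post.
"""

PROJECT_ID = "my-project-id"  # hypothetical: replace with your GCP project ID
REGION = "us-central1"
CLUSTER_NAME = "my-awesome-cluster"


def build_cluster_spec(project_id: str, cluster_name: str,
                       num_workers: int = 2,
                       machine_type: str = "n1-standard-2") -> dict:
    """Return a cluster description in the dict form the client accepts."""
    return {
        "project_id": project_id,
        "cluster_name": cluster_name,
        "config": {
            # One master node plus a small pool of workers (sizes are assumptions)
            "master_config": {"num_instances": 1, "machine_type_uri": machine_type},
            "worker_config": {"num_instances": num_workers, "machine_type_uri": machine_type},
        },
    }


def create_cluster(project_id: str, region: str, cluster_name: str):
    """Create the cluster and block until the long-running operation finishes."""
    # Imported lazily so build_cluster_spec works even without the library installed
    from google.cloud import dataproc_v1

    # Dataproc clients talk to the regional endpoint of the target region
    client = dataproc_v1.ClusterControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )
    operation = client.create_cluster(
        request={
            "project_id": project_id,
            "region": region,
            "cluster": build_cluster_spec(project_id, cluster_name),
        }
    )
    # Waits for provisioning to complete and returns the Cluster resource
    return operation.result()
```

With credentials in place, `create_cluster(PROJECT_ID, REGION, CLUSTER_NAME)` should provision roughly what the console form does. The cluster spec is built in a separate function so you can inspect or tweak it before any API call is made.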

Submitting a Job to the Cluster

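Click the terminal icon to open Cloud Shell, then save the following PySpark script as test-app.py:

```python
from pyspark.sql import SparkSession

# Create (or reuse) a Spark session; on Dataproc the cluster supplies the Spark configuration
spark = SparkSession.builder.appName('my-test-app').getOrCreate()

# Build a small three-row DataFrame with two columns, ID and Text
df = spark.createDataFrame([
    (1, "a"),
    (2, "b"),
    (3, "c"),
], ["ID", "Text"])

# Print the DataFrame as a table in the driver output
df.show()
```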
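Next, set the default Dataproc region and submit the script to the cluster:

```shell
# Use us-central1 as the default region for gcloud dataproc commands
gcloud config set dataproc/region us-central1

# Submit test-app.py as a PySpark job to the cluster
gcloud dataproc jobs submit pyspark --cluster=my-awesome-cluster test-app.py
```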
Figure: A job submitted and completed from the command line
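The `gcloud` submission above can also be sketched with the `google-cloud-dataproc` Python client library. This is an assumption and not the method used in this post. One difference is worth knowing: `gcloud` stages a local `test-app.py` for you, whereas the API expects `main_python_file_uri` to point at a file the cluster can already read, typically a Cloud Storage URI. The project ID and the `gs://my-bucket/test-app.py` path below are hypothetical placeholders.

```python
"""Sketch: submit test-app.py as a PySpark job with the Dataproc Python client.

Assumes google-cloud-dataproc is installed and credentials are configured.
PROJECT_ID and MAIN_FILE_URI are hypothetical placeholders.
"""

PROJECT_ID = "my-project-id"                  # hypothetical
REGION = "us-central1"                        # same region as the gcloud config above
CLUSTER_NAME = "my-awesome-cluster"
MAIN_FILE_URI = "gs://my-bucket/test-app.py"  # hypothetical: script uploaded to Cloud Storage


def build_pyspark_job(cluster_name: str, main_file_uri: str) -> dict:
    """Describe a PySpark job that runs main_file_uri on the named cluster."""
    return {
        "placement": {"cluster_name": cluster_name},
        "pyspark_job": {"main_python_file_uri": main_file_uri},
    }


def submit_pyspark_job(project_id: str, region: str,
                       cluster_name: str, main_file_uri: str):
    """Submit the job, wait for it to finish, and return the final Job resource."""
    # Deferred import so build_pyspark_job works without the library installed
    from google.cloud import dataproc_v1

    # As with clusters, jobs are managed through the regional endpoint
    client = dataproc_v1.JobControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )
    operation = client.submit_job_as_operation(
        request={
            "project_id": project_id,
            "region": region,
            "job": build_pyspark_job(cluster_name, main_file_uri),
        }
    )
    return operation.result()
```

Calling `submit_pyspark_job(PROJECT_ID, REGION, CLUSTER_NAME, MAIN_FILE_URI)` should return the completed job. Its `reference.job_id` identifies the run, and `driver_output_resource_uri` points to the Cloud Storage location of the driver's output, which is where the `df.show()` table ends up.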

Abhishek Singh