Running PySpark Applications on Amazon EMR

Methods for Interacting with PySpark on Amazon Elastic MapReduce

Published in The Startup, Dec 2, 2020

Introduction

According to AWS, Amazon Elastic MapReduce (Amazon EMR) is a cloud-based big data platform for processing vast amounts of data using common open-source tools such as Apache Spark, Hive, HBase, Flink, Hudi, Zeppelin, Jupyter, and Presto. With Amazon EMR, data analysts, engineers, and scientists are free to explore, process, and visualize data. EMR takes care of provisioning, configuring, and tuning…

Written by Gary A. Stafford