Running PySpark Applications on Amazon EMR

Methods for Interacting with PySpark on Amazon Elastic MapReduce

Gary A. Stafford
Published in The Startup
21 min read · Dec 2, 2020

Introduction

According to AWS, Amazon Elastic MapReduce (Amazon EMR) is a cloud-based big data platform for processing vast amounts of data using common open-source tools such as Apache Spark, Hive, HBase, Flink, Hudi, Zeppelin, Jupyter, and Presto. Using Amazon EMR, data analysts, engineers, and scientists are free to explore, process, and visualize data. EMR takes care of provisioning, configuring, and tuning…

Gary A. Stafford

Area Principal Solutions Architect @ AWS | 10x AWS Certified Pro | Polyglot Developer | DataOps | GenAI | Technology consultant, writer, and speaker