Running Apache Spark on EKS with AWS Spot Instances

Pavan Kumar · Published in Nerd For Tech · 8 min read · Aug 7, 2021

Effective cost-saving for Apache Spark workloads on EKS with AWS Spot Instances

Apache Spark is a data processing framework that can perform rapid processing on huge datasets and distribute processing tasks across multiple nodes. With the rapid containerization of applications, organizations have also started running Spark on container platforms like Kubernetes, including in production. Broadly, there are two ways of running Spark on Kubernetes.

a) Spark Submit: spark-submit is a single script used to submit a Spark program and launch it as an application on a Kubernetes cluster.

Example:
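A minimal sketch of submitting a Spark job to Kubernetes with spark-submit. The API server host, container image, and jar path below are illustrative placeholders, not values from this article; adjust them for your cluster.

```shell
# Submit the bundled SparkPi example to a Kubernetes cluster.
# <k8s-apiserver-host> and <repo> are placeholders for your cluster endpoint
# and container registry.
spark-submit \
  --master k8s://https://<k8s-apiserver-host>:443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.container.image=<repo>/spark:3.1.2 \
  --conf spark.kubernetes.namespace=spark \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.1.2.jar
```

The `local://` scheme tells Spark the jar is already present inside the container image, so nothing is uploaded at submit time.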

b) Spark Operator: The Spark Operator allows Spark applications to be defined declaratively, and supports one-time Spark applications with the SparkApplication custom resource and cron-scheduled applications with ScheduledSparkApplication.

Example:
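A minimal SparkApplication manifest for the Spark Operator, sketched under assumptions: the image, namespace, and resource sizes are placeholders. Since this article targets Spot Instances, the example also pins the driver to On-Demand capacity and the executors to Spot using the `eks.amazonaws.com/capacityType` label that EKS managed node groups apply to their nodes.

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: spark
spec:
  type: Scala
  mode: cluster
  image: <repo>/spark:3.1.2          # placeholder registry/image
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.1.2.jar
  sparkVersion: "3.1.2"
  driver:
    cores: 1
    memory: 512m
    serviceAccount: spark
    # Keep the driver on On-Demand capacity so the job is not killed
    # when a Spot node is reclaimed.
    nodeSelector:
      eks.amazonaws.com/capacityType: ON_DEMAND
  executor:
    instances: 2
    cores: 1
    memory: 512m
    # Executors can tolerate interruption, so schedule them on Spot nodes.
    nodeSelector:
      eks.amazonaws.com/capacityType: SPOT
```

Splitting driver and executors this way is the usual cost-saving pattern: losing an executor only re-runs its tasks, while losing the driver fails the whole application.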


Senior Cloud DevOps Engineer || CKA | CKS | CSA | CRO | AWS | ISTIO | AZURE | GCP | DEVOPS. LinkedIn: https://www.linkedin.com/in/pavankumar1999/