Running Apache Spark on EKS with AWS Spot Instances
Effective cost-saving for Apache Spark workloads on EKS with AWS Spot Instances
Apache Spark is a data processing framework that can perform rapid processing tasks on huge datasets and can distribute those tasks across multiple nodes. With the rapid containerization of applications, organizations have also started running Spark on containerized platforms like Kubernetes, including in production. Broadly, there are two ways of running Spark on Kubernetes.
a) Spark Submit: This is a single script used to submit a Spark application, which launches the application on a Kubernetes cluster.
Example:
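A minimal sketch of such an invocation, based on the standard `spark-submit` Kubernetes flags; the API server host, container image, and example jar path are placeholders you would substitute for your own cluster:

```shell
# Submit the bundled SparkPi example to a Kubernetes cluster.
# <k8s-apiserver-host> and <spark-image> are placeholders.
./bin/spark-submit \
  --master k8s://https://<k8s-apiserver-host>:443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.container.image=<spark-image> \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar
```

The `k8s://` prefix on `--master` tells Spark to schedule the driver and executor pods through the Kubernetes API server rather than a standalone or YARN cluster manager.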
b) Spark Operator: The Spark Operator allows Spark applications to be defined in a declarative manner and supports one-time Spark applications with the SparkApplication custom resource and cron-scheduled applications with ScheduledSparkApplication.
Example:
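A minimal SparkApplication manifest in the style of the operator's `v1beta2` API; the image name and service account are assumptions to adapt to your environment:

```yaml
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: <spark-image>          # placeholder: your Spark container image
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar"
  sparkVersion: "3.1.1"
  driver:
    cores: 1
    memory: "512m"
    serviceAccount: spark       # assumed service account with pod-creation rights
  executor:
    cores: 1
    instances: 2
    memory: "512m"
```

Applying this manifest with `kubectl apply -f` causes the operator to run `spark-submit` on your behalf; a ScheduledSparkApplication adds a cron-style `schedule` field around the same spec.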