EXPEDIA GROUP TECHNOLOGY — SOFTWARE

Part 3: Cost Efficient Executor Configuration for Apache Spark

Find the most cost efficient executor configuration for your node

Brad Caffey
Expedia Group Technology
Aug 11, 2020

An image representing a control for efficiency
Photo by Sashkin on Shutterstock

If you like this blog, please check out my companion blogs about how to use persist and coalesce effectively to lower job cost and improve performance.

CPUs per node

The first step in determining an efficient executor config is to figure out how many actual CPUs (i.e. not virtual CPUs) are available on the nodes in your cluster. To do so, you need to find out what type of EC2 instance your cluster is using. For our discussion here, we’ll be using r5.4xlarge, which according to the AWS EC2 Instance Pricing page has 16 CPUs.

When we submit our jobs, we need to reserve one CPU for the operating system and the Cluster Manager. So we don’t want to use all 16 CPUs for the job. Instead, we want to allocate the remaining 15 CPUs on each node to Spark processing.

16 CPUs, with 15 available for Spark and one allocated to OS and cluster manager

CPUs per executor

Now that we know how many CPUs are available to use on each node, we need to determine how many Spark cores we want to assign to each executor. From basic math (X * Y = 15), we can see that there are four different executor and core combinations that can get us to 15 Spark cores per node:

4 combinations of Executors x cores/executor = 15, ie 1x15, 3x5, 5x3, 15x1
Possible configurations for executor
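
The four combinations are simply the factor pairs of 15. As a rough illustration, here is a small Python sketch that enumerates them for any number of usable cores. The function name executor_layouts is hypothetical and only used for this example.

```python
def executor_layouts(usable_cores: int) -> list:
    """Return (executors_per_node, cores_per_executor) pairs that use every core."""
    return [
        (executors, usable_cores // executors)
        for executors in range(1, usable_cores + 1)
        if usable_cores % executors == 0
    ]


# 16 CPUs on the node, one reserved for the OS and cluster manager.
layouts = executor_layouts(16 - 1)
print(layouts)  # [(1, 15), (3, 5), (5, 3), (15, 1)]
```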

Let’s explore the feasibility of each of these configurations.

One executor with 15 cores

Node has 15 cores for 1 executor and one core for the OS and cluster manager

The most obvious solution that comes to mind is to create one executor that has 15 cores. The problem with large fat executors like this one is that an executor supporting this many cores typically will have a memory pool so large (64GB+) that garbage collection delays would slow down your job unreasonably. So we’ll rule out this config for an executor.

Fifteen executors with 1 core

Node has 15 executors, one on each core, plus one core for OS and cluster manager

The next solution that comes to mind would be to create 15 executors that have just one core each. The problem here is that single core executors are inefficient because they don’t take advantage of the parallelism that multiple cores within an executor enable. In addition, finding the optimal amount of overhead memory for single core executors can be difficult. Let’s talk about memory overhead for a moment.

Total memory used by executor = executor memory + overhead

Executor overhead memory defaults to 10% of your executor size or 384MB (whichever is greater). However, on some big data platforms like Qubole, overhead defaults to a fixed amount regardless of your executor size. You can confirm what overhead value is being used by opening the Environment tab of the Spark UI and looking for the spark.executor.memoryOverhead parameter.

For a single core executor, the Spark default overhead memory value will be really small, which will cause problems with your jobs. On the other hand, a fixed overhead amount for all executors will result in overhead memory being too large and will therefore leave less room for the executors themselves. Finding the perfect overhead memory size can be difficult, so this is another reason why a single core executor is not ideal either.
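
To make the default rule concrete, here is a minimal Python sketch of the "10% or 384MB, whichever is greater" calculation described above. The function name and the example executor sizes are assumptions chosen for illustration. It is not Spark's actual code, and platforms with a fixed overhead will behave differently.

```python
MIN_OVERHEAD_MB = 384      # floor described in the text
OVERHEAD_DIVISOR = 10      # 10% of executor memory


def default_overhead_mb(executor_memory_mb: int) -> int:
    """Approximate the default overhead: 10% of executor memory, at least 384MB."""
    return max(MIN_OVERHEAD_MB, executor_memory_mb // OVERHEAD_DIVISOR)


# Illustrative executor sizes (assumed, not taken from a real cluster).
for size_mb in (2048, 7168, 34816):
    print(size_mb, "MB executor ->", default_overhead_mb(size_mb), "MB overhead")
```

Small executors all land on the 384MB floor, which is why the default is so tight for single core executors.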

Five executors with 3 cores or three executors with 5 cores

Two configurations: node with 3 executors each with 5 cores; and 5 executors each with 3 cores; plus one core for cluster manager

So we have two options left. The consensus in most Spark tuning guides is that 5 cores per executor is the optimum number of cores in terms of parallel processing. And I have found this to be true from my own cost tuning efforts as well. Another benefit to using 5 core executors over 3 core executors is that fewer executors on your node means less overhead memory consuming node memory. So we’ll choose 5 core executors to minimize overhead memory on the node and maximize parallelism within each executor.

--executor-cores 5

Memory per Node

Our next step is to determine how much memory to assign each executor. Before we can do this, we must determine how much physical memory is available to us on our node. This is important because physical memory is a hard cap for your executors. If you know what EC2 Instance you are using then you know the total memory available on the node. In the case of instance type r5.4xlarge, AWS says it has 128GB of memory available.

Node has 128GB total

However, you will not have all 128GB available to use for your executors, as memory will need to be set aside for your Cluster Manager. The gif below shows where to look in YARN’s Resource Manager to find out how much memory is available to use after memory is set aside for the Cluster Manager.

On the Hadoop manager page, select active nodes and look at the memory column to find the memory available for executors

We can see that the nodes on this cluster have 112GB available for executors to use.

Node has 112GB for executors and 16GB for OS and cluster manager

Memory per Executor

If we want three executors consuming the 112GB of available memory, then we can determine what the efficient memory size will be for each executor. To calculate our executor memory amount, we divide available memory by 3 to get total executor memory. Then we subtract overhead memory and round down to the nearest integer.

If you have fixed overhead memory (as is the case with Qubole), then you will use this formula: 112 / 3 ≈ 37, then 37 - 2.3 = 34.7, which rounds down to 34.

If you use Spark’s default method for calculating overhead memory, then you will use this formula: 112 / 3 ≈ 37, then 37 / 1.1 ≈ 33.6, which rounds down to 33.

For the remainder of this guide, we’ll use the fixed amount of memory overhead from Qubole.

--executor-memory 34G
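
The two calculations above can be sketched in Python as follows. The function name executor_memory_gb is hypothetical, and the 2.3GB fixed overhead is simply the value used in the arithmetic above, so substitute whatever your platform reports. The sketch rounds the per-executor share down before removing overhead, mirroring the steps in the text.

```python
import math


def executor_memory_gb(available_gb, executors_per_node, fixed_overhead_gb=None):
    """Return a whole-GB --executor-memory value for one executor.

    With fixed_overhead_gb set, subtract that fixed amount; otherwise assume
    the default overhead of 10% on top of executor memory.
    """
    # Each executor's share of node memory, rounded down (112 // 3 = 37).
    share_gb = available_gb // executors_per_node
    if fixed_overhead_gb is not None:
        return math.floor(share_gb - fixed_overhead_gb)
    # executor memory + 10% overhead must fit in the share.
    return math.floor(share_gb / 1.1)


print(executor_memory_gb(112, 3, fixed_overhead_gb=2.3))  # 34
print(executor_memory_gb(112, 3))                         # 33
```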

Here’s the paradigm shift that will be required for experienced Spark tuners to realize cost savings. I’m recommending that you use this fixed memory size and core count in your executors for all jobs. I get it…using a fixed executor config for most Spark jobs seems antithetical to proper Spark tuning practices. If you are skeptical then I ask that you try this strategy out firsthand to see if it works. Figure out how to calculate costs for your jobs as described in Part 2 and then use those costs to confirm what really works. I believe if you do this you’ll find that the only way to achieve cloud spending efficiency is to use fixed memory sizes for your executors that achieve optimal CPU utilization.

With that said, if you have a large amount of unused memory in your executors when using the efficient memory size, then consider switching your process to run on a different EC2 instance type that has less memory per node CPU. These instances will be cheaper and therefore help reduce the cost of your job.

Finally, there will be times when this cost efficient configuration will not provide enough bandwidth in your executor for your data. In the examples provided in Part 2, there were several jobs where I had to deviate from the efficient memory size because memory utilization was maxed out for the entire job run.

For the purposes of this guide, I’m recommending that you start with the efficient memory size when converting your jobs. If you have memory errors with the efficient executor configuration, I will share tweaks later in Part 5 that will eliminate those errors.

Executors per job

Now that we have our executor configured, we are ready to configure how many executors we want for our job. Remember that our goal is to make sure all 15 available CPUs per node are being utilized, which means we want to have three executors assigned to every node. If we configure our executor count in multiples of 3, we will make sure this happens.

Each node has 3 executors

There is one problem with this configuration though. Our driver also needs to be assigned to a node to handle all the executors. If we use an executor count that’s a multiple of 3, then our single core driver will be assigned to its own 16 core node which means 14 cores on that last node will be unused during the entire job. That’s not good cloud spending utilization!

4 nodes: 3 nodes have 3 executors each, the 4th node has only a driver

The lesson learned here is that the ideal executor count is in multiples of 3 minus one executor to make room for our driver.

3 nodes: 2 of the nodes have 3 executors each, the 3rd node has 2 executors and the driver
--num-executors (3x - 1)
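
As a quick sanity check on the (3x - 1) rule, here is a minimal Python sketch where x is the number of nodes. The function name executor_count is hypothetical, and the sketch assumes three 5 core executors fit on each node, as on the r5.4xlarge used in this guide.

```python
def executor_count(nodes: int, executors_per_node: int = 3) -> int:
    """Fill every node with executors, leaving one executor slot for the driver."""
    if nodes < 1:
        raise ValueError("need at least one node")
    return nodes * executors_per_node - 1


print(executor_count(3))  # 8: two full nodes plus 2 executors and the driver
print(executor_count(4))  # 11
```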

In Part 4, I will give a recommendation for how many executors you should use when converting an existing job to a cost efficient executor.

Memory per Driver

The common practice among data engineers is to configure driver memory relatively small compared to the executors. However, AWS actually recommends sizing your driver memory to be the same as your executors. I have found this to be very helpful with cost tuning as well.

--driver-memory 34G

In rare instances you will need a driver whose memory is larger than the executor’s. In these cases, set the driver’s memory size to 2x the executor memory and then use (3x - 2) to determine the number of executors for your job.

Cores per Driver

The default core count for drivers is one. However, I’ve found that jobs using more than 500 Spark cores can experience a performance benefit if the driver core count is set to match the executor core count. Don’t change the core count in your driver by default though. Just test it with your larger jobs to see if you experience a performance benefit as well.

--driver-cores 5

One size fits all?

So the executor config I’m recommending for a node with 16 CPUs and 128GB of memory will look like this.

--driver-memory 34G --executor-memory 34G --num-executors (3x - 1) --executor-cores 5
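
To turn that template into concrete flags for a given cluster size, something like the following Python sketch could be used. The helper spark_submit_flags and its parameters are hypothetical names for illustration. The defaults assume the r5.4xlarge numbers worked out above (34G executors, 3 executors per node, 5 cores each), and the large_driver option applies the 2x driver memory and (3x - 2) rule for the rare cases that need it.

```python
def spark_submit_flags(nodes, executor_memory_gb=34, executors_per_node=3,
                       executor_cores=5, large_driver=False):
    """Build the sizing flags for spark-submit from the node count."""
    # A normal driver takes one executor slot; a 2x driver takes two.
    driver_slots = 2 if large_driver else 1
    driver_memory_gb = executor_memory_gb * driver_slots
    num_executors = nodes * executors_per_node - driver_slots
    return [
        "--driver-memory", f"{driver_memory_gb}G",
        "--executor-memory", f"{executor_memory_gb}G",
        "--num-executors", str(num_executors),
        "--executor-cores", str(executor_cores),
    ]


print(" ".join(spark_submit_flags(nodes=4)))
# --driver-memory 34G --executor-memory 34G --num-executors 11 --executor-cores 5
```

The driver core count is deliberately left out, since it should only be changed after testing on larger jobs.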

But remember…

One size does not fit all. Keep trying and eventually you will find the perfect fit

Like I mentioned above, this config may not seem suitable to your needs. I’m recommending that you use this config as a starting point in your cost tuning process. If you have memory issues with this config then in later parts of this guide I will recommend tweaks you can use that will resolve the common memory issues that arise when switching to the cost efficient configuration.

Since the node configuration used on this page is pretty common at Expedia Group™, I will be referring to it throughout the rest of the guide. If your nodes are a different size then you should follow the method I laid out here to calculate your ideal configuration.

Now that you have an efficient executor config to work with, you are ready to convert your current jobs to the new config. But which jobs should you prioritize tuning first? And how many executors should you run with this new configuration? And what happens if a cost tuned job runs longer than an untuned job? And is overutilizing node CPUs ever appropriate? I answer these questions in Part 4: How to Migrate Existing Apache Spark Jobs to Cost Efficient Executor Configurations.
