Beginner’s Configuration Guide for Spark (IBM Analytics Engine)

Wendy Wang
IBM Data Science in Practice
Sep 11, 2018


By Wanting Wang, IBM Data Science Elite Team
Ricardo Balduino, IBM Data Science Elite Team

Introduction

The IBM Analytics Engine (IAE) provides a flexible framework to develop and deploy analytics applications on Hadoop and Spark. It allows you to spin up Apache Hadoop and Apache Spark clusters and manage them through their lifecycle.

In this blog, we focus on tips for configuring Spark clusters, which can be tricky. In our experience, the default settings don't work very well (at least for sparklyr, an R interface for Apache Spark), and changing Spark's many parameters at random often only makes things worse.

Mathematics for Spark Configuration

Some knowledge about the basic mathematics behind the Spark configuration will help you adjust the numbers for your specific task.

Tip 1: The real amount of memory you can allocate is less than the memory you provision. See Table 1 for the recommended memory caps.

Note: AE hardware configuration can be found here: https://console.bluemix.net/docs/services/AnalyticsEngine/index.html#introduction

Where does the vanishing memory go? To find out, we need to look at the IAE architecture.

On the driver side, the management host runs three containers: mn001, mn002, and mn003, with about 2GB reserved for the OS. mn001 holds the Knox gateway and is allocated 2GB of memory. mn002 holds the other Apache components (HDFS NameNode, YARN ResourceManager, Livy server, Spark history server, etc.) and is allocated 6GB. That leaves around 6GB out of 16GB for mn003 on a standard AE cluster. The Jupyter Kernel Gateway runs in mn003, along with the Jupyter kernels and the Spark driver, so when you use a Jupyter notebook the driver shares the management node with all of these memory-consuming components. As a result, the maximum memory available for the driver is up to 4GB on a standard AE cluster and 110GB on a memory-intensive AE cluster.
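To make that concrete, here is a minimal PySpark sketch (generic Spark, not an IAE-specific recipe) that keeps the driver within that budget. One caveat: spark.driver.memory only takes effect if it is set before the driver JVM starts, so in a notebook it usually has to come from spark-defaults.conf or the kernel's launch configuration rather than from the session builder.

```python
from pyspark.sql import SparkSession

# Generic sketch: keep the driver within the ~4GB left on a standard AE
# management node. In a notebook the driver JVM is usually already running,
# so in practice this value has to come from spark-defaults.conf or the
# kernel/Livy launch configuration rather than from this builder call.
spark = (
    SparkSession.builder
    .appName("driver-memory-sketch")
    .config("spark.driver.memory", "4g")
    .getOrCreate()
)
```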

For worker nodes, about 2GB is reserved for the OS. On standard nodes that leaves around 14GB for YARN, where the Spark executors run; on memory-intensive nodes, around 126GB is available for YARN and the Spark executors.

The available memory for the worker nodes can be found in the Ambari dashboard under YARN memory, which shows the total available memory and current usage across all the worker nodes in your AE cluster.

Tip 2: Memory per executor * number of executor instances = total memory allocated to the executors in your Spark session

If you set the memory per executor to 3GB and the number of executor instances to 15, you're going to allocate 3GB * 15 = 45GB of memory to your executors. Let's consider two situations:

  1. 2-node standard cluster: the available memory for the workers is 14GB * 2 = 28GB.
  2. 6-node standard cluster: the available memory for the workers is 14GB * 6 = 84GB.

In the first case, you're trying to allocate more memory than is available. Spark will automatically scale down the allocation, but some operations are not aware of this real limit and will try to use as much memory as the configuration promises, eventually causing out-of-memory issues.

In the second case, you're allocating far less memory than is available, which works as long as the overhead memory also fits (see Tip 3 and Tip 4). A single Spark session clearly won't fully utilize AE, because much of the memory is left unused. That is not always bad: if multiple users or notebooks share the cluster and you know none of them needs intensive memory, restricting the memory of each Spark session prevents any one session from using too much.
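To make the Tip 2 knobs concrete, here is a minimal PySpark sketch (generic Spark, not an IAE-specific recipe) using the 3GB * 15 numbers from the example above:

```python
from pyspark.sql import SparkSession

# Sketch of the Tip 2 example: 15 executors at 3GB each = 45GB requested.
# On a 2-node standard cluster (~28GB for YARN) this over-commits memory;
# on a 6-node cluster (~84GB) it leaves headroom for other sessions.
spark = (
    SparkSession.builder
    .appName("executor-sizing-sketch")
    .config("spark.executor.memory", "3g")
    .config("spark.executor.instances", "15")
    .getOrCreate()
)
```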

Tip 3: Overhead memory per executor = 0.1 * memory per executor

The memory you allocate to an executor is only the main part of what it actually uses. On top of that, each executor also claims some overhead memory: by default, an additional 10% of the memory you allocate to it. This can be changed in the Spark configuration when the Spark session is initiated.
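As a minimal sketch (generic Spark on YARN, not an IAE-specific recipe), the overhead can be set explicitly when the session is created. Note that the property name depends on the Spark version: Spark 2.3+ uses spark.executor.memoryOverhead, while older releases use spark.yarn.executor.memoryOverhead.

```python
from pyspark.sql import SparkSession

# Sketch: 3GB executors with the overhead raised above the default 10%
# of executor memory to 512MB. On Spark versions before 2.3, use the
# spark.yarn.executor.memoryOverhead property instead (value in MB).
spark = (
    SparkSession.builder
    .appName("memory-overhead-sketch")
    .config("spark.executor.memory", "3g")
    .config("spark.executor.memoryOverhead", "512")
    .getOrCreate()
)
```

Whichever property your Spark version uses, the overhead counts toward the per-node YARN memory budget described in Tip 4.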

Tip 4: (Memory per executor + overhead memory per executor) * number of executor instances = total memory allocated to your Spark session ≤ total memory available for all your worker nodes in the cluster

When you calculate the memory your executors will actually consume, you have to take the overhead memory into account (or change the memoryOverhead setting); otherwise you will not be able to allocate as much memory to your executors as you intend.

If you set the memory per executor to 3GB and the number of executor instances to 14, you're going to allocate 3GB * 14 = 42GB of memory to your executors. The real memory they're going to take up will be around (3GB + 0.3GB) * 14 = 46.2GB. Let's consider two situations:

  1. 3-node standard cluster: the available memory for the workers is 14GB * 3 = 42GB.
  2. 5-node standard cluster: the available memory for the workers is 14GB * 5 = 70GB.

In the first case, the available memory (42GB) only covers the memory you asked to allocate to the executors; the overhead memory is not covered. There isn't enough room for the roughly 46.2GB actually required, so each executor ends up with less memory than the number you indicated in the config, which can cause out-of-memory issues.

In the second case, the available memory is large enough to fit your config, but for a single Spark session that's not an ideal setting, because roughly a third of the memory goes unused.
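The Tip 4 bookkeeping is easy to check with a few lines of plain Python; the sketch below simply repeats the arithmetic of the two cases above.

```python
# Quick check of the Tip 4 arithmetic (values from the example above).
executor_memory_gb = 3
overhead_ratio = 0.1          # default overhead: 10% of executor memory
num_executors = 14
memory_per_worker_gb = 14     # usable YARN memory per standard worker node

footprint_gb = executor_memory_gb * (1 + overhead_ratio) * num_executors
print(f"Total footprint: {footprint_gb:.1f}GB")   # 46.2GB

for workers in (3, 5):
    available_gb = memory_per_worker_gb * workers
    fits = footprint_gb <= available_gb
    print(f"{workers}-node cluster: {available_gb}GB available, fits={fits}")
```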

Tip 5: There's a trade-off between executor memory and the Garbage Collector (GC): the larger the executor memory, the higher the potential for GC issues; the smaller the executor memory, the lower the potential for GC issues

Whether to increase the executor memory when plenty of memory is available is therefore a trade-off. (For small clusters like a 3-node standard AE cluster, you don't need to worry about it; it's better not to increase the executor memory.)

Some issues we've seen were actually caused by executor memory that was too small. For example, when we trained a random forest model with 200 trees and a maximum depth of 10 using 3GB of executor memory, we always ended up with an error after a long run; increasing the executor memory to 6GB solved it. On the other hand, GC issues can typically be relieved by decreasing the executor memory.
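As a hypothetical sketch of that scenario (our original work used sparklyr, and the data path and column names below are placeholders rather than the actual project), the same kind of model could be expressed in PySpark with the larger executor memory like this:

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier

# Hypothetical sketch: a 200-tree, max-depth-10 random forest run with
# 6GB executors instead of 3GB. The parquet path and column names are
# placeholders, not from the original project.
spark = (
    SparkSession.builder
    .appName("rf-memory-sketch")
    .config("spark.executor.memory", "6g")
    .config("spark.executor.instances", "10")
    .getOrCreate()
)

df = spark.read.parquet("training_data.parquet")   # placeholder path
features = [c for c in df.columns if c != "label"]
assembled = VectorAssembler(inputCols=features, outputCol="features").transform(df)

rf = RandomForestClassifier(labelCol="label", featuresCol="features",
                            numTrees=200, maxDepth=10)
model = rf.fit(assembled)
```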

In our experience, for ML stages other than model building, a small executor memory such as 3GB is safe; for the model-building stage, the right amount depends on the model being trained.

We want to thank Vaibhav M Kulkarni (IAE Performance Specialist, IBM) for the help and guidance he provided on the use of IBM Analytics Engine, as well as for reviewing this article.
