“spark-sight” Shows Spill: Skewed Data and Executor Memory

Alfredo Fomitchenko
3 min read · Jun 7, 2022



spark-sight is a less detailed, more intuitive representation of what is going on inside your Spark application in terms of performance.

This story is part of a series:

  1. Part 1: Meet “spark-sight”: Spark Performance at a Glance
  2. Part 2: This story

When you launch your Spark application, do you ever wonder

  1. Whether it suffers from skewed data?
  2. Whether the value of spark.executor.memory is right?

Wonder no more!

spark-sight v0.1.8 adds a new chart for spill information:

The middle chart shows, for each executor, when and how much the executor has spilled to disk in that time interval.

And maybe, with the help of spark-sight, one day you will get to see this:

Ok, I need it NOW

$ pip install "spark-sight>=0.1.8"

Why care about spill?

I’ll let this wonderful story guide you: Understanding common Performance Issues in Apache Spark — Deep Dive: Data Spill

Simply put, when one of the partitions is too large to fit in the executor memory, the disk is accessed and performance decreases by orders of magnitude.
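To get an intuition for when a partition stops fitting, you can estimate the slice of unified (execution + storage) memory each concurrent task gets. The sketch below uses the documented Spark defaults (`spark.memory.fraction = 0.6`, roughly 300 MB reserved for the system); the executor sizes are illustrative, not a prescription:

```python
# Back-of-the-envelope estimate of unified memory per Spark task,
# based on Spark's documented memory model. Values are illustrative.

RESERVED_MB = 300        # memory Spark reserves for internal objects
MEMORY_FRACTION = 0.6    # spark.memory.fraction default

def unified_memory_per_task_mb(executor_memory_mb, executor_cores):
    """Approximate unified (execution + storage) memory per concurrent task."""
    usable = (executor_memory_mb - RESERVED_MB) * MEMORY_FRACTION
    return usable / executor_cores

# With 4 GB executors running 4 tasks concurrently, each task gets roughly:
per_task = unified_memory_per_task_mb(4096, 4)
print(f"{per_task:.0f} MB per task")  # → 569 MB per task
```

A partition noticeably larger than that per-task slice is a likely candidate for spilling to disk.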

A real-world scenario

This chart is so helpful because it gives you a lot of information at a glance.

To illustrate this, the following is a real-world Spark application I analyzed with spark-sight:

First half: spark.executor.memory is too low

The first half of the spill chart shows that

  • all executors are spilling
  • the executors are spilling equal amounts

This is a clear indication that the parameter spark.executor.memory is too low: all executors are equally struggling to keep their partitions in memory.
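When all executors spill equally, the usual first remedy is to raise spark.executor.memory (or lower spark.executor.cores, so each task gets a bigger slice of unified memory). A sketch of what that looks like at submission time; the values and the application name are placeholders, not recommendations:

```shell
# Illustrative values only: tune to your cluster and workload.
# Doubling executor memory (e.g. 4g -> 8g) gives each of the
# 4 concurrent tasks a larger share of unified memory.
spark-submit \
  --conf spark.executor.memory=8g \
  --conf spark.executor.cores=4 \
  your_app.py
```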

Second half: skewed data

The second half of the spill chart shows that

  • the executors are spilling different amounts

For example, during stage 180, executor 7 has spilled three times as much as the other executors.

This is a clear indication of skewed data. In fact, if you open stage 180 in the Spark UI, you will see the following:
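A standard fix for this kind of skew is "salting": append a small synthetic suffix to the hot key before the shuffle, so its rows spread over several partitions instead of piling onto one executor, then strip the suffix after aggregating. The plain-Python model below illustrates the idea; it is not Spark's actual HashPartitioner, just a deterministic stand-in:

```python
import zlib

# Illustrative model of key salting for skewed data.
# One hot key normally hashes to a single partition; splitting it
# into N_SALTS sub-keys spreads its rows across N_SALTS partitions.

NUM_PARTITIONS = 16
N_SALTS = 8  # the hot key becomes 8 synthetic sub-keys

def partition_for(key, salt=None):
    """Map a (possibly salted) key to a partition, CRC32 as a stand-in hash."""
    h = zlib.crc32(key.encode())
    if salt is not None:
        h += salt  # models hashing "key_<salt>" instead of "key"
    return h % NUM_PARTITIONS

rows = ["hot_key"] * 10_000  # heavily skewed: one key dominates

unsalted = {partition_for(k) for k in rows}
salted = {partition_for(k, salt=i % N_SALTS) for i, k in enumerate(rows)}

print(len(unsalted), "partition(s) without salting")  # → 1
print(len(salted), "partition(s) with salting")       # → 8
```

In Spark itself you would build the salted key in the DataFrame (e.g. concatenating the key with a random integer below N_SALTS) before the groupBy or join, at the cost of a second aggregation step to merge the sub-keys.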

What’s next?

Let me know in the comments what I should address next:

  • Adding a chart for peak memory usage per executor
  • Converting the current figure into a full-fledged Dash (Plotly) application to improve the UX

What’s next for you?
