Debugging Spark Code Locally Like A Boss

Anup Vasudevan
Published in Analytics Vidhya · 6 min read · Jan 4, 2021


Cover image: https://flic.kr/p/sbEHMt

Stepping through Spark internals can be helpful. If nothing else, it helps you make sense of what your code is doing under the hood. In this post, I'm going to explain how I set up my debugger to hit breakpoints inside the Spark codebase so that I can debug Spark's Scala, Java, and Python code.

The Prep

Setting up Spark locally involves downloading the Hadoop binaries and the Spark source code. I will be using IntelliJ for this tutorial, but in theory any IDE that lets you run and debug JVM-based languages should do. I set this up on a MacBook Pro, but there's no reason it wouldn't work on Windows; I don't have access to a PC, so some exploration may be needed on your part there.
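As a sketch of that prep, fetching the Spark source and doing a first build might look like the following (the branch name here is just an example; check out whichever release you want to step through):

```shell
# Clone the Spark source and check out a release branch.
# branch-3.0 is an example; pick the version you want to debug.
git clone https://github.com/apache/spark.git
cd spark
git checkout branch-3.0

# Build with the bundled Maven wrapper, skipping tests to save time.
# The first build downloads a lot of dependencies, so it will take a while.
./build/mvn -DskipTests clean package
```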

What you’ll need

  • Java 11 installed. Any LTS release after Java 8 should do.
  • Maven 3.6.3 or above installed.
  • Scala 2.12.10 or above installed.
  • Python 3 installed.
  • Minimum 8 GB of RAM.
  • JetBrains IntelliJ IDEA. I use the Ultimate edition, but this should work fine with the Community edition.
  • Coffee or drink of choice because you are going to need it.
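Before going further, it's worth sanity-checking the toolchain. A quick way to confirm the versions above (exact output will vary by install) is:

```shell
# Confirm each prerequisite is on the PATH and at a suitable version.
java -version       # expect 11.x, or another LTS after 8
mvn -version        # expect 3.6.3 or above
scala -version      # expect 2.12.10 or above
python3 --version   # expect 3.x
```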

The Setup
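One common way to wire this up (shown here as a sketch; port 5005 is an arbitrary choice) is to have the JVM that spark-submit launches wait for a debugger over JDWP, then attach IntelliJ's "Remote JVM Debug" run configuration to that port:

```shell
# Tell the driver JVM to pause at startup and listen for a debugger.
# suspend=y makes it wait, so breakpoints set in the Spark source are
# hit from the very first line; address=5005 is any free local port.
export SPARK_SUBMIT_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005"

# Launch a shell (or spark-submit a job); it will block until IntelliJ
# attaches a Remote JVM Debug configuration pointed at localhost:5005.
./bin/spark-shell
```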


Hi there! I'm a software developer based out of Boston, MA. I've worked at DSW, T-Mobile, HMH, Wellington Management, Thesys CAT, and Autodesk, to name a few.