How to Debug Queries by Just Using Spark UI

You already have the thing you need to debug a query

Cinto
The Startup


Spark is one of the most widely used big-data computation engines, capable of running jobs on petabytes of data. It provides a suite of web user interfaces (UIs) that you can use to monitor the status and resource consumption of your Spark cluster. Most of the issues you encounter while running a job can be debugged by heading to the Spark UI.

spark2-shell --queue=P0 --num-executors 20
Spark context Web UI available at http://<hostname>:<port>
Spark context available as 'sc'
Spark session available as 'spark'

In this article, I will show how to debug a Spark job using only the Spark UI. I will run a few Spark jobs and show how the Spark UI reflects their execution, and I will add some tips and tricks along the way.

This is what the Spark UI looks like.

We will start with the SQL tab, which contains much of the information needed for an initial review. Note that if your job uses the RDD API directly, the SQL tab may not appear.

Here is a query I ran for reference

spark.sql("select id, count(1) from table1 group by id").show(10, false)
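Conceptually, this query counts how many rows share each id. Here is a rough sketch of that semantics in plain Python on made-up toy data (not Spark; the rows below are hypothetical, just for illustration):

```python
from collections import Counter

# Toy rows standing in for table1 (hypothetical data, for illustration only)
rows = [{"id": 1}, {"id": 2}, {"id": 1}, {"id": 1}, {"id": 2}]

# Equivalent of: select id, count(1) from table1 group by id
counts = Counter(row["id"] for row in rows)
print(sorted(counts.items()))  # [(1, 3), (2, 2)]
```

Spark performs the same aggregation distributed across executors, which is exactly the work you will see broken into stages and tasks in the UI.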
