An SQL Editor for Apache Spark SQL with Livy

Published in Data Querying, Apr 27, 2020

December 2020 Update: Spark SQL queries with the SparkSql Thrift Server instead of Livy

Spark SQL

Spark SQL is convenient for embedding clean data querying logic within your Spark apps. Hue brings an Editor so that it is easier to develop your SQL snippets.

As detailed in the documentation, Spark SQL comes with different connectors. Here we only show the Livy one.

Apache Livy provides a bridge to a running Spark interpreter so that SQL, PySpark and Scala snippets can be executed interactively. Livy was originally created within the Hue project in order to offer a REST API for easily interacting with Spark.
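To give an idea of what going through that REST API looks like, here is a minimal Python sketch of running one SQL snippet against Livy: open a session of kind `sql`, wait for it to become idle, post a statement, then poll for its output. The helper names (`livy_request`, `call`, `run_sql`) and the `LIVY_URL` value are assumptions made for this illustration; this is not how Hue itself is implemented.

```python
import json
import time
import urllib.request

# Assumption: a Livy server listening on its default port on this machine.
LIVY_URL = "http://localhost:8998"


def livy_request(path, payload=None, base_url=LIVY_URL):
    """Build a JSON request for Livy: POST when a payload is given, GET otherwise."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST" if payload is not None else "GET",
    )


def call(request):
    """Send the request and decode the JSON body of the response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def run_sql(query, base_url=LIVY_URL, poll_seconds=1.0):
    """Run one SQL snippet in a fresh Livy session and return the statement output."""
    # 1. Open an interactive session that interprets snippets as SQL.
    session = call(livy_request("/sessions", {"kind": "sql"}, base_url))
    session_path = "/sessions/{}".format(session["id"])

    # 2. Wait for the Spark interpreter behind the session to be ready.
    while session["state"] != "idle":
        if session["state"] in ("error", "dead", "killed", "shutting_down"):
            raise RuntimeError("Livy session failed: " + session["state"])
        time.sleep(poll_seconds)
        session = call(livy_request(session_path, base_url=base_url))

    # 3. Submit the snippet as a statement.
    statement = call(
        livy_request(session_path + "/statements", {"code": query}, base_url)
    )
    statement_path = "{}/statements/{}".format(session_path, statement["id"])

    # 4. Poll until the result is available.
    while statement["state"] != "available":
        if statement["state"] in ("error", "cancelling", "cancelled"):
            raise RuntimeError("Livy statement failed: " + statement["state"])
        time.sleep(poll_seconds)
        statement = call(livy_request(statement_path, base_url=base_url))
    return statement["output"]
```

With a Livy server running, `run_sql("SHOW TABLES")` would return the output of the statement. The sketch leaves the session open; a real client would reuse it for the next snippets and close it when done.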

In hue.ini, configure the Livy API URL:

[spark] 
# The Livy Server URL. 
livy_server_url=http://localhost:8998
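Before pointing Hue at that URL, it can help to check that Livy actually answers there. Below is a small sketch that lists the sessions of the server and reports whether the call succeeded. The function names are made up for this example, and the URL is assumed to be the same value as `livy_server_url`.

```python
import json
import urllib.request


def sessions_url(livy_server_url):
    """Return the URL of Livy's session listing endpoint."""
    return livy_server_url.rstrip("/") + "/sessions"


def livy_is_reachable(livy_server_url, timeout=5):
    """Return True if the server answers GET /sessions with a JSON body."""
    try:
        with urllib.request.urlopen(
            sessions_url(livy_server_url), timeout=timeout
        ) as response:
            json.load(response)
            return True
    except (OSError, ValueError):
        # OSError covers connection and HTTP errors; ValueError covers a
        # malformed URL or a body that is not JSON.
        return False
```

For example, `livy_is_reachable("http://localhost:8998")` should return `True` once the Livy server is up.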

And as always, make sure you have an interpreter configured:

[notebook]
  [[interpreters]]
    [[[sparksql]]]
      name=Spark SQL
      interface=livy

And that’s it, the editor will appear:

One advantage of using Hue is its File Browser for HDFS, S3 and Azure, along with full security support (Kerberos, and even the real user's credentials via the Knox IdBroker integration).

Here are some of the future improvements:

Database/table/column autocomplete is currently empty
SQL grammar autocomplete can be extended
SQL Scratchpad module to allow a mini SQL Editor popup is in progress

Romain

Originally published at https://gethue.com.

Written by Romain Rigaux