Using Apache Zeppelin with SQL Server

Mike Moritz
3 min readMar 24, 2019

--

Not the Hindenburg

Jupyter Notebook is a great tool for a range of data use cases. However, often there are times when using a language besides Python would be easier. Additionally, many large enterprise datasets are still housed on MS SQL Servers, and improved native connectivity would be a plus.

Enter Apache Zeppelin.

Zeppelin introduces the concept of interpreters, which allows for a single notebook to contain many languages. Out of the box it contains support for an impressive array of languages including Python, Shell, Apache Spark, Pig, and Hive. A generic JDBC interpreter can also connect to most of the common relational databases.

zeppelin.apache.org

As an added benefit, the UI is much more powerful than Jupyter. It is easy to add flexible layout of panels (called paragraphs), form inputs, and use different visualization platforms (i.e. not only those available in Python). Credential management is also treated as a first-class citizen, which makes it easier to share notebooks with an audience in a safe way.

Installing Apache Zeppelin (hint: use Docker)

Before running the Docker image we will setup the driver for SQL Server. Zeppelin uses JDBC, and therefore we have two options: the official driver from Microsoft and the open source driver jTDS. We will choose jTDS for the important reason that it allows for using Windows Authentication on a non-Windows machine (CentOS 7 in this example). If your database supports SQL authentication this difference will not matter.

  1. Download the driver from http://jtds.sourceforge.net/
  2. Create a top level directory to contain the project and an artifacts directory
  3. Unzip the jTDS driver and note the location of the jar
mkdir /opt/zeppelin /opt/zeppelin/artifacts
unzip jtds-1.3.1-dist.zip -d /opt/zeppelin/artifacts/jtds-1.3.1
find /opt/zeppelin/artifacts/jtds-1.3.1 -name *.jar

Navigate to the project directory (/opt/zeppelin in this case) and execute the following docker run command. It will download the image if not available locally.

docker run -p 8080:8080 --rm \
-v $PWD/logs:/logs \
-v $PWD/notebook:/notebook \
-v $PWD/artifacts:/artifacts \
-v $PWD/conf/interpreter.json:/zeppelin/conf/interpreter.json \
-e ZEPPELIN_LOG_DIR='/logs' \
-e ZEPPELIN_NOTEBOOK_DIR='/notebook' \
--name zeppelin apache/zeppelin:0.8.0

There are two additional volume mounts in this command compared to the standard configuration:

$PWD/artifacts will allow for accessing jTDS driver

$PWD/conf/interpreter.json will allow for persistence of custom interpreter configurations (otherwise they will be ephemeral with the container).

Configuring the interpreter

Go to localhost:8080 to launch the UI and then navigate to the Interpreters page (under top-right dropdown).

Click the “+Create” button, add a name for the new interpreter (e.g. mssql), and select jdbc as the interpreter group. The panel will change to a list of properties to modify, and make the modifications described below:

Note: Some blank properties dropped for convenience

For Windows Authentication from Linux the key addition is to add the domain parameter into the connection string. By providing this, and a username and password, it tells jTDS to use NTLM instead of SQL authentication.

To check that everything is working, create a new note and select mssql as the default interpreter. Once in the note you can select the gear icon near the top right which will show all of the interpreters configured for the note. Click into the first paragraph, write a test query, and then hit the play icon or shift+enter to run.

%mssql is technically not required since you set it as the default interpreter, but this shows the mechanism for defining different languages by paragraph.

That’s it!

It’s worth noting that as of version 0.8.0 it is not possible to save the output of JDBC paragraphs, nor share data between interpreters (with a few exceptions). However, I imagine that Zeppelin will continue to have rich features added in the future.

--

--