Apache Zeppelin, Interpreter mode explained

Moon
Nov 10, 2016

Apache Zeppelin is a web-based notebook that enables interactive data analytics. The Interpreter is a pluggable layer for backend integration. More than 20 interpreters are available in the official Zeppelin distribution package, and many more are available as third-party projects.

An Interpreter runs in a JVM process that communicates with the Zeppelin daemon using Thrift. Each Interpreter process can host one or more Interpreter Groups, and each interpreter instance belongs to one of these Interpreter Groups.

Interpreter Process, Group and Instance

See the Zeppelin documentation to understand more about its internal structure. Zeppelin provides 3 different modes to run the interpreter process: shared, scoped and isolated.

In Shared mode, a single JVM process and a single Interpreter Group serve all Notes.

Shared mode

In Scoped mode, Zeppelin still runs a single interpreter JVM process, but a separate Interpreter Group serves each Note. So each Note has its own dedicated session, but it’s still possible to share objects between different Interpreter Groups because they’re in the same JVM process.

Scoped mode

Isolated mode runs a separate interpreter process for each Note. So each Note has a completely isolated session.

Isolated mode
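
To make the difference between the three modes concrete, here is a minimal sketch in Python that models how Notes could be placed into interpreter processes and Interpreter Groups. It is only an illustration of the rules described above, not Zeppelin's actual code: the names InterpreterProcess, InterpreterGroup and allocate are hypothetical and do not exist in Zeppelin.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class InterpreterGroup:
    """Hypothetical stand-in for an Interpreter Group (one session)."""
    group_id: int
    notes: list = field(default_factory=list)


@dataclass
class InterpreterProcess:
    """Hypothetical stand-in for one interpreter JVM process."""
    process_id: int
    groups: list = field(default_factory=list)


def allocate(mode, note_ids):
    """Place each Note according to the mode.

    Returns (processes, placement), where placement maps
    note id -> (process_id, group_id).
    """
    if mode not in ("shared", "scoped", "isolated"):
        raise ValueError(f"unknown mode: {mode}")

    process_ids, group_ids = count(1), count(1)
    processes = []
    placement = {}

    for note in note_ids:
        # Isolated: a new process per Note. Otherwise reuse the single process.
        if mode == "isolated" or not processes:
            processes.append(InterpreterProcess(next(process_ids)))
        process = processes[-1]

        # Shared: reuse the single group. Scoped and isolated: a new group per Note.
        if mode == "shared" and process.groups:
            group = process.groups[0]
        else:
            group = InterpreterGroup(next(group_ids))
            process.groups.append(group)

        group.notes.append(note)
        placement[note] = (process.process_id, group.group_id)

    return processes, placement


if __name__ == "__main__":
    for mode in ("shared", "scoped", "isolated"):
        processes, placement = allocate(mode, ["NoteA", "NoteB"])
        print(mode, "processes:", len(processes), "placement:", placement)
```

In this model, two Notes in shared mode land in the same process and the same group, in scoped mode they share the process but get different groups, and in isolated mode each gets its own process.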

Each Interpreter implementation may have different characteristics depending on the back-end system it integrates with, so the 3 interpreter modes can be used differently by each implementation.

Let’s take a look at how the Spark Interpreter implementation uses these 3 interpreter modes, as an example. The Spark Interpreter implementation includes 4 different interpreters in the group: Spark, SparkSQL, PySpark and SparkR. The SparkInterpreter instance embeds a Scala REPL for interactive Spark API execution.

In Shared mode, a SparkContext and a Scala REPL are shared among all interpreters in the group. So every Note shares a single SparkContext and a single Scala REPL. In this mode, if NoteA defines variable ‘a’, then NoteB is able not only to read variable ‘a’ but also to override it.

Shared mode in Spark Interpreter

In Scoped mode, each Note has its own Scala REPL, so a variable defined in one Note cannot be read or overridden in another Note. However, a single SparkContext still serves all the Interpreter Groups. All jobs are submitted to this SparkContext, and the fair scheduler schedules them. This can be useful when users do not want to share a Scala session but want to keep a single Spark application and leverage its fair scheduler.

In Isolated mode, each Note has its own SparkContext and Scala REPL.
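
The sharing rules for the Spark Interpreter can be summarized in a small Python model. This is a sketch under stated assumptions, not how Zeppelin is implemented: FakeSparkContext and SparkInterpreterModel are hypothetical names, a plain dict stands in for the variable namespace of a Scala REPL, and no real Spark is involved.

```python
class FakeSparkContext:
    """Hypothetical stand-in for a SparkContext; it only records jobs."""

    def __init__(self):
        self.jobs = []

    def submit(self, note, job):
        self.jobs.append((note, job))


class SparkInterpreterModel:
    """Models which SparkContext and REPL a Note sees in each mode."""

    def __init__(self, mode):
        if mode not in ("shared", "scoped", "isolated"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self._contexts = {}
        self._repls = {}

    def context(self, note):
        # Only isolated mode gives each Note its own SparkContext.
        key = note if self.mode == "isolated" else "*"
        if key not in self._contexts:
            self._contexts[key] = FakeSparkContext()
        return self._contexts[key]

    def repl(self, note):
        # Only shared mode lets all Notes use the same REPL namespace.
        key = "*" if self.mode == "shared" else note
        if key not in self._repls:
            self._repls[key] = {}
        return self._repls[key]


if __name__ == "__main__":
    for mode in ("shared", "scoped", "isolated"):
        model = SparkInterpreterModel(mode)
        model.repl("NoteA")["a"] = 1  # NoteA defines variable 'a'
        visible = "a" in model.repl("NoteB")
        same_context = model.context("NoteA") is model.context("NoteB")
        print(mode, "| NoteB sees 'a':", visible, "| same SparkContext:", same_context)
```

The model reproduces the three cases: in shared mode NoteB sees and can override ‘a’, in scoped mode it cannot see ‘a’ but still submits jobs to the same context, and in isolated mode nothing is shared.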

We took a look at the 3 different interpreter modes that Zeppelin provides and how the Spark Interpreter implementation leverages each mode.

These 3 modes give Zeppelin the flexibility to fit any type of use case. I expect these modes will be even more useful when combined with multi-tenancy support in Zeppelin (ZEPPELIN-1337, work in progress) in the near future.

Resources
* Interpreters in Apache Zeppelin
* Writing a new interpreter
* How Zeppelin runs a Paragraph

Moon

Creator of Apache Zeppelin and cloud platform for open-source projects https://staroid.com.