Bridging the Gap Between Python and Scala Jupyter Notebooks

Using the PixieDust Python helper library to import Scala packages

There’s a reason you’ve been hearing a lot about data science notebooks lately: data scientists are in high demand, and the Python programming language is widely used. In particular, Jupyter Notebooks are a popular tool for creating and sharing code for quick analysis.

Most Jupyter Notebooks you’ll see come in two main flavors: Python and Scala. Python is great for collaborating with colleagues (clean syntax, tons of handy libraries, good documentation), but sometimes you need what Scala offers. The Apache Spark data processing engine is written in Scala, so its Scala API is the native one, and a Scala Jupyter notebook is a natural fit when you’re working with a big data set. The downside of Scala is that fewer people know it.

Hello, [Scala] world! From [Python] PixieDust

David Taieb has led the charge for our team to build an open source application that we affectionately call PixieDust. The PixieDust Python helper library works as an add-on to your Jupyter notebook that lets you do all sorts of new things, like automatic chart rendering or progress monitors for cells running code. PixieDust can also help developers bridge contexts: call Scala code from a Python Notebook, or call Python code from a Scala Notebook.

With this post, I’d like to demonstrate how to use PixieDust to import a Scala “Hello, world!” package into a Jupyter Python notebook. I’m going to use the same code that was used in this article by Dustin V:

I’ll show you how to use the same Scala JAR that Dustin provides, but from within a Jupyter Python notebook instead.

Prepare to notebook!

Here are the basic steps:

  1. Set up an account in IBM Data Science Experience (DSX)
  2. Create a project in DSX
  3. Point to Dustin’s Scala JAR
  4. Test Scala JAR from Python Notebook with PixieDust

1. Set up an account in IBM Data Science Experience (DSX)

Browse to http://datascience.ibm.com/ and sign up for the 30-day free trial, which includes Jupyter and other tools. (You will need to provide a personal email address for the account.) This step should take about 10–15 minutes to complete.

2. Create a project in DSX

When your DSX account is ready, it’s time to create a new project:

Name your project (mine is called “pixiedust”), and use the defaults for the Spark and Object Storage instances. You’ll add a notebook in just a moment.

3. Point to Dustin’s Scala JAR

It’s optional, but you can refer to Dustin V’s post for steps on how to compile your own JAR file. Otherwise, he makes the JAR available for test purposes, and I found that it works just fine. You’ll see this URL in our sample notebook:

https://github.com/dustinvanstee/dv-hw-scala/raw/master/target/scala-2.10/dv-hw-scala-assembly-1.0.jar
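Before pointing a notebook at a JAR, it can help to check which Scala version it was built for, since a JAR compiled for one Scala binary version may not load on a Spark kernel that uses another. The sketch below is only an illustration: it assumes the URL follows the usual sbt layout (`target/scala-<version>/<artifact>.jar`), and `describe_jar` is a hypothetical helper name, not part of PixieDust or DSX.

```python
import re
from urllib.parse import urlparse

# The JAR URL from Dustin's repository, as given above.
JAR_URL = (
    "https://github.com/dustinvanstee/dv-hw-scala/raw/master/"
    "target/scala-2.10/dv-hw-scala-assembly-1.0.jar"
)


def describe_jar(url):
    """Return the JAR file name and the Scala version found in an sbt-style path.

    Hypothetical helper: it only inspects the URL text and downloads nothing.
    """
    path = urlparse(url).path
    file_name = path.rsplit("/", 1)[-1]
    # sbt writes build output under target/scala-<binary version>/
    match = re.search(r"/scala-(\d+\.\d+)/", path)
    scala_version = match.group(1) if match else None
    return {"file_name": file_name, "scala_version": scala_version}


info = describe_jar(JAR_URL)
print(info["file_name"], "built for Scala", info["scala_version"])
```

If the version reported here differs from the Scala version of your Spark service, compiling your own JAR by following Dustin’s steps is probably the safer route.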

4. Test Scala JAR from Python Notebook with PixieDust

You can now use our sample notebook to test PixieDust’s Python-Scala bridge functionality. To run the notebook in your own account, first download it via the universal download arrow icon:

Download our sample notebook before uploading and running it in your own DSX project.

Head back to the DSX project you created in step 2, and add a notebook to your project:

Choose the From File option and select the notebook file you just downloaded. Name your notebook (mine is “HelloWorld”). Now, create your notebook.

The notebook has all the configuration and sample code you’ll need. Just run the cells using the play icon, and make sure to restart your kernel when prompted in the cell output to avoid errors.

Using PixieDust in a Python Notebook to access a custom Scala Package.
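If you would rather type the installation step yourself than rely on the sample notebook, here is a minimal sketch of what it might look like. It uses PixieDust’s `installPackage` function to register the JAR. The wrapper `install_scala_jar` and its `installer` parameter are hypothetical names added for illustration, so the function can be tried out without PixieDust present. This is not necessarily how the sample notebook is written.

```python
# The JAR URL from step 3.
JAR_URL = (
    "https://github.com/dustinvanstee/dv-hw-scala/raw/master/"
    "target/scala-2.10/dv-hw-scala-assembly-1.0.jar"
)


def install_scala_jar(url, installer=None):
    """Hand a JAR URL to PixieDust's package manager.

    `installer` is a hypothetical hook for illustration and testing.
    When it is omitted, pixiedust.installPackage is used, which requires
    PixieDust to be installed in the notebook kernel.
    """
    if not url.lower().endswith(".jar"):
        raise ValueError("expected a URL ending in .jar, got: %s" % url)
    if installer is None:
        import pixiedust  # imported lazily so the sketch loads without PixieDust
        installer = pixiedust.installPackage
    return installer(url)


# In a notebook cell with PixieDust available, you would run:
# install_scala_jar(JAR_URL)
```

After the install cell finishes, restart the kernel when the output asks you to. Scala code can then go in a cell that begins with PixieDust’s `%%scala` cell magic.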

You will know that everything is working when you see PixieDust generate a chart at the end of your notebook. Now you know that Scala and Python can be BFFs with PixieDust and Jupyter Notebooks!

A Python matplotlib chart generated by PixieDust on a Scala Spark DataFrame.

If you enjoyed this article, please ♡ it to recommend it to other Medium readers.

Maureen McElaney

OSS DevRel at @IBM . board @vtTechAlliance . my words don’t represent my employer. (http://pronoun.is/she/:or/they)