Bridging the Gap Between Python and Scala Jupyter Notebooks
Using the PixieDust Python helper library to import Scala packages
There’s a reason you’ve been hearing a lot about data science notebooks lately: data scientists are in high demand, and the Python programming language is widely used. In particular, Jupyter Notebooks are a popular tool for creating and sharing code for quick analysis.
Most Jupyter Notebooks you’ll see come in two main flavors: Python and Scala. While Python is great for collaborating with colleagues — clean syntax, tons of handy libraries, good documentation — sometimes you need the processing power of Scala. The Apache Spark data processing engine is built on Scala, so if you’re working with a big data set, a Scala Jupyter notebook is the way to go. The downside of Scala is that fewer people know it.
Hello, [Scala] world! From [Python] PixieDust
David Taieb has led the charge for our team to build an open source application that we affectionately call PixieDust. The PixieDust Python helper library works as an add-on to your Jupyter notebook that lets you do all sorts of new things, like automatic chart rendering or progress monitors for cells running code. PixieDust can also help developers bridge contexts: call Scala code from a Python Notebook, or call Python code from a Scala Notebook.
With this post, I’d like to demonstrate how to use PixieDust to import a Scala “Hello, world!” package into a Jupyter Python notebook. I’m going to use the same code that was used in this article by Dustin V:
Part 1 : How to add a custom library to a Jupyter Scala notebook in IBM Data Science Experience…
I’ll show you how to use the same Scala JAR that Dustin provides, but from within a Jupyter Python notebook instead.
Prepare to notebook!
Here are the basic steps:
- Set up an account in IBM Data Science Experience (DSX)
- Create a project in DSX
- Point to Dustin’s Scala JAR
- Test Scala JAR from Python Notebook with PixieDust
1. Set up an account in IBM Data Science Experience (DSX)
Browse to http://datascience.ibm.com/ and sign up for the 30-day free trial, which includes Jupyter and other tools. (You will need to provide a personal email address for the account.) This step should take about 10–15 minutes to complete.
2. Create a project in DSX
When your DSX account is ready, it’s time to create a new project.
Name your project (mine is called “pixiedust”), and use the defaults for the Spark and Object Storage instances. You’ll add a notebook in just a moment.
3. Point to Dustin’s Scala JAR
It’s optional, but you can refer to Dustin V’s post for steps on how to compile your own JAR file. Otherwise, he makes the JAR available for test purposes, and I found that it works just fine. The JAR’s URL appears in our sample notebook.
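If you want to sanity-check a JAR before pointing a notebook at it, remember that a JAR is just a ZIP archive, so Python’s standard zipfile module can list the classes inside. Here’s a minimal sketch. The example/HelloWorld class name and the in-memory stand-in JAR are made up for illustration; they are not taken from Dustin’s file.

```python
import io
import zipfile


def list_jar_classes(jar_bytes):
    """Return the fully qualified class names found in a JAR (a ZIP archive)."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        return sorted(
            name[: -len(".class")].replace("/", ".")
            for name in jar.namelist()
            if name.endswith(".class")
        )


def make_demo_jar():
    """Build a tiny in-memory stand-in JAR so the sketch runs offline."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as jar:
        jar.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
        # Hypothetical class path; the real JAR's package name may differ.
        jar.writestr("example/HelloWorld.class", b"")
    return buffer.getvalue()


print(list_jar_classes(make_demo_jar()))
```

To inspect a real JAR, you could download its bytes first (for example with urllib.request.urlopen(url).read()) and pass them to list_jar_classes.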
4. Test Scala JAR from Python Notebook with PixieDust
You can now use our sample notebook to test PixieDust’s Python-Scala bridge functionality. To run the notebook in your own account, first download it via the universal download arrow icon.
Head back to the DSX project you created in step 2, and add a notebook to your project.
Choose the From File option. Name your notebook (mine is “HelloWorld”). Now, create your notebook.
The notebook has all the configuration and sample code you’ll need. Just run the cells using the play icon, and make sure to restart your kernel when prompted in the cell output to avoid errors.
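For a rough idea of what a notebook like this contains, here’s a sketch that builds a three-cell notebook file in Python. It is not a copy of our sample notebook: the JAR URL is a placeholder and the Scala cell is my own illustration. It assumes PixieDust’s installPackage function and its %%scala cell magic, which is how PixieDust runs Scala from a Python notebook.

```python
import json


def code_cell(source):
    """Wrap source text in a minimal Jupyter (nbformat 4) code cell."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": source,
    }


def build_hello_notebook(jar_url):
    """Assemble an illustrative notebook: import, install the JAR, run Scala."""
    cells = [
        code_cell("import pixiedust"),
        # Installing a package is typically the step that asks for a
        # kernel restart before the JAR becomes usable.
        code_cell("pixiedust.installPackage({!r})".format(jar_url)),
        # Illustrative Scala only; the real notebook calls into the JAR.
        code_cell('%%scala\nprintln("Hello, world!")'),
    ]
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 2}


# Placeholder URL (an assumption); substitute the JAR location you are using.
JAR_URL = "https://example.com/hello-world.jar"
notebook = build_hello_notebook(JAR_URL)
print(json.dumps(notebook, indent=2))
```

Saving that JSON with an .ipynb extension should give you a file you can load through the From File option, though the Scala cell will only run where PixieDust and Spark are available.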
You will know that everything is working when you see PixieDust generate a chart at the end of your notebook. Now you know that Scala and Python can be BFFs with PixieDust and Jupyter Notebooks!