How to run Scala and Spark in the Jupyter notebook

Bogdan Cojocar
2 min read · Jun 25, 2018

The Jupyter notebook is one of the most widely used tools in data science projects. It has great support for Python development, but it can also be used for Scala development via the spylon-kernel, an additional kernel that has to be installed separately.

Step 1: install the package

pip install spylon-kernel

Step 2: create a kernel spec

This will allow us to select the Scala kernel in the notebook.

python -m spylon_kernel install
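
To check that the kernel spec was registered, you can list the kernels Jupyter knows about; a spylon kernel entry should appear in the output:

jupyter kernelspec list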

Step 3: start the Jupyter notebook

jupyter notebook

In the notebook, select New -> spylon-kernel. This will start our Scala kernel.

Step 4: testing the notebook

Let’s write some scala code:

val x = 2
val y = 3
x+y
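
On a fresh kernel, the output should look roughly like the standard Scala REPL echoes below (the startup banner and result numbering may differ):

x: Int = 2
y: Int = 3
res0: Int = 5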

As the startup messages show, running the first cell also initialises the Spark components. For this to work, please make sure you have SPARK_HOME set.
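
If SPARK_HOME is not set yet, you can export it before launching the notebook. The path below is illustrative; point it at your own Spark installation:

export SPARK_HOME=/opt/spark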

Now we can even use Spark. Let's test it by creating a Dataset:

import spark.implicits._  // brings the implicit Encoder for tuples into scope

val data = Seq((1, 2, 3), (4, 5, 6), (6, 7, 8), (9, 19, 10))
val ds = spark.createDataset(data)
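
To verify the Dataset, we can print its contents with show(). Spark names the unnamed tuple fields _1, _2 and _3, so the output should look like this:

ds.show()

+---+---+---+
| _1| _2| _3|
+---+---+---+
|  1|  2|  3|
|  4|  5|  6|
|  6|  7|  8|
|  9| 19| 10|
+---+---+---+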
