How to run Scala and Spark in the Jupyter notebook
The Jupyter notebook is one of the most widely used tools in data science projects. It has excellent support for developing software in Python, but it can also be used for Scala development with the spylon-kernel, an additional kernel that has to be installed separately.
Step 1: install the package
pip install spylon-kernel
Step 2: create a kernel spec
This will allow us to select the Scala kernel in the notebook.
python -m spylon_kernel install
Step 3: start the Jupyter notebook
jupyter notebook
In the notebook, select New -> spylon-kernel. This will start our Scala kernel.
Step 4: testing the notebook
Let’s write some Scala code:
val x = 2
val y = 3
x + y
The output should show the evaluated values, ending with the sum (res0: Int = 5). As you can see, running the first cell also starts the Spark components. For this, please make sure you have SPARK_HOME set up.
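If SPARK_HOME is not set yet, you can export it in the shell before starting the notebook. A minimal sketch, assuming Spark was unpacked under your home directory (the exact path and version are placeholders; adjust them to your installation):

```shell
# Point SPARK_HOME at your Spark installation directory
# (the path below is an example, not a required location).
export SPARK_HOME="$HOME/spark-3.5.0-bin-hadoop3"

# Optionally put the Spark binaries (spark-shell, spark-submit) on the PATH.
export PATH="$SPARK_HOME/bin:$PATH"
```

Adding these lines to your shell profile (e.g. ~/.bashrc) makes the setting permanent.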
Now we can even use Spark. Let’s test it by creating a dataset:
val data = Seq((1, 2, 3), (4, 5, 6), (6, 7, 8), (9, 19, 10))
val ds = spark.createDataset(data)
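Once the dataset exists, we can query it like any other Spark Dataset. A small sketch of what a follow-up cell could look like, assuming the `spark` session the kernel provides (the column names here are purely illustrative):

```scala
import org.apache.spark.sql.functions.col

// Give the tuple fields readable column names (illustrative choice).
val df = ds.toDF("a", "b", "c")

// A simple transformation: keep only rows where column "b" is greater than 5.
df.filter(col("b") > 5).show()
```

The `show()` call prints the matching rows as a small ASCII table directly in the notebook output.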