Docker Image for Polynote, Jupyter and Other Data Science Tools

Shariful Islam
Nov 13, 2019 · 8 min read

Netflix has just released an open-source multipurpose notebook named Polynote,

with first-class Scala support, Apache Spark integration, multi-language interoperability including Scala, Python, and SQL, as-you-type autocomplete, and more,

as they claim. According to the article published on the Netflix Technology Blog (probably by the authors themselves) just two weeks ago, it gives data scientists and machine learning researchers a notebook environment in which they can seamlessly integrate a JVM-based ML platform with the more popular Python-based ecosystem.

https://www.docker.com/

I am not here to advocate for or against Polynote. To be honest, I am not in a position to do so, as I have barely used it (I just heard about it a week ago). As a data scientist, I use Jupyter-notebook extensively for my everyday work. Moreover, the type of work I do these days mostly demands expertise in the Python ecosystem. However, it is always exciting to try a new tool and give it a fair chance.

That’s enough for an introduction. As I have already mentioned, I am not here to promote the new tool. Rather, my intention is to share my effort to create a single environment with all the essential and nice-to-have tools I want to use, so that I do not have to switch between multiple environments each time I want to play with something.

Anybody can go to the official site and follow the instructions to install and have fun with the new toy. They even provide an official Docker image that anyone can run without going through all the hassle. However, what I wanted was a Docker image that contains more or less everything I need (at least most of the time). I need Jupyter, I need some data science libraries, and it would be great to be able to experiment with the new tool (Polynote), where I can learn and use Scala and Spark and get familiar with the JVM-based ML platform without switching between multiple environments and Docker images.

source: https://polynote.org/

Creating Multi-purpose Docker Image

This repository provides all the necessary files to build a docker image for running Polynote, Jupyter-notebook, Scala, Spark and a few more data science and machine learning tools.

The Docker image is also uploaded to Dockerhub, so that one can use the image without going through all the hassle of building it. The Docker image can be found here.

In the next few sections we will discuss how to build the image from the Dockerfile. We will also show how to create a container from the built or downloaded image.

Create Docker Image from Dockerfile

To create the image from the Dockerfile, we need to clone this repository and change into the cloned directory. We also need to pick a name for our image (dock_polynote:v1 in this example). Then we run the following command in the terminal (tested on Ubuntu 18.04 LTS and macOS; not tested on Windows). The --rm flag removes the intermediate containers after a successful build.

$ docker build \
--rm \
-t dock_polynote:v1 \
-f Dockerfile \
.

Depending on the speed of the network, this will take quite a while to complete. Upon successful completion, we should be able to see the image by running the docker images command in the terminal, as shown below.

$ docker images
REPOSITORY      TAG      IMAGE ID       CREATED        SIZE
dock_polynote   v1       6e1bf15182bf   45 hours ago   3.91GB
ubuntu          xenial   5f2bf26e3524   12 days ago    123MB

Build Container from Dockerhub Image

Step 1:
Download the image using the following command in the terminal (tested on Ubuntu 18.04 LTS and macOS). Instead of the latest tag, we can also use an older tag.

$ docker pull sharifuli/polynote-jupyter-scala-spark-ds:latest

The size of the image is almost 2GB, so it will take some time to download.

Step 2:
Now we can create a Docker container using the downloaded image or the image we built directly from the Dockerfile. We need to keep the following points in mind:

  • We need to provide a name for the container we are going to create (for example docker_polynote in this case).
  • We need to forward ports for accessing Jupyter-notebook and Polynote from our browser. In this case we are using host port 8192 for Polynote and host port 7777 for Jupyter-notebook (which listens on port 8888 inside the container). We can use any unused host port for these purposes. Port 4040 is also forwarded; it is the port on which the Spark web UI is reported in the spark-submit log later in this post.
  • We need to set the environment variable PYSPARK_ALLOW_INSECURE_GATEWAY (to 1 in the command below).
  • We can mount any local directory into the Docker container. In this case we are mounting the home directory (~) to /opt/home_mounted in the container. We can mount any directory (one or more) into the container.
  • We can use --rm if we want the container to be removed after exit. If we want a persistent container, we need to drop the --rm flag.
  • We need to change the image name from sharifuli/polynote-jupyter-scala-spark-ds:latest to dock_polynote:v1 if we want to create the container from the image we built instead of the downloaded image.

The command is shown below:

$ docker run \
--name docker_polynote \
-it --rm \
-p 8192:8192 \
-p 7777:8888 \
-p 4040:4040 \
-e PYSPARK_ALLOW_INSECURE_GATEWAY=1 \
-v ~:/opt/home_mounted \
sharifuli/polynote-jupyter-scala-spark-ds:latest

This should start both the Polynote and the Jupyter-notebook servers.
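To make the role of each option easier to see, here is a minimal Python sketch that assembles the same docker run command from its parts. The helper name docker_run_command and its parameters are my own invention for illustration; the sketch only builds and prints the command and does not start Docker itself.

```python
import shlex
from pathlib import Path


def docker_run_command(image, name, ports, env=None, volumes=None, remove=True):
    """Return the argument list for a `docker run` call (hypothetical helper).

    ports   : {host_port: container_port}
    env     : {VARIABLE: value}
    volumes : {host_path: container_path}
    """
    cmd = ["docker", "run", "--name", name, "-it"]
    if remove:
        cmd.append("--rm")  # remove the container when it exits
    for host_port, container_port in ports.items():
        cmd += ["-p", "{}:{}".format(host_port, container_port)]
    for key, value in (env or {}).items():
        cmd += ["-e", "{}={}".format(key, value)]
    for host_path, container_path in (volumes or {}).items():
        cmd += ["-v", "{}:{}".format(host_path, container_path)]
    cmd.append(image)  # the image name always comes last
    return cmd


cmd = docker_run_command(
    image="sharifuli/polynote-jupyter-scala-spark-ds:latest",
    name="docker_polynote",
    ports={8192: 8192, 7777: 8888, 4040: 4040},
    env={"PYSPARK_ALLOW_INSECURE_GATEWAY": "1"},
    volumes={str(Path.home()): "/opt/home_mounted"},
)
# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(part) for part in cmd))
```

If we wanted to launch the container from Python, the list could be handed to subprocess.run(cmd, check=True). Passing remove=False gives the persistent variant, and swapping the image argument for dock_polynote:v1 uses the locally built image.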

Access Bash

We can open a bash shell inside the Docker container (while the container is running) by using the following command in the terminal. In that shell we can perform any necessary operation, as shown below.

$ docker exec -it docker_polynote bash
root@83ed7c8eb6ac:/opt# ls
home_mounted nohup.out polynote setup_files spark-2.4.4-bin-hadoop2.7
root@83ed7c8eb6ac:/opt#

Access Polynote

You should be able to access Polynote from the browser using the following link:

Polynote on localhost: http://localhost:8192 (the host port forwarded for Polynote in the docker run command above)

Access Jupyter

You should be able to access Jupyter-notebook from the browser using the following link:

Jupyter-notebook on localhost: http://localhost:7777 (the host port forwarded for Jupyter-notebook in the docker run command above)

If we have not changed anything so far, the default password is abcdef.
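If one of the pages does not load, it can help to check from the host whether the forwarded ports are reachable at all. The following is a small sketch using only the Python standard library; the function names are mine, and the port numbers assume the docker run command shown earlier was used unchanged.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(services, host="localhost"):
    """Map each service name to True/False depending on port reachability."""
    return {name: port_is_open(host, port) for name, port in services.items()}


if __name__ == "__main__":
    # Host ports from the docker run command in this post (an assumption
    # if different ports were chosen).
    status = check_services({"Polynote": 8192, "Jupyter-notebook": 7777})
    for name, reachable in status.items():
        print("{}: {}".format(name, "reachable" if reachable else "not reachable"))
```

A reachable port only tells us that something is listening there; it does not prove that the service behind it is healthy.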

How to Change Jupyter-notebook password:
We can change the Jupyter-notebook password using either of the following methods.

  • Change config/jupyter_notebook_config.py before creating the Docker image, then build the image from the Dockerfile and create a container from that image.
  • Change the ~/.jupyter/jupyter_notebook_config.py file after creating the Docker container. For this, we need to open a bash shell in the container and edit the jupyter_notebook_config.py file in the .jupyter directory under the home directory. The commands are shown below.

$ docker exec -it docker_polynote bash
root@83ed7c8eb6ac:/opt# cd ~ # change directory to home
root@83ed7c8eb6ac:~# ls -a # .jupyter is in home directory
. .. .bashrc .cache .jupyter .local .profile .wget-hsts
root@83ed7c8eb6ac:~# ls .jupyter/
jupyter_notebook_config.py
root@83ed7c8eb6ac:~# cat .jupyter/jupyter_notebook_config.py
# this is for passwrd [abcdef] - change hash for new password
# to generate new hash:
# Step 1: type ipython in the terminal
# Step 2: in the ipython terminal type:
# from notebook.auth import passwd
# Step 3: then in the ipython terminal type:
# passwd()
# Step 4: now follow the prompt and copy the generated password in this file
# it looks like:
# In [1]: from notebook.auth import passwd
# In [2]: passwd()
# Enter password:
# Verify password:
# Out[2]: 'sha1:3dfb9c306198:6a917376957053aefb43ef79e5c8b405d2eb7669'
c.NotebookApp.password = u'sha1:e10b54ea7f07:dc33e226e4afc2e0e0aa1d9700864b261753bba4'
root@83ed7c8eb6ac:~#

The procedure for changing the password is given as comments in jupyter_notebook_config.py. Say, for example, we want to set a new password abc123. We open a terminal and type ipython, which starts an IPython shell, and then follow the steps shown below:

root@83ed7c8eb6ac:~# ipython
Python 3.6.8 (default, Oct 7 2019, 12:59:55)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.8.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from notebook.auth import passwd

In [2]: passwd()
Enter password:
Verify password:
Out[2]: 'sha1:caa85ffb276c:f0cd6a90553a35970519ae8718755dffa3d2525e'

In [3]:

If we want to change the password before building the image, we can follow exactly the same procedure: create a password hash in any IPython console and paste the generated hash into the jupyter_notebook_config.py file.
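The hashes shown in this post have three colon-separated parts: the algorithm name, a 12-character hexadecimal salt, and a 40-character digest. The sketch below illustrates that layout with the standard library only. It assumes that the digest is the SHA-1 of the password followed by the salt, which is my understanding of the classic notebook implementation and not something this post confirms, so passwd() remains the tool to use for real configuration files. The function names are hypothetical.

```python
import hashlib
import secrets


def make_legacy_hash(password, salt=None):
    """Build a 'sha1:<salt>:<digest>' string with the layout shown above.

    Assumption: digest = sha1(password + salt), both encoded as text.
    """
    if salt is None:
        salt = secrets.token_hex(6)  # 6 random bytes -> 12 hex characters
    digest = hashlib.sha1((password + salt).encode("utf-8")).hexdigest()
    return "sha1:{}:{}".format(salt, digest)


def check_legacy_hash(stored, password):
    """Recompute the digest from the stored salt and compare."""
    algorithm, salt, digest = stored.split(":", 2)
    candidate = hashlib.new(algorithm, (password + salt).encode("utf-8")).hexdigest()
    return secrets.compare_digest(candidate, digest)


example = make_legacy_hash("abc123")
print(example)
print(check_legacy_hash(example, "abc123"))
```

Because the salt is random, the same password produces a different string on every call, which is why two hashes for the same password do not look alike.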

Now we need to copy the generated hash, replace the existing hash in the file, and then kill the running Jupyter-notebook server and start it again. We can kill Jupyter-notebook using the following command:

# kill $(ps -aux | grep '[j]upyter-notebook' | awk '{print $2}' | head -n 1)
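The step of replacing the hash in the configuration file can also be scripted instead of done by hand in an editor. Here is a minimal sketch, assuming the file contains a single uncommented c.NotebookApp.password line like the one printed earlier; the helper name set_password_hash is hypothetical.

```python
import re
from pathlib import Path

# Matches only the active setting, not the commented-out instructions.
PASSWORD_LINE = re.compile(r"^c\.NotebookApp\.password\s*=.*$", re.MULTILINE)


def set_password_hash(config_path, new_hash):
    """Replace (or append) the c.NotebookApp.password line in a config file."""
    path = Path(config_path)
    text = path.read_text()
    new_line = "c.NotebookApp.password = u'{}'".format(new_hash)
    if PASSWORD_LINE.search(text):
        # A function is used as replacement so the hash is inserted literally.
        text = PASSWORD_LINE.sub(lambda _match: new_line, text, count=1)
    else:
        text = text.rstrip("\n") + "\n" + new_line + "\n"
    path.write_text(text)
```

Inside the container this could be called as set_password_hash(Path.home() / ".jupyter" / "jupyter_notebook_config.py", "<generated hash>"), with the placeholder replaced by the string that passwd() returned.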

Now we can restart the Jupyter-notebook using the following command from the Docker bash.

$ docker exec -it docker_polynote bash
root@83ed7c8eb6ac:/opt# nohup jupyter-notebook --no-browser --allow-root --ip=0.0.0.0 &
nohup: ignoring input and appending output to 'nohup.out'
root@83ed7c8eb6ac:/opt#

Now, if we try to access the notebook from the browser, it will work with the new password.

Access Scala

We can access Scala from the bash terminal using the scala command as shown below:

$ docker exec -it docker_polynote bash
root@83ed7c8eb6ac:/opt# scala
Welcome to Scala 2.11.12 (OpenJDK 64-Bit Server VM, Java 11.0.4).
Type in expressions for evaluation. Or try :help.

scala> object HelloWorld {
  def main(args: Array[String]): Unit = {
    println("Hello, world!")
  }
}
defined object HelloWorld

scala> :q # quit
root@83ed7c8eb6ac:/opt#

Access Spark

We can run PySpark by running pyspark on the terminal.

root@83ed7c8eb6ac:/opt# pyspark
Python 2.7.15+ (default, Oct 7 2019, 17:39:04)
[GCC 7.4.0] on linux2
... ...
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.4
      /_/

Using Python version 2.7.15+ (default, Oct 7 2019 17:39:04)
SparkSession available as 'spark'.
>>> from pyspark.sql import SparkSession

We can also submit any Spark job using the spark-submit command. For this purpose, let's open a bash shell in the container and cd to the spark_example directory, as shown below.

root@83ed7c8eb6ac:/opt# cd setup_files/spark_example/
root@83ed7c8eb6ac:/opt/setup_files/spark_example# spark-submit --master local[4] SimpleApp.py
19/11/13 19:51:17 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
... ...
19/11/13 19:51:25 INFO DAGScheduler: Job 1 finished: count at NativeMethodAccessorImpl.java:0, took 0.077107 s
Lines with a: 112, lines with b: 69
19/11/13 19:51:25 INFO SparkUI: Stopped Spark web UI at http://83ed7c8eb6ac:4040
... ...
19/11/13 19:51:25 INFO ShutdownHookManager: Deleting directory /tmp/spark-daaa128a-02a7-4885-bdad-37636c14fa2e
root@83ed7c8eb6ac:/opt/setup_files/spark_example#
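The contents of SimpleApp.py are not reproduced in this post. Judging by the output line, it resembles the line-counting example from the Spark quick start, so here is a sketch of what such a job may look like. Treat it as an assumption about the script, not a copy of it. The plain-Python helper is there only to make the counting logic easy to check without Spark, and the code avoids f-strings because the pyspark shell above runs on Python 2.7.

```python
def count_lines_containing(lines, letter):
    """Plain-Python reference: number of lines that contain `letter`."""
    return sum(1 for line in lines if letter in line)


def count_with_spark(path, master="local[4]"):
    """Count lines containing 'a' and 'b' in a text file using Spark.

    pyspark is imported here so the helper above works without Spark.
    """
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master(master).appName("SimpleApp").getOrCreate()
    try:
        data = spark.read.text(path).cache()  # one row per line, column "value"
        num_a = data.filter(data.value.contains("a")).count()
        num_b = data.filter(data.value.contains("b")).count()
    finally:
        spark.stop()
    return num_a, num_b


def format_result(num_a, num_b):
    """Format the counts like the output line shown in the log above."""
    return "Lines with a: %i, lines with b: %i" % (num_a, num_b)
```

Inside the container, calling count_with_spark on any text file and passing the two counts to format_result should print a line of the same shape as the one in the log; the actual numbers depend on the file.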

Change Docker Port

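We can change the port for Jupyter by editing the configuration file in ~/.jupyter. We can do the same for Polynote by editing the file /opt/polynote/polynote/config.yml. If a port inside the container changes, the corresponding -p mapping in the docker run command has to be updated to match.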

That’s all for now. I will update the post and the repository if anything comes to my mind. Enjoy!
