3 Tips for Building a Lasting Jupyter Server

Hank Chan
Feb 23 · 4 min read

There are many tutorials on building your own Jupyter server, but few focus on building a lasting one: that is, on keeping Jupyter running after the SSH connection drops or the system reboots.

Losing a server mid-run is a tragic story that I bet many ML engineers know well, after precious hours of training have gone to waste. Here we summarize three popular solutions that have been tested on Ubuntu (let me know if any CentOS user runs into a problem). This post assumes you have experience with spinning up a Jupyter server.

Tmux

The easiest solution comes to the rescue. Not all machine learning researchers have experience with server provisioning, which is why we recommend tmux as our first pick. Many server images, Ubuntu’s included, ship with tmux out of the box, so there is often no need for sudo to install it. What I love about tmux is that it doesn’t require writing a configuration file, and it leaves dependency management alone: you keep using whatever deep learning environment the machine already has, such as the prebuilt AWS DLAMI or GCP DLVM images.

Using tmux is as easy as it gets: start a tmux session and run a Jupyter server in it. As long as you don’t exit the tmux session, the server keeps running, even after your SSH connection has dropped.

# Start a tmux session
$ tmux
# Run a jupyter server
$ jupyter notebook
# Detach from the tmux session without stopping the running jupyter server
Press Ctrl+b, then d
# Check if jupyter server is running
$ ps aux | grep jupyter
# Reenter tmux session
$ tmux attach
# End the tmux session from inside it (stop jupyter with Ctrl+c first)
$ exit
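
The steps above can also be scripted. Below is a minimal sketch, assuming you want a named session so it is easy to find again later. The session name `jupyter` and the function name `start_jupyter_session` are my own choices for illustration, not something tmux requires.

```shell
# Assumed session name; pick anything you like
SESSION_NAME="jupyter"

# Hypothetical helper: start Jupyter in a detached, named tmux session
start_jupyter_session() {
    # Bail out early if tmux is not installed on this machine
    if ! command -v tmux >/dev/null 2>&1; then
        echo "tmux not found" >&2
        return 127
    fi
    # Reuse the session if it already exists instead of starting a second server
    if tmux has-session -t "$SESSION_NAME" 2>/dev/null; then
        echo "session '$SESSION_NAME' is already running"
        return 0
    fi
    # -d starts the session detached, -s names it; the quoted command runs inside it
    tmux new-session -d -s "$SESSION_NAME" "jupyter notebook --no-browser"
}
```

After calling `start_jupyter_session`, you can log out safely and come back later with `tmux attach -t jupyter`.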

Downside: tmux doesn’t survive a system reboot. But surviving an unstable SSH connection should be enough for most users who just don’t want to lose their work in progress.

Supervisor

supervisor is slightly more complicated to set up, but it beats tmux on one point: it restarts the specified program (in this case, the Jupyter server) after a system reboot, so the program can seemingly run forever.

# Install supervisor
$ sudo apt install supervisor
$ sudo service supervisor start

To have supervisor monitor and control your Jupyter server, first create a configuration file and place it in /etc/supervisor/conf.d. Below is an example of the configuration file (its name must end with the .conf extension):

[program:jupyter] 
command = jupyter notebook --no-browser --config=/path/to/config
directory = /path/to/working/directory
; user to run Jupyter as (replace with your own)
user = ubuntu
autostart = true
autorestart = true
stdout_logfile = /var/log/your_log_file.log
redirect_stderr = true
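
One thing that may trip you up: supervisor does not start programs through your login shell, so a bare `jupyter` may not be found if it lives in a conda or virtualenv directory. Below is a minimal sketch that generates the same kind of file with an absolute path. The function `write_jupyter_conf` and its arguments are hypothetical names, the `--config` option is left out for brevity, and the log path is the placeholder from the example above.

```shell
# Hypothetical helper: print a supervisor program section for Jupyter.
# $1: absolute path to the jupyter executable
# $2: directory to serve notebooks from
# $3: user to run the server as
write_jupyter_conf() {
    cat <<EOF
[program:jupyter]
command = $1 notebook --no-browser
directory = $2
user = $3
autostart = true
autorestart = true
stdout_logfile = /var/log/your_log_file.log
redirect_stderr = true
EOF
}
```

One way to use it, assuming `jupyter` is on your own PATH and your notebooks live in `~/notebooks`: `write_jupyter_conf "$(command -v jupyter)" "$HOME/notebooks" "$USER" | sudo tee /etc/supervisor/conf.d/jupyter.conf`.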

After saving the configuration file to the directory /etc/supervisor/conf.d, you can ask supervisor to read the configuration file and start the program:

$ sudo supervisorctl reread 
$ sudo supervisorctl update
$ sudo supervisorctl status # Check if Jupyter is running

Downside: The need for writing a damn config file lol, though supervisor can also be super helpful for other automation tasks.

P.S. The supervisor section of this post draws on Albert Yang’s post; I do not claim credit for it. Albert’s post has a more detailed walkthrough of setting up supervisor, as well as of using nginx as a reverse proxy, which provides routing capability for the server.

Docker

Enter DevOps’ favorite choice: Docker. Docker’s advantage may not be best reflected on AWS’ DLAMI and GCP’s DLVM, where the dependencies have already been taken care of. That said, Docker still comes in handy when you have to start from a plain server environment. Many vendor-provided Linux images today come with Docker pre-installed, including DLAMI and DLVM; even if yours doesn’t, there are many tutorials for installing Docker (see here to install Docker on Ubuntu 18.04).

There are many approaches for using Docker in deep learning, but here we are only concerned with running a lasting Jupyter server.

First we pull a pre-built deep learning image from Docker Hub and run it on port 8888. For the sake of simplicity, we pull the TensorFlow image from the official Jupyter repository on Docker Hub.

$ docker pull jupyter/tensorflow-notebook
$ docker run -d -p 8888:8888 -e JUPYTER_ENABLE_LAB=yes -v "$PWD":/home/ubuntu jupyter/tensorflow-notebook

Most Docker images have their own setup, and this one is no exception. Since the container’s working directory doesn’t contain our Jupyter notebook folder, we need to symlink the folder into the working directory.

# Find your container name
$ docker ps
# Enter into the container
$ docker exec -it your_container_name bash
# Inside the docker container, symlink the folder to the working directory
$ ln -s /your/jupyter/notebook/folder /home/jovyan/whatever

Voila! A Jupyter server is now serving your folder on port 8888.
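
As an alternative to the symlink, the folder can be mounted straight under the container user’s home directory, and the container can be given a fixed name so you don’t have to look it up with `docker ps`. The following is a minimal sketch under those assumptions. The function name `run_jupyter_container`, the default name `jupyter`, the `work` subfolder and the `DRY_RUN` switch are all hypothetical choices of mine.

```shell
# Hypothetical helper: run the image with a fixed container name and the
# notebook folder mounted under /home/jovyan (the image's default user).
# $1: container name (default: jupyter)
# $2: host folder to mount (default: current directory)
# Set DRY_RUN=1 to print the docker command instead of running it.
run_jupyter_container() {
    container_name="${1:-jupyter}"
    notebook_dir="${2:-$PWD}"
    # Build the command as positional parameters so it can be printed or run
    set -- docker run -d --name "$container_name" -p 8888:8888 \
        -e JUPYTER_ENABLE_LAB=yes \
        -v "$notebook_dir":/home/jovyan/work \
        jupyter/tensorflow-notebook
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

For example, `run_jupyter_container jupyter ~/notebooks` starts the container, and `docker logs jupyter` should then show the login URL with its access token.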

Downside: 1) A Docker container started this way doesn’t survive a system reboot either. 2) You will likely need different Docker images for different dependencies, and each image can be huge (~5GB). 3) Third-party images come at a cost: the more an image customizes, the less applicable it is to general use. Of course you could build your own Docker image, or even build an image per project (check out repo2docker), but that’s outside the scope of this tutorial.
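
The reboot problem in 1) has a Docker-side mitigation worth knowing about: restart policies. Below is a minimal sketch, assuming the Docker daemon itself is enabled at boot (for example with `sudo systemctl enable docker` on systemd machines). The helper name `enable_container_restart` is hypothetical, and `unless-stopped` is just one of the available policies.

```shell
# Assumed policy: restart the container unless it was stopped on purpose
RESTART_POLICY="unless-stopped"

# Hypothetical helper: $1 is the name of an existing container
enable_container_restart() {
    if [ -z "${1:-}" ]; then
        echo "usage: enable_container_restart <container_name>" >&2
        return 2
    fi
    # Change the restart policy of an existing container in place
    docker update --restart "$RESTART_POLICY" "$1"
}
```

Calling `enable_container_restart your_container_name` applies the policy to a container you have already created; for a new container, the same policy can be passed to `docker run` with `--restart unless-stopped`.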

Summary

So here it goes: tmux > supervisor > Docker in ease of use, tmux < supervisor < Docker in ability to customize. Even though we present these tools as separate approaches, they can be used together; in fact, my personal favorite is to use Docker with supervisor.

[program:docker]
command = docker container start your_container_name
directory = /home/ubuntu
user = ubuntu
autostart = true
startsecs = 1
startretries = 0
exitcodes = 0
stdout_logfile = /var/log/your_log.log
redirect_stderr = true

The caveat here is that supervisor is designed to manage long-running jobs, not one-time jobs. Therefore, we need to specify startretries = 0 and exitcodes = 0 to tell supervisor to stop issuing retries after the command has exited, although this feels more like a workaround to me.

The three solutions above are by no means the only ways to spin up a lasting Jupyter server. If you find other approaches or tools that are just as convenient, please share them with us.

Cubesole

Mobilize the world with AI

Written by

Hank Chan

deepdrop.ai & cubesole.ai
