3 Tips for Building a Lasting Jupyter Server
There are many tutorials on building your Jupyter server, but few talk about building a lasting one, that is, how to ensure Jupyter keeps running even after the SSH connection drops or the system reboots.
ML engineers have on more than one occasion wasted hours of training because of an unstable SSH connection. Here we summarize three solutions for a forever-live Jupyter server (if you can afford one), all tested on Ubuntu (let me know if any CentOS user runs into problems). This post assumes you have experience spinning up a Jupyter server.
Tmux
The easiest solution comes to the rescue. We get that not all machine learning researchers have experience with server provisioning, which is why I recommend tmux as our first pick. Both Ubuntu and CentOS ship with tmux out of the box, so there's no need for sudo to install anything. What I love about tmux is that it requires neither writing a configuration file nor managing deep learning dependencies yourself; both AWS and GCP have images built for the latter, with DLAMI and GCP DLVM respectively.
Using tmux is as easy as it gets: start a tmux session, and run a Jupyter server in it. As long as you don't exit the tmux session, the server keeps running, even after your SSH connection has dropped.
# Start a tmux session
$ tmux

# Run a jupyter server
$ jupyter notebook

# Leave the tmux session without exiting the running jupyter server
Ctrl+b, then d

# Check if the jupyter server is running
$ ps aux | grep jupyter

# Re-enter the tmux session
$ tmux attach

# Exit the tmux session while you are in it
$ exit
Downside: tmux doesn't survive a system reboot. But the fact that it survives an unstable SSH connection should be enough for most use cases that would otherwise risk losing work in progress.
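If you run more than one session, naming them makes reattaching less error-prone. These are standard tmux subcommands; the session name jupyter is just an example:

```
# Start a named session
$ tmux new -s jupyter

# Reattach to it by name later
$ tmux attach -t jupyter

# List all running sessions
$ tmux ls
```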
Supervisor
supervisor can be slightly more complicated, but it beats tmux on one point: it restarts the specified program after a system reboot, so the program can seemingly run forever.
# Install supervisor
$ sudo apt install supervisor
$ sudo service supervisor start
To task supervisor with running your Jupyter server, we first need to create a configuration file and place it in /etc/supervisor/conf.d. Below is an example of that configuration file (which must have a .conf extension):
/etc/supervisor/conf.d/jupyter.conf

[program:jupyter]
command = jupyter notebook --no-browser --config=/path/to/config
directory = /path/to/working/directory
; user should be whoever runs Jupyter
user = ubuntu
autostart = true
autorestart = true
stdout_logfile = /var/log/your_log_file.log
redirect_stderr = true
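One gotcha worth flagging: supervisor doesn't load your login shell's profile, so a jupyter installed inside a conda or virtualenv environment may not be on its PATH. A sketch of the fix, assuming a hypothetical miniconda environment at /home/ubuntu/miniconda3/envs/dl (adjust the paths to your own setup; the %(ENV_PATH)s expansion requires supervisor 3.2 or newer):

```
; either point command at the absolute path of jupyter...
command = /home/ubuntu/miniconda3/envs/dl/bin/jupyter notebook --no-browser --config=/path/to/config
; ...or put the environment's bin directory on PATH
environment = PATH="/home/ubuntu/miniconda3/envs/dl/bin:%(ENV_PATH)s"
```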
After saving the configuration file to /etc/supervisor/conf.d, don't forget to ask supervisor to read it in and kickstart the program:
$ sudo supervisorctl reread
$ sudo supervisorctl update
$ sudo supervisorctl status # Check if Jupyter is running
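Once the program is registered, the same supervisorctl tool handles day-to-day management. These are standard supervisorctl subcommands, with jupyter being the program name from the config above:

```
# Restart or stop the server
$ sudo supervisorctl restart jupyter
$ sudo supervisorctl stop jupyter

# Follow its log output
$ sudo supervisorctl tail -f jupyter
```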
Downside: the need to write a damn config file, though supervisor can also be super helpful for other automation tasks.
p.s. The supervisor section of this post takes reference from Albert Yang's post; I do not claim credit for it. Albert's post has a more detailed walk-through for setting up supervisor, as well as for using nginx as a reverse proxy, which provides easier solutions for routing and SSL connections.
Docker
Enter DevOps' favorite choice: Docker. Docker's advantage may not be best reflected on AWS' DLAMI and GCP's DLVM, since those images already take care of the dependencies. That said, Docker still comes in handy when you have to start from a plain server environment. Almost all vendor-provided Linux images today come with Docker pre-installed, including DLAMI and DLVM; even if yours doesn't, there are many tutorials for installing Docker (see here for installing Docker on Ubuntu 18.04).
There are many approaches for using Docker in deep learning, but here we are only concerned with running a lasting Jupyter server.
First we pull a pre-built deep learning image from Docker Hub, and run it at port 8888. For the sake of simplicity, we are pulling the Tensorflow image from the official Jupyter repository on Docker Hub.
$ docker pull jupyter/tensorflow-notebook
$ docker run -d -p 8888:8888 -e JUPYTER_ENABLE_LAB=yes -v "$PWD":/home/ubuntu jupyter/tensorflow-notebook
Most Docker images have their own setups, this one included. Since the container's working directory doesn't contain our Jupyter notebook folder, we need to symlink the folder into the working directory.
# Find your container name
$ docker ps

# Enter the container
$ docker exec -it your_container_name bash

# Inside the docker container, symlink the folder to the working directory
$ ln -s /your/jupyter/notebook/folder /home/jovyan/whatever
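If the symlink step is unfamiliar, its mechanics can be tried safely outside the container first. The /tmp paths below are throwaway placeholders, not part of the original setup:

```shell
# Create a throwaway "notebook folder"
mkdir -p /tmp/jupyter_demo_src

# Link it into a "working directory" location
# (-s symbolic, -f overwrite an existing link, -n don't follow it)
ln -sfn /tmp/jupyter_demo_src /tmp/jupyter_demo_link

# The link resolves back to the source folder
readlink /tmp/jupyter_demo_link
```

Anything you save under the link lands in the source folder, which is why the symlink inside the container makes your notebooks show up in Jupyter's working directory.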
Voila! A Jupyter server is now serving your folder at port 8888.
Downsides:
- A Docker container won't survive a system reboot either, at least not by default.
- You will likely need different Docker images for different dependency sets, and each image can be huge (~5 GB).
- Third-party images come at a cost: the more they customize, the less applicable they are to general use. Of course you could build your own Docker image, or even build an image per project (check out repo2docker), but that's outside the scope of this tutorial.
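For the reboot downside specifically, Docker has a built-in alternative: a restart policy tells the Docker daemon to bring the container back after a reboot, assuming the daemon itself starts on boot (it does on a standard Ubuntu install). It can be set on an existing container:

```
$ docker update --restart unless-stopped your_container_name
```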
Summary
So here it goes: tmux > supervisor > Docker in ease of use, and tmux < supervisor < Docker in ability to customize. Even though we presented these tools as separate approaches, they can sometimes be used together. In fact, my personal favorite is Docker in conjunction with supervisor.
/etc/supervisor/conf.d/docker.conf

[program:docker]
command = docker container start your_container_name
directory = /home/ubuntu
user = ubuntu
autostart = true
startsecs = 1
startretries = 0
exitcodes=0
stdout_logfile = /var/log/your_log.log
redirect_stderr = true
The caveat here is that supervisor is designed to run a long-running job, not a one-time job. Therefore, we need to specify startretries = 0 and exitcodes = 0 to tell supervisor to stop issuing retries after the command has exited, though this feels more like a workaround to me.
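One way to sidestep the workaround is to keep the command in the foreground so supervisor has a long-running process to watch. docker container start supports an --attach flag that blocks for as long as the container runs; a sketch of the adjusted config, reusing the placeholder names from above:

```
[program:docker]
command = docker container start --attach your_container_name
directory = /home/ubuntu
user = ubuntu
autostart = true
autorestart = true
stdout_logfile = /var/log/your_log.log
redirect_stderr = true
```

With the process attached, autorestart behaves as intended, and the exitcodes/startretries settings are no longer needed.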
The three solutions above are by no means the only three for spinning up a lasting Jupyter server. If you find other ways to implement or other tools that are just as convenient, please do share with us.