3 Tips for Building a Lasting Jupyter Server

Hank Chan
Feb 23, 2020

There have been many tutorials on building your own Jupyter server, but few focus on building a lasting one, that is, on ensuring that Jupyter keeps running even after the SSH connection drops or the system reboots.

ML engineers have long cursed unstable SSH connections for wasting their precious hours of training. Here we summarize three popular solutions for a forever-running Jupyter server, all tested on Ubuntu (let me know if any CentOS users run into problems). This post assumes you have experience spinning up a Jupyter server.

Tmux

The easiest solution comes to the rescue. We get that not all machine learning researchers have experience with server provisioning, which is why I recommend tmux as our first pick. Ubuntu server images typically ship with tmux out of the box, so there's often no need for sudo to install it (on a minimal CentOS install you may have to add it first). What I love about tmux is that it doesn't require writing a configuration file, nor does it ask you to manage deep learning dependencies; both AWS and GCP provide images that take care of those, DLAMI and DLVM respectively.

Using tmux is as easy as it gets: start a tmux session and run a Jupyter server in it. As long as you don't exit the tmux session, the server keeps running, even after your SSH connection breaks.
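
The workflow can be sketched in Python as follows; by default it only prints the tmux commands (a dry run). The session name `jupyter` and the `jupyter notebook --no-browser --port=8888` command line are assumptions, not the original setup; `jupyter lab` would work the same way.

```python
from __future__ import annotations

import shlex
import subprocess

SESSION = "jupyter"  # hypothetical session name, used to find the session again


def tmux_start_cmd(session: str = SESSION, port: int = 8888) -> list[str]:
    """Command that starts Jupyter inside a new, detached tmux session."""
    server = f"jupyter notebook --no-browser --port={port}"
    # -d: create the session without attaching; -s: give it a name.
    # The final argument is the shell command tmux runs inside the session.
    return ["tmux", "new-session", "-d", "-s", session, server]


def tmux_attach_cmd(session: str = SESSION) -> list[str]:
    """Command that reattaches to the session, e.g. after an SSH reconnect."""
    return ["tmux", "attach", "-t", session]


def run(cmd: list[str], dry_run: bool = True) -> None:
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(tmux_start_cmd())   # launch the server in the background session
    run(tmux_attach_cmd())  # what you would type after logging back in
```

Once attached, pressing Ctrl-b and then d detaches again and leaves the server running.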

Downside: tmux sessions don't survive a system reboot. But surviving an unstable SSH connection should be enough for most use cases that would otherwise risk losing work in progress.

Supervisor

Supervisor can be slightly more complicated to set up, but it beats tmux on one point: it restarts the specified program, in this case our Jupyter server, after a system reboot, so the program can seemingly run forever.

To have supervisor run your Jupyter server, we first need to create a configuration file and place it in /etc/supervisor/conf.d. The file must be named with a .conf extension.
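
A minimal sketch of such a file, rendered with Python's configparser so it can be checked before copying it into place. The program name `jupyter`, the user `ubuntu`, the `~/.local/bin/jupyter` path, the notebook folder, and the log locations are hypothetical placeholders, not the original post's configuration. The `environment` line is included because supervisor's `user` option switches the process owner but does not update HOME, which Jupyter uses to locate its config and runtime files.

```python
import configparser
import io


def supervisor_conf(user: str = "ubuntu", port: int = 8888) -> str:
    """Render a [program:jupyter] section for /etc/supervisor/conf.d/jupyter.conf.

    The user name, paths, and log locations are placeholders; adjust them.
    """
    home = f"/home/{user}"
    # interpolation=None: write values verbatim, no %-style substitution
    conf = configparser.ConfigParser(interpolation=None)
    conf["program:jupyter"] = {
        "command": f"{home}/.local/bin/jupyter notebook --no-browser --port={port}",
        "directory": f"{home}/notebooks",
        "user": user,
        # user= does not set HOME, and Jupyter looks up its files under HOME
        "environment": f'HOME="{home}",USER="{user}"',
        "autostart": "true",    # start whenever supervisord starts, e.g. after a reboot
        "autorestart": "true",  # bring Jupyter back if it exits
        "stdout_logfile": "/var/log/jupyter.out.log",
        "stderr_logfile": "/var/log/jupyter.err.log",
    }
    buf = io.StringIO()
    conf.write(buf, space_around_delimiters=False)
    return buf.getvalue()


if __name__ == "__main__":
    print(supervisor_conf(), end="")
```

Writing the output into /etc/supervisor/conf.d requires root, and the `command` path should match what `which jupyter` reports for that user.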

After saving the configuration file to /etc/supervisor/conf.d, don't forget to ask supervisor to read in the new file and start the program.
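
Supervisor's control tool, supervisorctl, handles that step. The sketch below prints the usual reread, update, status sequence; it assumes the program section is named `jupyter` (as in a `[program:jupyter]` block, a hypothetical name) and that supervisor runs as a system service, so `sudo` is needed.

```python
from __future__ import annotations

import shlex
import subprocess


def supervisor_reload_cmds(program: str = "jupyter") -> list[list[str]]:
    """supervisorctl calls that pick up a new .conf file and start the program."""
    return [
        # re-read config files and report what changed, without starting anything
        ["sudo", "supervisorctl", "reread"],
        # apply the changes: newly added programs with autostart=true are started
        ["sudo", "supervisorctl", "update"],
        # check that the program ("jupyter" is a hypothetical name) is RUNNING
        ["sudo", "supervisorctl", "status", program],
    ]


def run_all(cmds: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(supervisor_reload_cmds())
```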

Downside: having to write a damn config file, lol. That said, supervisor can be super helpful for other automation tasks too.

P.S. The supervisor section of this post draws on Albert Yang's post; I do not claim credit for it. Albert's post has a more detailed walk-through of setting up supervisor, as well as of using nginx as a reverse proxy, which makes routing and SSL connections easier.

Docker

Enter DevOps' favorite choice: Docker. Docker's advantage may not show as clearly on AWS's DLAMI or GCP's DLVM, since the dependencies there have already been taken care of. That said, Docker still comes in handy when you have to start from a plain server environment. Many vendor-provided Linux images, including DLAMI and DLVM, come with Docker pre-installed; if yours doesn't, there are plenty of tutorials for installing Docker (for example, on Ubuntu 18.04).

There are many approaches to using Docker in deep learning, but here we are only concerned with running a lasting Jupyter server.

First, we pull a pre-built deep learning image from Docker Hub and run it on port 8888. For simplicity, we pull the TensorFlow image from the official Jupyter repository on Docker Hub.
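
The pull-and-run step might look like the sketch below, which prints the Docker commands rather than running them. The image name `jupyter/tensorflow-notebook` (the TensorFlow image from the Jupyter Docker Stacks) and the container name `jupyter` are assumptions about the setup. The optional `restart` argument adds Docker's `--restart` policy, one way to have the container come back after a reboot, provided the Docker daemon itself starts at boot.

```python
from __future__ import annotations

import shlex

IMAGE = "jupyter/tensorflow-notebook"  # assumed: Jupyter Docker Stacks TensorFlow image
NAME = "jupyter"                       # hypothetical container name


def docker_cmds(image: str = IMAGE, name: str = NAME, port: int = 8888,
                restart: str | None = None) -> list[list[str]]:
    """Commands that pull the image and start Jupyter in a background container."""
    # -d runs the container in the background; -p maps host port -> container port 8888
    run_cmd = ["docker", "run", "-d", "--name", name, "-p", f"{port}:8888"]
    if restart:
        # e.g. "unless-stopped": the daemon restarts the container when it starts
        # again (such as after a reboot) unless you stopped the container yourself
        run_cmd += ["--restart", restart]
    run_cmd.append(image)
    return [["docker", "pull", image], run_cmd]


if __name__ == "__main__":
    for cmd in docker_cmds():
        print(shlex.join(cmd))  # dry run: print instead of executing
```

These images print a login URL with a token on startup; `docker logs jupyter` shows it.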

Most Docker images have their own setups, and this one is no exception. Since the container's working directory doesn't contain our Jupyter notebook folder, we need to symlink the folder into the working directory.
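
Another way to get the notebook folder into the container, swapped in here as an alternative to the symlink approach, is a bind mount with `-v`. In the Jupyter Docker Stacks images the notebook user is `jovyan`, and `/home/jovyan/work` is the conventional place to mount a folder; the host path `~/notebooks` is hypothetical.

```python
from __future__ import annotations

import os
import shlex

WORKDIR = "/home/jovyan/work"  # conventional mount point in Jupyter Docker Stacks images


def run_with_mount(host_dir: str, image: str = "jupyter/tensorflow-notebook",
                   port: int = 8888) -> list[str]:
    """docker run command that bind-mounts host_dir into the container's work folder."""
    # -v treats a source that isn't an absolute path as a named volume, so resolve it
    host_dir = os.path.abspath(os.path.expanduser(host_dir))
    return [
        "docker", "run", "-d",
        "-p", f"{port}:8888",
        "-v", f"{host_dir}:{WORKDIR}",  # host folder appears at WORKDIR; edits land on the host
        image,
    ]


if __name__ == "__main__":
    print(shlex.join(run_with_mount("~/notebooks")))  # hypothetical notebook folder
```

The images run as `jovyan` (UID 1000 by default), so the host folder needs to be writable by that UID.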

Voilà! A Jupyter server is now serving your folder on port 8888.

Downsides:

  1. A Docker container started with the default settings won't survive a system reboot either (Docker's --restart policy, e.g. --restart unless-stopped, can change that).
  2. You will likely need different Docker images for different dependencies, and each image can be huge (~5GB).
  3. Third-party images come at a cost: the more customized an image is, the less applicable it is to general use. Of course, you could build your own Docker image, or even one image per project (check out repo2docker), but that's outside the scope of this tutorial.

Summary

So here it goes: tmux > supervisor > Docker in ease of use, and tmux < supervisor < Docker in ability to customize. Even though we present these tools as separate approaches, they can sometimes be used together. In fact, my personal favorite is Docker in conjunction with supervisor.

The caveat is that supervisor is designed to manage long-running processes, not one-time jobs. Therefore, we need to specify startretries = 0 and exitcodes = 0 to tell supervisor to stop issuing retries after the command has exited, though this feels more like a workaround to me.
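
A sketch of such a program section, assuming a container named `jupyter` already exists and supervisor only needs to start it at boot with `docker start`. `startretries` and `exitcodes` come from the text above; `startsecs=0`, the `/usr/bin/docker` path, and the program name `jupyter-docker` are our assumptions. Per supervisor's documentation, a process that exits before `startsecs` (1 second by default) counts as a failed start even with an expected exit code, which is why `startsecs=0` is included for a one-shot command.

```python
import configparser
import io


def docker_supervisor_conf(container: str = "jupyter",
                           docker_bin: str = "/usr/bin/docker") -> str:
    """Render a supervisor program that runs `docker start <container>` once at boot.

    The container name, docker path, and program name are hypothetical.
    """
    conf = configparser.ConfigParser(interpolation=None)
    conf["program:jupyter-docker"] = {
        # one-shot command: starts the existing container, then exits
        "command": f"{docker_bin} start {container}",
        "autostart": "true",          # run whenever supervisord starts, e.g. at boot
        "startsecs": "0",             # our addition: an immediate exit is not a failed start
        "startretries": "0",          # from the post: never retry a failed start
        "exitcodes": "0",             # from the post: exit code 0 is the expected result
        "autorestart": "unexpected",  # supervisor's default: restart only on unexpected codes
    }
    buf = io.StringIO()
    conf.write(buf, space_around_delimiters=False)
    return buf.getvalue()


if __name__ == "__main__":
    print(docker_supervisor_conf(), end="")
```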

The three solutions above are by no means the only ways to spin up a lasting Jupyter server. If you find other approaches or tools that are just as convenient, please share them with us.
