Streamline your Data Science Experience with Jupyter Hub
Improve your workflow with Jupyter Hub, ipykernel and systemd
If you are an avid user of Jupyter Notebook and/or Lab, this guide shows how to start a Jupyter server automatically, select your desired Python environment from the web interface and then start coding.
If, like me, your typical workflow was to start a terminal and activate your desired Python virtual environment before running jupyter notebook or jupyterlab, then this guide should provide an easier and more integrated experience. It's achieved by installing jupyterhub into a dedicated virtual environment, setting up access to multiple IPython kernels and using a systemd start-up script (for relevant Linux distributions) to launch jupyterhub at system start-up.
This setup is not recommended for servers, which already have dedicated guides depending on the number of parallel users. The guide is also written for Linux; Mac and Windows users can follow a similar approach, with the exception of automatically starting Jupyter Hub.
Like many Data Scientists, I utilise the best practice of creating dedicated, controlled and known Python environments, using tools like conda. Similarly, I’m an avid user of the Jupyter ecosystem, in particular Jupyter Notebooks and sometimes Jupyter Lab. My typical workflow is represented by the top branch in the diagram below:
The bottom branch is the new way of working I have adopted recently, and although on the surface it looks like saving a single step, in reality for someone engaging with Jupyter regularly the improvement in the experience is more significant. The end result is an environment where Jupyter Hub is always running in the background and therefore both Jupyter Notebook and Lab are available on-demand. Furthermore, I do not explicitly activate a particular conda environment as it’s selectable and changeable directly from the web interface.
I appreciate much of this can be achieved without actually using Jupyter Hub itself, but I find it much easier when all the elements are combined together.
Finally, a number of users will be running Jupyter either in the cloud or on a server (local or remote), for which there are dedicated guides available. The key difference is getting Transport Layer Security (TLS) certificates to verifiably encrypt communication between the client and the server.
Jupyter Hub Set-up
The guide assumes you have a working conda environment either with Miniconda or Miniforge using mamba, which is my recommendation:
- Miniforge: this repository holds a minimal installer for Conda specific to conda-forge. It is comparable to Miniconda, but with…
- Miniconda (Conda documentation): Miniconda is a free minimal installer for conda. It is a small, bootstrap version of Anaconda that includes only conda…
Enterprise / commercial users please note the license change from Anaconda regarding using the main channel for conda from August 2020, which may restrict the channels you can access.
For regular readers the following setup should be familiar, and it is generally self-explanatory. In your preferred terminal type the following:
Given the imminent introduction of Python 3.10, I have bumped the Python version I use to the latest 3.8 version (3.8.8 at the time of writing) and also included the popular jupyter_contrib_nbextensions package. Running jupyterhub at this stage should show the following:
The page can be accessed via http://127.0.0.1:8000 (assuming port 8000 was free). Log in with your system login details, i.e. the same username and password you use to log in to your Linux distribution.
When creating a new notebook, we have an unhelpful “Python 3” display entry, where it’s unclear which conda environment it refers to. The display is similar with Jupyter Lab:
If it’s unclear to you, then the “Python 3” actually refers to the jupyter environment created in this guide. The next step is to update the display name and make other conda environments accessible from the same Jupyter instance.
Jupyter Python Kernels
When you install Jupyter, by default it installs the ipython kernel, but restricted to within the current conda environment (obviously). The following commands show the currently accessible kernels and all the environments on the system:
The snippet above shows that only a single kernel is available in the environment, compared to the five conda environments on the system in total.
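As a rough companion to the listing described above, the sketch below reads the display name out of every kernel.json found under a kernels directory. It is an illustration rather than the commands used in this guide: list_kernels is a hypothetical helper, and the demonstration runs against a throwaway directory. On Linux, kernels installed in user space normally live under ~/.local/share/jupyter/kernels, and those bundled with an environment under share/jupyter/kernels inside that environment.

```python
import json
import tempfile
from pathlib import Path


def list_kernels(kernels_dir: Path) -> dict:
    """Map each kernel's directory name to the display name in its kernel.json."""
    found = {}
    for spec_file in sorted(kernels_dir.glob("*/kernel.json")):
        spec = json.loads(spec_file.read_text())
        found[spec_file.parent.name] = spec.get("display_name", "")
    return found


# Demonstration on a temporary directory, so no real kernel is touched.
with tempfile.TemporaryDirectory() as tmp:
    kernels_root = Path(tmp) / "kernels"
    kernel_dir = kernels_root / "python3"
    kernel_dir.mkdir(parents=True)
    (kernel_dir / "kernel.json").write_text(
        json.dumps({"display_name": "Python 3", "language": "python"})
    )
    demo_kernels = list_kernels(kernels_root)

print(demo_kernels)
```

To inspect real kernels, pass the relevant kernels directory instead, for example Path.home() / ".local/share/jupyter/kernels".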
Rename Existing Kernels
The first step is to rename the existing kernel to something easier to recognise. The properties and metadata for each kernel are contained in a dedicated file called kernel.json. Line 5 in the above snippet shows the path to the file which contains the following:
Line 9 contains the display_name field, which should be changed to “Jupyter (py3.8)”, either in a text editor or from the command line.
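For anyone who would rather script the rename than open a text editor, here is a minimal sketch using only the standard library. set_display_name is a hypothetical helper, not part of Jupyter, and the demonstration edits a temporary copy rather than a real kernel.json.

```python
import json
import tempfile
from pathlib import Path


def set_display_name(kernel_json: Path, display_name: str) -> dict:
    """Rewrite the display_name field of a kernel.json file in place."""
    spec = json.loads(kernel_json.read_text())
    spec["display_name"] = display_name
    kernel_json.write_text(json.dumps(spec, indent=1) + "\n")
    return spec


# Demonstration on a throwaway file with the same general shape as a kernel.json.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "kernel.json"
    demo_file.write_text(json.dumps({
        "argv": ["python", "-m", "ipykernel_launcher", "-f", "{connection_file}"],
        "display_name": "Python 3",
        "language": "python",
    }))
    set_display_name(demo_file, "Jupyter (py3.8)")
    demo_spec = json.loads(demo_file.read_text())

print(demo_spec["display_name"])
```

All other fields are written back unchanged, so only the label shown in the web interface is affected.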
Refreshing either Jupyter Notebook or Lab will show the updated description:
The screenshot shows the updated display-name for the existing kernel, which is now much clearer as to the environment (Jupyter) and version of Python (3.8).
We want to use the jupyter conda environment (and its Jupyter Hub installation) to access other conda environments directly from within the web interface. This is detailed in the IPython documentation for access to multiple kernels and demonstrated below:
The example above shows the activation of the conda environment and the installation of the ipython kernel in user space for the environment (ds01) with the display-name “ds01 (py3.7)”. This is also now reflected within Jupyter Notebook / Lab:
It should be noted that the shared kernels extend beyond creating a new notebook. An existing notebook based on one kernel can be switched to using a different kernel after it has been loaded meaning that it’s possible for a single project to use multiple environments from a single interface. For example, you can have one environment dedicated to visualisations, another to deep learning and another for documentation etc.
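Registering a further environment, as described earlier in this section, can also be scripted. The sketch below only builds the command line for the ipykernel installer; ipykernel_install_command is a hypothetical helper, and the environment name and label are the ones used in this guide. The command has to be executed with the Python interpreter of the environment being registered, which needs ipykernel installed.

```python
import shlex
import sys


def ipykernel_install_command(name, display_name, python=sys.executable):
    """Build the command that registers an environment's kernel in user space."""
    return [
        python, "-m", "ipykernel", "install",
        "--user",                        # install under the user's home directory
        "--name", name,                  # directory name of the kernel spec
        "--display-name", display_name,  # label shown in Notebook / Lab
    ]


# "python" here stands for the interpreter of the activated target environment.
demo_command = ipykernel_install_command("ds01", "ds01 (py3.7)", python="python")
print(shlex.join(demo_command))
```

Passing the list to subprocess.run from within the activated environment would perform the actual registration.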
It should also be noted that there is currently no custom sorting option for how the environments are displayed. The default is alphabetical order, so if you have a preference you can change the order of the environments by adding numeric prefixes to their names.
Having set up the desired easy-to-use jupyter environment, the next step is to automatically start Jupyter Hub.
Autoload Jupyter Hub with Systemd
This step is restricted to Linux distributions using systemd and is based on this great Stack Overflow answer, with some additions:
How to convert a python script in a local conda env into systemd service in Linux?
As a bit of background, there are multiple hurdles to overcome in getting jupyterlab to start on system start-up with systemd. Firstly, there is one layer of abstraction with systemd, followed by another layer due to conda itself. After much trial and error, these steps allow (at least on Ubuntu 20.04) the correct conda environment to be used so that jupyterhub autoloads. The required systemd service file is:
<username> should be replaced with your username. In essence there are two parts to loading Jupyter Hub at system start:
- Constructing the right service file with the correct commands
- Running the service as
Constructing the Service File
The two difficult components of constructing the service file were setting the correct PATH variable and setting the correct environmental variables.
The required command is:
/bin/bash -c 'PATH=/home/<username>/miniconda3/envs/jupyter/bin:$PATH exec jupyterhub'
It contains the absolute path to the shell (/bin/bash) and a command that updates the PATH variable to reflect the location of conda and its binaries, followed by running jupyterhub.
The second part is passing the correct environmental variables; these can be obtained by running the
For systemd, the list needs to be converted from a newline-delimited list into a single-line, space-delimited list. These are then copied across into the service file.
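The conversion described above can be sketched in a few lines of Python. Everything here is an assumption for illustration, not the service file used in this guide: env_to_systemd and render_service_file are hypothetical helpers, the target is taken to be systemd's Environment= directive, the sample variables and the username alice are made up, and the surrounding sections are just the standard skeleton of a user service. Only the launch command is taken from the text.

```python
def env_to_systemd(env_output: str) -> str:
    """Collapse newline-delimited NAME=value lines into one space-delimited line."""
    assignments = []
    for line in env_output.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank lines and anything that is not an assignment
        if any(ch.isspace() for ch in line):
            # systemd splits on whitespace, so the whole assignment is quoted.
            # Values containing quotes, backslashes, % or newlines need manual care.
            line = '"' + line + '"'
        assignments.append(line)
    return " ".join(assignments)


def render_service_file(username: str, environment: str) -> str:
    """Render a minimal user service around the launch command from the guide."""
    exec_start = (
        "/bin/bash -c 'PATH=/home/" + username
        + "/miniconda3/envs/jupyter/bin:$PATH exec jupyterhub'"
    )
    return "\n".join([
        "[Unit]",
        "Description=Jupyter Hub",
        "",
        "[Service]",
        "ExecStart=" + exec_start,
        "Environment=" + environment,
        "",
        "[Install]",
        "WantedBy=default.target",
        "",
    ])


# Made-up sample of newline-delimited environment variables.
sample_env = """LANG=en_GB.UTF-8
CONDA_DEFAULT_ENV=jupyter
EDITOR=code --wait
"""

demo_environment = env_to_systemd(sample_env)
demo_unit = render_service_file("alice", demo_environment)
print(demo_unit)
```

Quoting whole assignments that contain whitespace follows systemd's rule that Environment= takes a space-separated list of assignments.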
Enabling and Starting Systemd Service
With systemd it's possible to run the service without root privileges by placing the service file in ~/.config/systemd/user/. The following commands reload the list of service files, check the service's status and start the process for testing purposes:
The first command ensures
systemd loads the service file from the user directory. The status command confirms there are no formatting or parameter errors (which are listed as additional messages in red). To test the service file the
start command allows the user to start the service and can be checked by visiting either http://127.0.0.1:8000 or http://localhost:8000. Running the
status command again shows the output of running
These steps confirm that the service file works as intended. To ensure the service starts at the system start i.e. when the laptop starts, the service has to be explicitly enabled. The final test is of course to restart the machine and after logon check that Jupyter Hub is indeed running.
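The sequence described in this section (reload, status, start, status again, enable) can be collected into a small script. This is a sketch under assumptions: the unit name jupyterhub.service is hypothetical, since the guide's file name is not shown here, and the helpers below only print the commands unless dry_run is switched off.

```python
import shlex
import subprocess


def user_service_commands(unit: str) -> list:
    """Return the systemctl calls for a per-user service, in the order described."""
    status = ["systemctl", "--user", "status", "--no-pager", unit]
    return [
        ["systemctl", "--user", "daemon-reload"],  # pick up the new service file
        status,                                    # check for formatting errors
        ["systemctl", "--user", "start", unit],    # start the service for testing
        status,                                    # confirm it is running
        ["systemctl", "--user", "enable", unit],   # start it automatically from now on
    ]


def run_all(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for command in commands:
        print(shlex.join(command))
        if not dry_run:
            # status exits non-zero for an inactive unit, so failures are not raised
            subprocess.run(command, check=False)


demo_commands = user_service_commands("jupyterhub.service")
run_all(demo_commands)  # dry run: prints the commands without executing them
```

Calling run_all(demo_commands, dry_run=False) on a machine with the service file in place would execute the same steps.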
Configure Jupyter Notebook Extensions
There are a number of popular extensions to the original Jupyter Notebook, which I would recommend. I won't cover this exhaustively as there are many guides available. To access the extensions page, simply click on the
I would recommend the following extensions (you can read about each one by clicking on it):
- Help Panel — yes you can press Shift Tab but this provides a good backup
- Live Markdown Preview — view the markdown render as you type
- ExecuteTime — the most useful feature is knowing the run time of a command
- Scratchpad — press CTRL+B to get a temporary environment to run locals for example
- Skip-Traceback — make error messages more manageable
- Autopep8 — easily format code cells; I use it primarily for cleaning up manually typed lists with the hammer button
- Table of Contents — can display a side menu of all the headers in a notebook
This guide presents three components to enable a better Jupyter Notebook or Lab experience. The first step was the installation of Jupyter Hub into a dedicated conda environment. The second step was renaming the existing kernel to be more obvious and sharing other kernels from different conda environments, each with an appropriate display-name. The final step was activating the new conda environment and running jupyterhub automatically using systemd. As a bonus, some basic extensions were also recommended for Jupyter Notebook.
The end result is an easy to use web based environment that allows easy switching between