Simplest Way to publish your Jupyter notebooks on the open web: Using jupyter-book and GitHub Pages

Jung Hoon Son, M.D.
6 min read · Aug 26, 2023

Data science/analysis code is often unseen

This is my pet peeve.

Too many data scientists write code in Jupyter notebooks that never sees the light of day, because their boss, upper management, and the C-suite don’t know (or often don’t need to care) how to access:

  • A Jupyter server
  • GitHub
  • The VPN to your latest cloud VM

Worse, the work ends up stuck behind someone’s PowerPoint/Excel, which does not do it justice.

Let’s change that, to keep analytics techniques from becoming the next scientific publishing wall, shrouded in intellectual property and copyright risks. Open data techniques will protect everyone.

Prerequisites

  • Jupyter file(s) you want to publish
  • GitHub/Git experience
  • Basic command-line interface
  • Basic-to-intermediate understanding of Markdown

Optional

  • Knowledge of Sphinx (I personally have never worked with it)

[Optional: Step 0] Create your repository and add your notebooks.

If you already have a GitHub repository and Jupyter set up, skip to Step 1.

I created one named jupyter-book-tutorial using GitHub.

[Optional: Step 0b] Use GitHub Codespaces for Editing

Using GitHub Codespaces will spare you a lot of headaches in setting up Python environments. GitHub is owned by Microsoft, and Codespaces basically launches a web-ready VS Code editor.

To set up a free coding environment directly using Codespaces:

Click the + button at the top right → "New codespace" → select your jupyter-book-tutorial repository. And voilà: you have a default Linux environment that includes some basic JavaScript, Python, and Docker tooling.

Step 1: Install Poetry

I was contemplating making this step optional, but I want to encourage data scientists to get better at Python package management.

Poetry is a Python package manager that’ll make future dependency management a lot easier, so it’s good to get used to it (the pip / requirements.txt route will cause problems when you start working with larger teams). The following command installs Poetry into the global Python environment of this Codespaces instance.

In Terminal:

pip install -U poetry

Step 2: Initialize the repository with poetry

In Terminal:

poetry init

It’ll ask a few questions to set up the package. Press Enter to accept the default for each.

When it asks whether you want to define your main dependencies → type in “no”

What you just did was create a way for the repository to keep track of which packages are needed, like the requirements.txt that often accompanies pip. But Python package management can be dealt with another time.

Step 3: Install necessary python packages

My Jupyter notebook examples will use:

  • pandas
  • polars
  • altair
  • plotly
  • vega_datasets (sample datasets for Altair)

If you are used to pip install, with Poetry you need to change that to poetry add

In Terminal:

poetry add jupyter-book ipykernel pandas polars altair plotly ghp-import vega_datasets

Once it runs, Poetry will keep track of what’s installed in a file called pyproject.toml (you generally don’t need to touch this; it’s the file Poetry works with and manages).

Step 4: Setup the basic template for jupyter-book

jupyter-book (https://jupyterbook.org/en/stable/intro.html) is a Sphinx-based “document system” that parses your Markdown files and Jupyter notebooks. Think of it as a translator that turns those files into more openly shareable HTML or PDF.

In Terminal:

Make sure you are in your repository directory
Mine is /workspaces/jupyter-book-tutorial/

I personally have a habit of running poetry run rather than opening up a shell within the environment. (So bear with me, or optionally run poetry shell to activate the environment if you don’t want to repeat that prefix.)

cd /workspaces/jupyter-book-tutorial/
poetry run jupyter-book create myfirstbook/

It’ll create a template scaffolding with examples. We will edit some of these later.

Step 5: Publish to GitHub Pages (using ghp-import)

Even though the template has been created, we’ll need jupyter-book to generate some of the HTML files we need for making it publishable.

This looks complex, but it only requires three steps:

  1. Build the book to HTML
  2. Commit the generated files and push them to the main branch of your GitHub repo
  3. Publish the built HTML to a separate git branch called gh-pages, which is where GitHub Pages loads the web-ready files from. The ghp-import package we installed earlier makes this step easy.

In Terminal:

poetry run jupyter-book build myfirstbook
git add -A
git commit -m "publish"
git push
poetry run ghp-import -n -p -f myfirstbook/_build/html
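
The same sequence can be wrapped in a small Python helper. This is only a sketch: the names publish_commands and publish are hypothetical, and the helper simply shells out to the commands shown above through subprocess. It defaults to a dry run that prints the commands without executing them.

```python
import subprocess


def publish_commands(book_dir: str = "myfirstbook", message: str = "publish") -> list[list[str]]:
    """Build the command list: build the book, commit, push, publish to gh-pages."""
    return [
        ["poetry", "run", "jupyter-book", "build", book_dir],
        ["git", "add", "-A"],
        ["git", "commit", "-m", message],
        ["git", "push"],
        # -n adds a .nojekyll file, -p pushes the gh-pages branch, -f forces the push
        ["poetry", "run", "ghp-import", "-n", "-p", "-f", f"{book_dir}/_build/html"],
    ]


def publish(dry_run: bool = True) -> None:
    """Print each command; run it only when dry_run is False."""
    for cmd in publish_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises on the first failing command, so later steps are skipped
            subprocess.run(cmd, check=True)


publish(dry_run=True)  # prints the commands without running them
```

Calling publish(dry_run=False) from the repository root would run the steps for real.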

Step 6: Check your website!

This step may take ~2 minutes but once the git branch is all set up, try accessing your new website:

https://[username].github.io/[repository-name]

(e.g. mine is on https://plasmak11.github.io/jupyter-book-tutorial)
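
If you are unsure of the address, it can be derived from the repository's git remote. The sketch below assumes the default project-site pattern shown above (no custom domain); pages_url_from_remote is a hypothetical helper, and the remote URLs in the example are assumed forms of what git remote get-url origin might print for my repository.

```python
import re


def pages_url_from_remote(remote: str) -> str:
    """Turn a GitHub remote URL (HTTPS or SSH form) into the default Pages URL."""
    match = re.search(r"github\.com[:/]([^/]+)/(.+?)(?:\.git)?$", remote.strip())
    if match is None:
        raise ValueError(f"not a GitHub remote: {remote!r}")
    username, repository = match.groups()
    return f"https://{username}.github.io/{repository}"


# Assumed remote forms for the example repository
print(pages_url_from_remote("https://github.com/plasmak11/jupyter-book-tutorial.git"))
print(pages_url_from_remote("git@github.com:plasmak11/jupyter-book-tutorial.git"))
```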

Step 7: Let’s create/add your Jupyter notebook

I have a notebook that is just a copy-pasta version of an Altair gallery chart: https://altair-viz.github.io/gallery/natural_disasters.html

My Example notebook is located in:

myfirstbook/altair-intro.ipynb

Every notebook needs a “Markdown” cell with at least one heading (a line starting with a # hashtag).
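
A quick way to check this before building is to look inside the notebook file, which is plain JSON. The following is a minimal sketch using only the standard library; load_notebook and has_markdown_heading are hypothetical helper names, and the two in-memory notebooks are stand-ins for real .ipynb files.

```python
import json
import re
from pathlib import Path


def load_notebook(path: str) -> dict:
    """Read an .ipynb file, which is plain JSON, into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def has_markdown_heading(notebook: dict) -> bool:
    """Return True if any Markdown cell has a line like '# Title'."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue
        source = cell.get("source", "")
        # The notebook format stores source as a string or a list of lines
        text = "".join(source) if isinstance(source, list) else source
        if any(re.match(r"#{1,6}\s+\S", line.lstrip()) for line in text.splitlines()):
            return True
    return False


# Tiny in-memory notebooks standing in for real .ipynb files
with_heading = {"cells": [{"cell_type": "markdown", "source": ["# Altair Intro"]},
                          {"cell_type": "code", "source": ["import altair as alt"]}]}
code_only = {"cells": [{"cell_type": "code", "source": ["# just a comment"]}]}

print(has_markdown_heading(with_heading))  # True
print(has_markdown_heading(code_only))     # False
```

For a real file, has_markdown_heading(load_notebook("myfirstbook/altair-intro.ipynb")) would run the same check.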

The following is my bare minimum Jupyter notebook (a Markdown heading, then one code cell):

# Altair Intro

import altair as alt
from vega_datasets import data

source = data.disasters.url

alt.Chart(source).transform_filter(
    alt.datum.Entity != 'All natural disasters'
).mark_circle(
    opacity=0.8,
    stroke='black',
    strokeWidth=1,
    strokeOpacity=0.4
).encode(
    x=alt.X('Year:T', title=None, scale=alt.Scale(domain=['1899','2018'])),
    y=alt.Y(
        'Entity:N',
        sort=alt.EncodingSortField(field="Deaths", op="sum", order='descending'),
        title=None
    ),
    size=alt.Size('Deaths:Q',
        scale=alt.Scale(range=[0, 2500]),
        legend=alt.Legend(title='Deaths', clipHeight=30, format='s')
    ),
    color=alt.Color('Entity:N', legend=None),
    tooltip=[
        "Entity:N",
        alt.Tooltip("Year:T", format='%Y'),
        alt.Tooltip("Deaths:Q", format='~s')
    ],
).properties(
    width=450,
    height=320,
    title=alt.Title(
        text="Global Deaths from Natural Disasters (1900-2017)",
        subtitle="The size of the bubble represents the total death count per year, by type of disaster",
        anchor='start'
    )
).configure_axisY(
    domain=False,
    ticks=False,
    offset=10
).configure_axisX(
    grid=False,
).configure_view(
    stroke=None
)

To make sure Codespaces is set up to run the notebook with Python + Jupyter, click the “Select Kernel” button and choose the “Python Environment” that is NOT ~/.python/current/bin/python3 but the one whose name partially matches the repository name (that is the environment Poetry created).
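
One way to double-check the kernel choice is to print the interpreter path from a notebook cell. This is a small sketch; uses_codespaces_default_python is a hypothetical helper name, and the check only compares path text against the ~/.python/current location mentioned above.

```python
import sys
from pathlib import Path


def uses_codespaces_default_python(executable: str) -> bool:
    """Return True if the interpreter lives under ~/.python/current."""
    default_dir = Path.home() / ".python" / "current"
    # Pure path comparison; symlinks are not resolved
    return Path(executable).is_relative_to(default_dir)


print(sys.executable)
print(uses_codespaces_default_python(sys.executable))
```

If the second line prints True, the notebook is still running on the default interpreter rather than the Poetry environment.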

Step 8: Add your new Notebook to Table of contents

You’ll find a file named _toc.yml inside myfirstbook/

Original _toc.yml

# Table of contents
# Learn more at https://jupyterbook.org/customize/toc.html

format: jb-book
root: intro
chapters:
- file: markdown
- file: notebooks
- file: markdown-notebooks

Let’s get rid of all the chapter entries at the bottom and add your new file, altair-intro.ipynb

New _toc.yml

# Table of contents
# Learn more at https://jupyterbook.org/customize/toc.html

format: jb-book
root: intro
chapters:
- file: altair-intro.ipynb
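
If you end up adding many notebooks, the same file can be generated rather than edited by hand. The sketch below is one possible approach, not part of jupyter-book itself: render_toc is a hypothetical helper that only produces the simple jb-book layout shown above and assumes file names that need no YAML quoting.

```python
def render_toc(chapters: list[str], root: str = "intro") -> str:
    """Render a minimal jb-book _toc.yml with one chapter entry per file."""
    lines = [
        "format: jb-book",
        f"root: {root}",
        "chapters:",
    ]
    lines += [f"- file: {name}" for name in chapters]
    return "\n".join(lines) + "\n"


print(render_toc(["altair-intro.ipynb"]))
```

Writing the returned text to myfirstbook/_toc.yml would reproduce the file above, minus the comment lines.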

Step 9: Final Step: Publish again!

In terminal, let’s run our lazy script again:

poetry run jupyter-book build myfirstbook
git add -A
git commit -m "publish"
git push
poetry run ghp-import -n -p -f myfirstbook/_build/html

[Optional, 1 liner for easier copy and paste]:

poetry run jupyter-book build myfirstbook && git add -A && git commit -m "publish" && git push && poetry run ghp-import -n -p -f myfirstbook/_build/html

Final words

There are a ton of customizations and themes for jupyter-book and Sphinx to explore next:

  • Organizing the table of contents (_toc.yml)
  • Configurations (themes, add-ons) for the Sphinx engine
  • “MyST” (Markedly Structured Text), the modern Markdown flavor that jupyter-book uses

I am planning to write more on jupyter-book and Altair in the future, so give me a follow if you are interested.

Jung Hoon Son, M.D.

M.D. / Pathologist / Informaticist. Writing about general and biomedical data content and portable, shareable data visualizations (Altair/Vega).