Powering Documentation with Jupyter Notebooks

We recently launched our new docs site for Imandra, which is backed by Jupyter notebooks!

This post is a quick rundown of why we decided to produce our documentation this way, the journey of getting there, and the various pieces involved.

(A quick plug: Imandra is our powerful new reasoning engine. You write your code and verification goals in the same language - to pursue programming and reasoning together. Take a look to find out more.)

Our new documentation!

What we had before

Previously, we were using Jekyll for our docs site, and that worked pretty well. In fact, we’re still using Jekyll for other documentation projects. However, as things evolved, we soon realised there was quite a bit of work involved in keeping our documentation up to date with changes to Imandra itself. Even simple things like changing how output was formatted meant we had to go through and update a lot of manually typed up session snippets — we wanted our docs to reflect as accurately as possible what you’d actually see when you ran a session yourself.

In the meantime, we’d written a Jupyter notebook kernel as a way of getting Imandra out into the open for people to try out with as little setup friction as possible. (This is deployed using the fantastic Zero to JupyterHub project, and the result is our Try Imandra website.)

With this out in the open, we needed some content to form the basis of people’s Try Imandra sessions, so we started feverishly writing notebooks.

Stage 1 — Executing notebooks during build

The first issue we ran into was the examples in our notebooks getting out of sync with updates to Imandra, which was the same problem we’d been having with our hand-written documentation.

We were saving .ipynb files to be deployed, and when you first opened one everything looked great. If a user came to execute the cells themselves though, the outputs saved in the notebook files didn’t necessarily match what the deployed version of the notebook kernel actually displayed, with differences ranging from minor formatting changes to errors in some cases.

We quickly came across the power of nbconvert, part of the Jupyter toolkit, which converts notebooks to different formats (HTML, slides, PDF etc.) and can execute the cells during conversion. It also lets you convert the notebook .ipynb format to itself!

We use this as a lightweight integration test of the notebooks and kernel, to make sure our examples behave as expected and that the latest formatting changes are baked in:

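for nb in notebooks/*.ipynb; do
    # Re-execute every cell in place; a cell that runs longer than 60s fails the conversion.
    jupyter nbconvert --to notebook --execute --ExecutePreprocessor.timeout=60 --inplace "$nb" || exit 1
    # Fail the build if a known internal error phrase appears in the executed notebook.
    if grep '\(Exception\|I/O error:\|Error: Unbound module\)' "$nb"; then
        exit 1
    fi
done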

The above snippet runs nbconvert, with the key flag --execute, so inputs are re-evaluated. There’s a per-cell timeout which causes a build failure if something has got stuck for too long, and we then grep the output file for a couple of key internal error phrases. This has caught quite a few issues as it touches a lot of code and integration points, while being very easy to maintain.
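One limitation of grepping the whole file is that it also matches error phrases that appear in prose or input cells. As a minimal sketch of a stricter check (not what our build actually runs), the following Python walks only the output cells of an executed notebook. It assumes the nbformat v4 JSON layout, and the function names are hypothetical.

```python
import json

# Error phrases taken from the grep pattern in the build snippet above.
ERROR_MARKERS = ("Exception", "I/O error:", "Error: Unbound module")


def _as_text(value):
    """nbformat stores text either as one string or as a list of lines."""
    return "".join(value) if isinstance(value, list) else str(value)


def output_texts(notebook):
    """Yield (cell_index, text) for every piece of output in a v4 notebook dict."""
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # prose and raw cells are ignored on purpose
        for output in cell.get("outputs", []):
            kind = output.get("output_type")
            if kind == "stream":
                yield index, _as_text(output.get("text", ""))
            elif kind == "error":
                yield index, "{}: {}".format(output.get("ename", ""), output.get("evalue", ""))
                yield index, "\n".join(output.get("traceback", []))
            else:  # display_data / execute_result carry a mime-type -> payload dict
                for payload in output.get("data", {}).values():
                    yield index, _as_text(payload)


def find_errors(notebook, markers=ERROR_MARKERS):
    """Return sorted (cell_index, marker) pairs for markers found in outputs."""
    hits = set()
    for index, text in output_texts(notebook):
        for marker in markers:
            if marker in text:
                hits.add((index, marker))
    return sorted(hits)


def check_file(path):
    """Load an .ipynb file and report any error markers in its outputs."""
    with open(path, encoding="utf-8") as handle:
        return find_errors(json.load(handle))
```

A build script could call check_file on each executed notebook and exit non-zero if the returned list is not empty, which also tells you which cell produced the error.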

Stage 2 — Markdown content source

The next thing we ran into is apparently a common issue with Jupyter notebooks — version controlling them effectively.

The .ipynb format is just a big JSON document, which means it can be diffed using standard tools like git. It does however quickly lead to very noisy diffs as it contains the content of both the input and output cells. This is quite problematic when trying to track changes.
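To make the source of the noise concrete, here is a small sketch that clears the fields which change on every run. It assumes the nbformat v4 layout, where each code cell carries an outputs list and an execution_count. This is only an illustration of where the churn comes from, not the approach we ended up taking.

```python
import copy
import json


def strip_outputs(notebook):
    """Return a copy of a v4 notebook dict with outputs and execution counts cleared."""
    stripped = copy.deepcopy(notebook)
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return stripped


def serialized_size(notebook):
    """Length of the JSON text that a line-based diff tool would have to compare."""
    return len(json.dumps(notebook, indent=1, sort_keys=True))
```

Comparing serialized_size before and after stripping gives a rough feel for how much of a saved notebook is output rather than authored content.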

We looked at various options and decided notedown suited our needs best. It’s similar to nbconvert’s markdown mode, but it clearly marks input cells as markdown code blocks and removes output cells entirely. You can run it via the CLI for one-off conversions, but it can also be installed as a notebook ContentsManager. This allows you to commit .md files in your repo, but you can still fire up a notebook session, open the files and edit them as if they were ordinary notebooks. The changes can then be saved back to markdown when you’re done with a nice diff-able set of changes!

The .md files show up in the notebooks file browser, and when opening are converted to a live, editable notebook on the fly.
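The idea behind this round trip can be sketched in a few lines. The following is a simplified illustration of the mapping between markdown and notebook cells, not notedown’s actual implementation: fenced code blocks become code cells, everything else becomes markdown cells, and outputs are never written. The function names and the "ocaml" language tag in the usage below are assumptions for illustration.

```python
FENCE = "`" * 3  # built programmatically so this example can itself sit in a fenced block


def markdown_to_cells(text):
    """Split markdown into a list of {"cell_type", "source"} dicts.

    Lines between a pair of fence lines become one code cell; everything
    else is grouped into markdown cells. Outputs are never produced.
    """
    cells = []
    buffer = []
    in_code = False

    def flush(cell_type):
        source = "\n".join(buffer).strip("\n")
        if source:
            cells.append({"cell_type": cell_type, "source": source})
        del buffer[:]

    for line in text.splitlines():
        if line.startswith(FENCE):
            flush("code" if in_code else "markdown")
            in_code = not in_code
        else:
            buffer.append(line)
    flush("code" if in_code else "markdown")
    return cells


def cells_to_markdown(cells, language=""):
    """Inverse of markdown_to_cells: code cells are written back as fenced blocks."""
    parts = []
    for cell in cells:
        if cell["cell_type"] == "code":
            parts.append(FENCE + language + "\n" + cell["source"] + "\n" + FENCE)
        else:
            parts.append(cell["source"])
    return "\n\n".join(parts) + "\n"
```

Because only the cell sources survive the trip, editing a notebook and saving it back changes just the lines you touched, which is what keeps the diffs small.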

During our build, we then use notedown via the CLI to convert the .md to .ipynb, then use the same nbconvert step above to execute everything, leaving the .ipynb files in our build artifact to be shown to the user on Try Imandra.
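As a rough sketch of how these two steps chain together, the Python below plans and runs them per source file. It assumes notedown writes the converted notebook to standard output when given a markdown file, and it reuses the nbconvert flags from the earlier snippet. The directory names and function names are hypothetical, and our real build differs in the details.

```python
import subprocess
from pathlib import Path


def plan_build(md_paths, out_dir, timeout=60):
    """Return the commands for each markdown source, without running them.

    Each entry is (convert_argv, notebook_path, execute_argv). The convert
    step is assumed to print the notebook JSON to stdout, which the caller
    redirects into notebook_path.
    """
    steps = []
    for md_path in map(Path, md_paths):
        notebook_path = Path(out_dir) / (md_path.stem + ".ipynb")
        convert = ["notedown", str(md_path)]
        execute = [
            "jupyter", "nbconvert", "--to", "notebook", "--execute",
            "--ExecutePreprocessor.timeout={}".format(timeout),
            "--inplace", str(notebook_path),
        ]
        steps.append((convert, notebook_path, execute))
    return steps


def run_build(steps):
    """Run each planned step, stopping at the first command that fails."""
    for convert, notebook_path, execute in steps:
        notebook_path.parent.mkdir(parents=True, exist_ok=True)
        with open(notebook_path, "wb") as out:
            subprocess.run(convert, stdout=out, check=True)
        subprocess.run(execute, check=True)
```

Separating the planning from the running keeps the command construction easy to inspect and test without needing the tools installed.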

We had another motive for choosing notedown: with .md source files, we were hoping we could use the content in other places, for example in our Jekyll docs project. However, having learnt more about the various tools, we realised there was another option.

Stage 3 — notebooks rendered to HTML

As we discovered earlier, nbconvert allows you to convert notebooks to HTML, but it also lets you pass a template via the --template flag, which gives you fine-grained control of the generated markup.

We pulled in the templates we’d been using in our Jekyll project, converted them to the jinja2 format used by nbconvert, and pulled out JS and CSS resources into separate files so they can be shared between the different rendered notebooks. nbconvert’s default behaviour is to inline resources into the HTML output, which is very convenient for bundling purposes but leads to huge HTML files which can’t share resources with each other via the browser’s cache.

Jekyll (and also GitHub) use a YAML-formatted header to add metadata to your documents, and being able to store metadata like this alongside our source files was very appealing, for things like URL slugs and keywords. We applied a small patch to notedown (which we’re considering contributing upstream), which passes this header through a metadata cell unchanged during our editing process, and also makes it available in the notebook during HTML conversion with nbconvert.
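For readers unfamiliar with this kind of header, here is a minimal sketch of separating it from the document body. It is not our notedown patch, and it only understands flat "key: value" lines and simple bracketed lists; a real build would hand the header to a proper YAML parser. The field names in the usage are hypothetical.

```python
def split_front_matter(text):
    """Return (metadata, body) for a document that may start with a '---' header."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    for end in range(1, len(lines)):
        if lines[end].strip() == "---":
            break
    else:
        return {}, text  # no closing delimiter: treat everything as body

    metadata = {}
    for line in lines[1:end]:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "key: value" line; ignored by this sketch
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            items = value[1:-1].split(",")
            metadata[key.strip()] = [item.strip() for item in items if item.strip()]
        else:
            metadata[key.strip()] = value.strip("\"'")
    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return metadata, body
```

For example, a file beginning with a header containing "slug: welcome" would yield {"slug": "welcome"} as metadata, with the remaining markdown returned untouched as the body.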

We switched to using nbconvert as a library as part of a small build script. The script reads this metadata during conversion and uses it, together with the generated HTML, to build our static directory structure (including slugs). It also creates the index of key phrases shown on the index page.

The key phrases index on the imandra-docs index page.
The markdown file YAML metadata header
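The metadata-driven part of such a build script can be sketched as follows. This is an illustration under assumptions rather than our actual script: the "slug" and "key-phrases" field names, the "site" root directory and the "slug/index.html" layout are all hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def output_path(metadata, root="site"):
    """Map a page's metadata to the file that would serve its URL slug."""
    return PurePosixPath(root) / metadata["slug"] / "index.html"


def key_phrase_index(pages):
    """Build {phrase: [slug, ...]} from an iterable of metadata dicts.

    Phrases are lower-cased so that "Proof" and "proof" share one entry,
    and both phrases and slugs are sorted for a stable index page.
    """
    index = defaultdict(list)
    for metadata in pages:
        for phrase in metadata.get("key-phrases", []):
            index[phrase.lower()].append(metadata["slug"])
    return {phrase: sorted(slugs) for phrase, slugs in sorted(index.items())}
```

The rendered HTML for each notebook would then be written to output_path(metadata), and the index page generated from key_phrase_index over all pages.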

Finally, we deploy this static directory structure to GitHub Pages alongside the live Try Imandra version as part of each build. The docs site provides a lightweight ‘read only’ viewing experience, while the Try Imandra site offers a much more interactive, but slightly slower loading (and more resource intensive on the backend), experience for those who want to dig in a bit deeper. Best of all, as both are generated from the same source content and built together, everything is kept in sync automatically.

Overall, we’re huge fans of the Jupyter ecosystem as it enables us to do great stuff like this and provides a huge amount of leverage!