
Better notebooks through CI: automatically testing documentation for graph machine learning

Photo by Martin Adams on Unsplash

How it works

StellarGraph’s CI for notebooks works in three main ways:

  • Validates that every notebook is up to date and runs successfully, using papermill to keep CI fast for a good developer experience
  • Provides links to nbviewer to view failures conveniently in a nicely rendered way, directly in the browser
  • Checks that we’re providing a consistent experience by ensuring every notebook has links to cloud services and consistent formatting, via black and custom code

High velocity; reliable; humans — pick two

Documentation and examples are most useful when they’re accurate. StellarGraph is an active project with constant improvements and bug fixes; many of these allow us to make our notebooks better, and some changes require adjusting a notebook or two. All of this has to be done perfectly.

Software rot: with 25 pull requests a week, even a low chance (say 1%) of each pull request breaking a notebook means the probability that every notebook still works decays quickly towards zero.
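The decay is just compound probability; a quick back-of-the-envelope calculation with the figures above shows how fast it bites:

```python
# Back-of-the-envelope: if each pull request independently has a 1% chance
# of breaking some notebook, the chance that ALL notebooks survive a week
# of 25 pull requests compounds multiplicatively.
p_break = 0.01
prs_per_week = 25

p_week_clean = (1 - p_break) ** prs_per_week
print(f"after one week:  {p_week_clean:.0%}")       # roughly 78%
print(f"after one month: {p_week_clean ** 4:.0%}")  # roughly 37%
```

Without automated checks, a month of normal development makes a silently broken demo more likely than not.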

Automatically-run notebooks

To help us humans, we have computers check the notebooks too: in addition to the unit tests, CI checks that most of them run properly on every pull request and every merge. But CI works best when it is fast, and, since most of our notebooks demonstrate heavyweight machine learning, running them unmodified would be too slow.
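A CI run of a single notebook looks roughly like the following sketch. Papermill can inject values after a notebook's tagged `parameters` cell, which is a common way to shrink training time on CI; the notebook path and the `epochs` parameter here are illustrative assumptions, not StellarGraph's actual configuration.

```python
# Sketch: execute a notebook headlessly with papermill, overriding a
# parameter so a heavyweight demo finishes quickly on CI. Assumes the
# notebook has a cell tagged "parameters" that defines `epochs`.
import shutil
import subprocess

def papermill_command(notebook, output, epochs=1):
    """Build a papermill invocation that shrinks training for CI."""
    return [
        "papermill", notebook, output,
        "-p", "epochs", str(epochs),  # injected after the parameters cell
    ]

cmd = papermill_command("demos/gcn-node-classification.ipynb", "/tmp/out.ipynb")
if shutil.which("papermill"):  # only execute where papermill is installed
    subprocess.run(cmd, check=True)
```

The executed copy (`/tmp/out.ipynb` above) contains every cell's output, so it can be uploaded as a build artifact whether the run succeeded or failed.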

Highlighting errors with Buildkite artifacts and annotations

CI helps us see when something has gone wrong, but it’s even better to know the exact problem. Papermill helps by logging all of the stdout and stderr output of the notebook, but digging through verbose CI logs is cumbersome. Furthermore, these logs often lack the context needed to understand the problem, especially compared to the rendered view of a notebook, where the error is attached to an individual cell, along with any other relevant cells.

When a notebook fails on CI, a link to view the failed notebook on nbviewer is automatically added to the build.
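One way to build such a link is to point nbviewer at the URL of the uploaded notebook artifact; nbviewer fetches and renders any publicly reachable notebook. This is a minimal sketch of that idea, not StellarGraph's exact code; the artifact URL is hypothetical.

```python
def nbviewer_link(artifact_url):
    """Build an nbviewer link for a publicly reachable notebook.

    nbviewer's /urls/ path prefix is its convention for notebooks served
    over https (/url/ is the prefix for plain http).
    """
    assert artifact_url.startswith("https://")
    return "https://nbviewer.org/urls/" + artifact_url[len("https://"):]

link = nbviewer_link("https://example.com/artifacts/demo.ipynb")
print(link)
```

The resulting link can then be attached to the Buildkite build, for example via `buildkite-agent annotate`, so it appears right at the top of the build page.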

A consistent demo experience

StellarGraph works to provide a consistent experience for each demo because this makes it easier for:

  • users of StellarGraph to switch between demos without having to understand the idiosyncrasies of each one
  • developers of StellarGraph to write and edit the notebooks, because more parts of the process have an expected behaviour, and there’s no time-wasting on things like formatting
  • automatic tools to help us maintain demos, because the input notebooks are more structured.

The Binder button in a notebook, like the one in the introductory node classification demo using GCN, takes readers directly to an executable environment in less than one minute.

When notebooks aren’t formatted correctly, the CI build is annotated with the list of those notebooks and the command to run to fix it.
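A custom consistency check of the kind described above needs nothing beyond the standard library, because a notebook on disk is just JSON. This sketch checks that each notebook links to the cloud services; the host names and the "links appear in the first three cells" rule are assumptions for illustration, not StellarGraph's actual rules.

```python
import json

# Illustrative: the services every demo should link to.
CLOUD_HOSTS = ("mybinder.org", "colab.research.google.com")

def has_cloud_links(nb):
    """Return True if an early markdown cell links to every cloud service."""
    for cell in nb.get("cells", [])[:3]:  # expect the links near the top
        if cell["cell_type"] == "markdown":
            text = "".join(cell["source"])
            if all(host in text for host in CLOUD_HOSTS):
                return True
    return False

# Notebooks are JSON, so the check plugs straight into a CI script:
nb = json.loads("""{
  "cells": [{"cell_type": "markdown",
             "source": ["[Run on Binder](https://mybinder.org/...) ",
                        "[Open in Colab](https://colab.research.google.com/...)"]}]
}""")
print(has_cloud_links(nb))
```

A CI script can run this over every notebook in the repository and annotate the build with the ones that fail.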


StellarGraph prides itself on its demos and we rely on automation to keep them great, even in a high-velocity project as we build up to our 1.0 release. We get a nice review experience via ReviewNB, and get alerted when a notebook fails to run or has inappropriate formatting. All of this happens before landing any changes, and problems are flagged with convenient links and suggested fixes using Buildkite CI.



We are a team of passionate engineers, designers, data scientists and researchers at CSIRO’s Data61, building cutting edge graph machine learning technology.
