How to enable Jupyter Notebook collaboration on GitHub

Photo by Yancy Min on Unsplash

Most people know Jupyter notebooks as a scratchpad for data scientists. Jupyter’s ease of use, coupled with its rich ecosystem (visualization libraries, magics, widgets, extensions), makes it an indispensable weapon in the data scientist’s armory. A growing number of data science groups are using notebooks as a collaboration medium within their teams & with other stakeholders. In this article, we’re going to look at how Jupyter fares at collaboration & how you can use GitHub & ReviewNB to carve out a workflow.

Jupyter as a Communication Medium?

A typical data analysis consists of some input data, steps to clean up & process the data, a description of what’s…


Version control is one of the major challenges with Jupyter Notebooks. We can use git to version control notebooks, but it’s hard to review notebook diffs, i.e. to see what changed from one notebook version to another. The issue stems from the fact that Jupyter uses JSON underneath & stores rich media (HTML, images) in the JSON itself. This kind of hybrid format is not well supported in Git. Hence git diffs for Jupyter Notebooks are pretty hard to review & resolving merge conflicts is a source of pain. …
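To make the point about JSON and embedded media concrete, here is a minimal sketch that diffs two versions of the same notebook using only the Python standard library. The notebook dicts are simplified, hand-built approximations of the .ipynb format, the helper names make_notebook and changed_lines are hypothetical, and the image payloads are placeholder strings rather than real image data.

```python
import difflib
import json


def make_notebook(execution_count, image_payload):
    """Build a simplified, nbformat-4-style notebook as a plain dict (illustrative only)."""
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": execution_count,
                "metadata": {},
                "outputs": [
                    {
                        # Rich media lives inside the JSON as an encoded string.
                        "data": {"image/png": image_payload},
                        "metadata": {},
                        "output_type": "display_data",
                    }
                ],
                "source": ["df.plot()"],
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


def changed_lines(old, new):
    """Return the added/removed lines of a unified diff between two notebook dicts."""
    old_text = json.dumps(old, indent=1, sort_keys=True).splitlines()
    new_text = json.dumps(new, indent=1, sort_keys=True).splitlines()
    diff = difflib.unified_diff(old_text, new_text, lineterm="")
    return [
        line
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


# Same source code, re-executed: only the counter and the embedded image differ.
# The payload strings are placeholders, not real PNG data.
before = make_notebook(1, "iVBORw0KGgoAAAANSUhEUgAA")
after = make_notebook(2, "iVBORw0KGgoAAAANSUhEUgAB")

noise = changed_lines(before, after)
for line in noise:
    print(line)
```

Even though the code in the cell is identical in both versions, the diff is non-empty: the execution counter and the encoded image both changed. In a real notebook the image string can run to many thousands of characters, which is what makes such diffs hard to read.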


Image used under license from Shutterstock.com

Jupyter notebooks are fantastic in many ways but collaboration is not so easy with them. In this article we’ll look at all the tools you can leverage to make notebooks play nicely with modern version control systems like git!

Why is Jupyter version control so hard?

The software world has converged on git as its version control tool of choice. Git is designed to work primarily with human-readable text files. A Jupyter notebook, by contrast, is a rich JSON document with source code, markdown, HTML & images all rolled into a single .ipynb file.

Git doesn’t handle rich documents like notebooks very well. E.g. git merge for long nested JSON documents…


A simple way for teams to review Jupyter notebooks on GitHub

At ReviewNB, we already support Jupyter notebook visual diffs & reviews for GitHub commits & pull requests.

Today we’re releasing JDoc, a simple way for teams to review notebooks on GitHub. You can open a notebook in your repository & start a discussion under any notebook cell. Teammates watching the repository or participating in the conversation will be notified (via email) so they can chime in and move the conversation forward. This workflow should be very useful for Data Science / ML teams that want to review each other’s work, ask clarifying questions & provide feedback directly on a notebook cell.

Why the new feature?

We offer notebook review…


[Image used with open license from undraw.io]

There’s no easy way to version control notebooks from the Jupyter UI. Of course, you can drop down to the command line & learn a bunch of git commands to version control your notebooks. But not everyone using Jupyter is proficient at git. Hence I built GitPlus, a JupyterLab extension that lets you commit notebooks & create GitHub pull requests directly from the JupyterLab UI.

How to version control Jupyter Notebooks

Once the GitPlus extension is installed, it adds a new menu item, Git-Plus, to the JupyterLab UI. From there, you can commit notebook files or create a GitHub pull request, as shown in the demo videos below.

Create GitHub…


This is a Git-101 for Jupyter users who are not familiar with Git / GitHub. It’s a hands-on tutorial & is meant to be comprehensive. Feel free to skip a section if you are already familiar with the steps. By the end, you’ll be able to:

  • Push your notebooks to a GitHub repository
  • Start versioning your notebooks + learn how to revert to a specific notebook version
  • Get feedback & discuss notebook changes with your peers
  • Easily share your notebooks for others to view

Create GitHub Account

If you don’t have a GitHub account please create one here.

Setup Git Locally


This is a short post for ReviewNB users describing how to navigate quickly from GitHub to a relevant page in ReviewNB. If you are unfamiliar with ReviewNB, you might want to skip this and learn more about us on our homepage first. Let’s dive in.

The URL structure for ReviewNB is intentionally kept the same as that of GitHub. E.g. https://github.com/tensorflow/docs takes you to the tensorflow docs repository on GitHub. If we just replace github.com with app.reviewnb.com, we’ll land on the same repository on ReviewNB. The same mapping holds for other GitHub pages (PRs, commits etc.) as well.
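The host swap described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated here: the function name to_reviewnb_url is hypothetical, and the sketch assumes that everything after the host (path, query, fragment) carries over unchanged.

```python
from urllib.parse import urlsplit, urlunsplit

GITHUB_HOST = "github.com"
REVIEWNB_HOST = "app.reviewnb.com"


def to_reviewnb_url(github_url):
    """Swap the GitHub host for the ReviewNB host, keeping the rest of the URL as-is."""
    parts = urlsplit(github_url)
    if parts.netloc.lower() != GITHUB_HOST:
        raise ValueError(f"not a github.com URL: {github_url!r}")
    # SplitResult is a namedtuple, so _replace returns a copy with a new host.
    return urlunsplit(parts._replace(netloc=REVIEWNB_HOST))


print(to_reviewnb_url("https://github.com/tensorflow/docs"))
# prints https://app.reviewnb.com/tensorflow/docs
```

Parsing the URL instead of doing a plain string replace avoids touching a "github.com" that happens to appear in the path or query string.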


If you are not familiar with ReviewNB, you might want to check it out first. It’s a tool that lets your team review Jupyter Notebook changes & enables collaborative workflows around them. Today, I’m excited to announce a commenting feature for all ReviewNB users. Here’s everything you can do with it:

Write comments next to code/markdown cell

You can select any Notebook cell and write a comment for it. It could be a clarifying question, suggestion or just a simple comment.

Reply to a conversation thread

Notebook author and anyone else on the team can chime into the conversation. …


We’re happy to announce ReviewNB, a tool to help you version control & code review Jupyter Notebooks.

Problem

Jupyter is great for data exploration, but it’s hard to go beyond that & do collaborative work with it. The following challenges exist when using Jupyter Notebooks with a modern version control system like Git:

  • Notebook diffs are hard to read. Hence we can’t do code reviews on GitHub
  • Merging in remote changes is hard due to JSON format of Notebook files (.ipynb)
  • No easy way to share feedback & have a discussion around Notebooks
  • It’s not easy to reproduce Notebook results
  • It’s not easy…


Edit: A few months after writing this blog, I released ReviewNB, a tool for Jupyter Notebook code reviews. It addresses some of the concerns raised in this article.

A lot of people, including me, love Jupyter Notebooks.

It’s a fantastic tool for data science. Today, though, I’m not going to talk about its amazing capabilities, but rather how it fails at two important things: Version Control and Reproducibility.

I will also outline the current state-of-the-art tools to solve these problems. It’s a useful read if you are a Jupyter user. Let’s jump right in.

Version control

Jupyter Notebook renders nicely in the…
