Most people know Jupyter notebooks as a scratchpad for data scientists. Jupyter’s ease of use, combined with its rich ecosystem (visualization libraries, magics, widgets, extensions), makes it an indispensable weapon in the data scientist’s armory. A growing number of data science groups are using notebooks as a collaboration medium within their teams & with other stakeholders. In this article, we’re going to look at how Jupyter fares at collaboration & how you can use GitHub & ReviewNB to carve out a workflow.
A typical data analysis consists of some input data, steps to clean up & process the data, description of what’s…
Version control is one of the major challenges with Jupyter Notebooks. We can use git to version control notebooks, but it’s hard to review notebook diffs, i.e. to see what changed from one notebook version to another. The issue stems from the fact that Jupyter uses JSON underneath & stores rich media (HTML, images) in the JSON itself. This kind of hybrid format is not well supported in Git. Hence git diffs for Jupyter Notebooks are pretty hard to review & resolving merge conflicts is a source of pain. …
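To make the JSON problem concrete, here is a minimal sketch using only the Python standard library. It builds two versions of a one-cell notebook whose source code is identical but which were re-executed, then compares the serialized JSON line by line. The helper names (`make_notebook`, `strip_outputs`, `json_diff`) and the base64 strings are made up for illustration; the dictionary layout follows the general shape of an nbformat 4 notebook, not any tool's internal representation.

```python
import copy
import difflib
import json


def make_notebook(source, execution_count, png_base64):
    """Build a minimal nbformat-4-style notebook dict with one code cell."""
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": execution_count,
                "metadata": {},
                "outputs": [
                    {
                        "output_type": "display_data",
                        # Placeholder text standing in for a base64-encoded image.
                        "data": {"image/png": png_base64},
                        "metadata": {},
                    }
                ],
                "source": [source],
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


def strip_outputs(notebook):
    """Return a copy with outputs and execution counts cleared on code cells."""
    stripped = copy.deepcopy(notebook)
    for cell in stripped["cells"]:
        if cell["cell_type"] == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return stripped


def json_diff(old, new):
    """Line-based diff of the serialized JSON, similar to what a text diff compares."""
    old_lines = json.dumps(old, indent=1, sort_keys=True).splitlines()
    new_lines = json.dumps(new, indent=1, sort_keys=True).splitlines()
    return list(difflib.unified_diff(old_lines, new_lines, lineterm=""))


# Same source code, but the cell was re-run: new execution count, new image bytes.
before = make_notebook("df.plot()", 1, "iVBORw0KGgoAAA")
after = make_notebook("df.plot()", 7, "iVBORw0KGgoBBB")

raw_diff = json_diff(before, after)
clean_diff = json_diff(strip_outputs(before), strip_outputs(after))

print("diff lines with outputs:", len(raw_diff))
print("diff lines without outputs:", len(clean_diff))
```

Even though no code changed, the raw diff is non-empty because the execution count and the embedded image differ. Clearing outputs before comparing is one common workaround, at the cost of losing the rendered results in version control.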
Jupyter notebooks are fantastic in many ways but collaboration is not so easy with them. In this article we’ll look at all the tools you can leverage to make notebooks play nicely with modern version control systems like git!
The software world has converged on git as its version control tool of choice. Git is designed to work primarily with human-readable text files. A Jupyter notebook, by contrast, is a rich JSON document with source code, markdown, HTML & images all rolled into a single .ipynb file.
Git doesn’t handle rich documents like notebooks very well. E.g. git merge for long nested JSON documents…
At ReviewNB, we already support Jupyter notebook visual diffs & reviews for GitHub commits & pull requests.
Today we’re releasing JDoc, a simple way for teams to review notebooks on GitHub. You can open a notebook in your repository & start a discussion under any notebook cell. Teammates watching this repository or participating in the conversation will be notified (via email) so they can chime in and move the conversation forward. This workflow is useful for Data Science / ML teams to review each other’s work, ask clarifying questions & provide feedback directly on the notebook cell.
We offer notebook review…
There’s no easy way to version control notebooks from the Jupyter UI. Of course you can drop down to the command line & learn a bunch of git commands to version control your notebooks. But not everyone using Jupyter is proficient at git. Hence I built GitPlus, a JupyterLab extension that provides the ability to commit notebooks & create GitHub pull requests directly from the JupyterLab UI.
When the GitPlus extension is installed, it provides a new menu item, Git-Plus, in the JupyterLab UI. From there, you can commit notebook files or create a GitHub pull request as shown in the demo videos below.
Create GitHub…
This is a Git-101 for Jupyter users who are not familiar with Git / GitHub. It’s a hands-on tutorial & is meant to be comprehensive. Feel free to skip a section if you are already familiar with the steps. At the end you’ll be able to,
If you don’t have a GitHub account please create one here.
This is a short post for ReviewNB users describing how to navigate quickly from GitHub to a relevant page in ReviewNB. If you are unfamiliar with ReviewNB, you might want to skip this and learn more about us on our homepage first. Let’s dive in.
The URL structure for ReviewNB is intentionally kept the same as that of GitHub. E.g. https://github.com/tensorflow/docs takes you to the TensorFlow docs repository on GitHub. If we just replace github.com with app.reviewnb.com, we’ll land on the same repository on ReviewNB. The same mapping holds true for other GitHub pages (PRs, commits etc.) as well.
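The host swap described above can be sketched in a few lines of Python. The function name `github_to_reviewnb` is hypothetical and not part of any ReviewNB tooling; the sketch only assumes what the paragraph states, namely that the path stays the same and the host changes from github.com to app.reviewnb.com.

```python
from urllib.parse import urlsplit, urlunsplit

REVIEWNB_HOST = "app.reviewnb.com"


def github_to_reviewnb(url):
    """Swap the github.com host for app.reviewnb.com, keeping the rest of the URL."""
    parts = urlsplit(url)
    if parts.netloc.lower() not in ("github.com", "www.github.com"):
        raise ValueError(f"not a github.com URL: {url}")
    # SplitResult is a namedtuple, so _replace returns a copy with the new host.
    return urlunsplit(parts._replace(netloc=REVIEWNB_HOST))


print(github_to_reviewnb("https://github.com/tensorflow/docs"))
```

Parsing the URL instead of doing a plain string replace avoids accidentally rewriting a "github.com" that appears in the path or query string.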
…
If you are not familiar with ReviewNB, you might want to check that out first. It’s a tool that lets your team review Jupyter Notebook changes & enables collaborative workflows around them. Today, I’m excited to announce the commenting feature for all ReviewNB users. Here’s everything you can do with it,
You can select any Notebook cell and write a comment for it. It could be a clarifying question, suggestion or just a simple comment.
The notebook author and anyone else on the team can chime in on the conversation. …
We’re happy to announce ReviewNB, a tool to help you version control & code review Jupyter Notebooks.
Jupyter is great for data exploration but it’s hard to go beyond that & do collaborative work with it. The following challenges exist in using Jupyter Notebooks with modern version control systems like Git,
Edit: A few months after writing this blog, I released ReviewNB, a tool for Jupyter Notebook code reviews. It addresses some of the concerns raised in this article.
A lot of people, including me, love Jupyter Notebooks.
It’s a fantastic tool for data science. Today, though, I’m not going to talk about its amazing capabilities, but rather how it fails at two important things: Version Control and Reproducibility.
I will also outline the current state-of-the-art tools to solve these problems. It’s a useful read if you are a Jupyter user. Let’s jump right in.
Jupyter Notebook renders nicely in the…