Koopera: Collaboration app for sharing and reviewing Jupyter Notebooks

Ricardo Neves
Jan 8, 2020

Introduction

Jupyter Notebook has become the de facto technology for Data Science teams to collaborate and share their experimental findings. Still, the Data Science development ecosystem has not evolved enough to give teams the tools they need to work efficiently.

Although a data scientist’s responsibilities and skill set differ from those of a traditional software developer, many of their daily tasks overlap, which has led most companies to apply modern Software Engineering principles to Data Science teams. Hence, it is common to find data scientists working with agile methodologies and using a Git hosting service as the collaboration tool for submitting, tracking and reviewing their work.

Nowadays, there is a plethora of Git services available, e.g., GitHub, GitLab, Bitbucket. Although these services have been around for a while, they were built with software developers in mind, not data scientists. Thus, despite the extensive array of features that these services provide, they still lack support for the data scientist’s basic unit of work, i.e., the Jupyter Notebook.

As a consequence, organizations have had to come up with their own workarounds for two main issues:

  1. lack of support for adding comments on notebook cells at the code review stage;
  2. lack of support for sharing/publishing notebooks to democratize research.

With this in mind, I have developed and open-sourced Koopera, a collaboration app that enables Data Science teams to share and review their Jupyter Notebooks.

At its core, Koopera has two goals:

  1. Enable data scientists to effectively discuss/review their work;
  2. Improve notebook sharing and discovery.

In the remainder of this post, I’ll showcase Koopera’s current features, highlight other work in the same problem space, and summarize future plans for Koopera.

Features

Currently, Koopera has the following set of features:

  • Review Jupyter Notebooks by commenting directly on their cells
  • Review source code with syntax highlighting for Python, Scala, etc.
  • Import notebooks from your GitHub repositories
  • Navigate through your notebooks
  • GitHub authentication via a personal access token

Review Jupyter Notebooks

Koopera has access to all your GitHub code repositories and open pull requests. To review a notebook in Koopera, a pull request must first be opened on GitHub.

Selecting a pull request to review

You can add comments directly on notebook cells and reply to earlier comments through the pull request view.

Adding comments in notebook cells

All comments made in Koopera appear in the GitHub pull request, and vice versa.

Notebook comments made via Koopera displayed in GitHub
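
To make the two-way sync more concrete: GitHub's REST API has an endpoint for creating a review comment on a pull request, and a comment created there shows up in the pull request on GitHub. The sketch below only builds the request for that endpoint. It is an illustration under assumptions, not Koopera's actual code: the function name `build_review_comment`, the example values, and the choice of anchoring the comment to a line of the `.ipynb` file are all hypothetical.

```python
GITHUB_API = "https://api.github.com"


def build_review_comment(owner, repo, pull_number, commit_sha, path, line, text):
    """Return (url, payload) for GitHub's "create a review comment" endpoint.

    Hypothetical helper: it does not send anything, it only prepares the
    request that an authenticated HTTP client would POST as JSON.
    """
    url = f"{GITHUB_API}/repos/{owner}/{repo}/pulls/{pull_number}/comments"
    payload = {
        "body": text,             # the comment text shown in the pull request
        "commit_id": commit_sha,  # commit the comment refers to
        "path": path,             # file in the repository, e.g. the notebook
        "line": line,             # line of that file in the diff (assumed mapping)
        "side": "RIGHT",          # comment on the new version of the file
    }
    return url, payload


url, payload = build_review_comment(
    "octo", "demo", 7, "abc123", "analysis.ipynb", 42, "Why this threshold?"
)
print(url)
print(payload)
```

Sending the payload in an authenticated POST request to the returned URL would create the comment. How a notebook cell is mapped to a line of the diff is left open here.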

Review code: Python, Scala, etc.

It is also possible to review source code files in Koopera, with syntax highlighting for popular languages like Python and Scala.

Reviewing Python code

Just like notebook comments, these comments also appear in GitHub, with the difference that they are attached to the exact line of code.

Import and navigate through your notebooks

Koopera can import all the committed notebooks from your GitHub repositories. It is up to the user to specify which code repositories should be imported. Once the notebooks are imported, you can view them by clicking the navbar’s “Notebooks” tab. Note that once you select and import a GitHub code repository, Koopera automatically looks for new, deleted and updated notebooks in it and updates its state accordingly, so its notebooks stay up to date with the target repository.

Importing notebooks from a GitHub repository and opening a notebook
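
Detecting new, deleted and updated notebooks can be thought of as comparing two snapshots of a repository. The following is a minimal sketch of that comparison, assuming each snapshot is a mapping from notebook path to a content hash (for example a Git blob SHA). The names `NotebookChanges` and `diff_notebooks` are made up for illustration and are not taken from Koopera's code base.

```python
from typing import Dict, NamedTuple, Set


class NotebookChanges(NamedTuple):
    added: Set[str]
    deleted: Set[str]
    updated: Set[str]


def diff_notebooks(stored: Dict[str, str], remote: Dict[str, str]) -> NotebookChanges:
    """Compare two {path: content_hash} snapshots of a repository.

    `stored` is what the app already imported, `remote` is what the
    repository contains now.
    """
    stored_paths = set(stored)
    remote_paths = set(remote)
    added = remote_paths - stored_paths      # only in the repository
    deleted = stored_paths - remote_paths    # no longer in the repository
    updated = {
        path
        for path in stored_paths & remote_paths
        if stored[path] != remote[path]      # same path, different content
    }
    return NotebookChanges(added, deleted, updated)


stored = {"eda.ipynb": "1a", "model.ipynb": "2b", "old.ipynb": "3c"}
remote = {"eda.ipynb": "1a", "model.ipynb": "9f", "new.ipynb": "4d"}
print(diff_notebooks(stored, remote))
```

Comparing hashes rather than file contents keeps the check cheap, since only the notebooks reported as added or updated would need to be downloaded again.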

Honorable mentions

Before presenting my thoughts on Koopera’s future, I would like to highlight two projects that are also tackling the aforementioned issues: ReviewNB and Airbnb’s Knowledge Repository.

ReviewNB is a GitHub app built to tackle the problem of reviewing notebooks at the code review stage with support for notebook diffs. It is free for open-source repositories and has two pricing alternatives for private repositories. It has a 14-day free trial and is trusted by 200+ organizations.

Knowledge Repo is Airbnb’s attempt to build a platform to ease knowledge sharing between data scientists and other technical roles. It is an open-source project and my main source of inspiration for adding notebook sharing and exploration to Koopera. I highly recommend reading Airbnb’s post “Scaling Knowledge at Airbnb” to get a better grasp of the kind of problems that they aim to solve with this platform.

Future plans

In the near future, I think the main focus should be adding the capability to open and merge pull requests through Koopera, so that one no longer needs to switch back and forth between GitHub and Koopera and can rely on GitHub exclusively for hosting. The existing functionality should also be updated soon for scalability and security reasons: adding basic features like API pagination, migrating the notebook import to a background job, and adding proper authentication, since the current JWT contains the user's personal access token. There are also plenty of improvements to be made on the notebook exploration front, such as adding new fields like author, summary and tags, plus filtering and sorting functionality. Extending the project to support other Git services like GitLab or Azure DevOps is also a potential short/mid-term goal.
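
To illustrate why a personal access token inside a JWT is a concern: the payload of a signed JWT is only base64url-encoded, not encrypted, so anyone who gets hold of the token can read it without knowing the signing secret. The sketch below uses only the Python standard library and is not Koopera's code; the claim name `github_token`, the secret, and the token value are placeholders chosen for the example.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_jwt(payload: dict, secret: bytes) -> str:
    """Build a minimal HS256-signed JWT (header.payload.signature)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = (
        b64url(json.dumps(header).encode()) + "." + b64url(json.dumps(payload).encode())
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)


def read_payload_without_secret(token: str) -> dict:
    """Decode the payload segment; no key is needed to read it."""
    segment = token.split(".")[1]
    padded = segment + "=" * (-len(segment) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))


token = sign_jwt({"sub": "alice", "github_token": "placeholder-token"}, b"server-secret")
print(read_payload_without_secret(token))
```

The signature prevents tampering with the payload, but it does not hide it. A common alternative is to keep the access token on the server and put only an opaque session identifier in the JWT.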

I have already created issues for some of these items; feel free to check them out and contribute: https://github.com/rsn491/koopera/issues

Conclusions

Koopera is still at an early stage of development, so it has only the minimal set of features needed to review and share notebooks. I released it as open source so that other people could use its current functionality, extend it, and eventually contribute to it.

Needless to say, GitHub or another Git service provider might at any moment launch support for commenting on notebooks and displaying notebook diffs. That day would be good news for the data science community.

Until then, Koopera can be an open-source alternative capable of providing that missing functionality as well as delivering notebook sharing capabilities within a single platform.

https://github.com/rsn491/koopera/
