Creating Pull Requests with Jupyter Notebooks

Creditas · Published in Creditas Tech · 3 min read · Dec 19, 2019


Article written by Bárbara Barbosa. The content was also released on Analytics Vidhya.

Have you ever wondered how you would give feedback to a person who forgot to label the x-axis of a chart?

Not familiar with Jupyter Notebook? Check out DataCamp’s tutorial.

When I started working as a back-end developer, it was the Wild West. I could deploy without anyone looking at my code, and I would only find out that something had gone wrong when a client called to tell me something had crashed. One day, through a conversation at an event, I discovered the existence of Pull Requests (PRs)! With them, other developers could see my code and give me feedback, such as how I could improve it and what the good practices were for a given implementation. There are numerous advantages to working with Pull Requests, but the one I found most amazing was the knowledge sharing they enable. For more details on creating Pull Requests using Bitbucket, check out this link.

When I became a data scientist, everything changed. I started working with Jupyter Notebooks, which were wonderful for my research and for visualizing charts and the results of my code, yet terrible for making PRs. GitHub renders notebooks, but the rendered view can’t be commented on. In a PR, what appears instead is the notebook’s raw JSON, which isn’t easy to read.
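To make the problem concrete, here is a minimal sketch that uses only the Python standard library. It builds two tiny notebook-like dictionaries (a simplified imitation of the .ipynb layout, not a real file from our projects) that differ by a single edited line of code, and compares how many lines a diff reports for the JSON versus the plain code. The helper names `make_notebook` and `diff_lines` are made up for this illustration.

```python
import difflib
import json


def make_notebook(source, output_text, execution_count):
    """Build a minimal nbformat-4-style notebook dict with one code cell."""
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": execution_count,
                "metadata": {},
                "outputs": [
                    {
                        "name": "stdout",
                        "output_type": "stream",
                        "text": [output_text],
                    }
                ],
                "source": [source],
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 2,
    }


def diff_lines(old, new):
    """Return only the added/removed lines of a unified diff of two texts."""
    diff = difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="")
    return [
        line
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


# The same one-line edit, seen through the notebook JSON and through plain code.
before = make_notebook("print(1 + 1)", "2\n", execution_count=3)
after = make_notebook("print(1 + 2)", "3\n", execution_count=7)

json_changes = diff_lines(
    json.dumps(before, indent=1, sort_keys=True),
    json.dumps(after, indent=1, sort_keys=True),
)
script_changes = diff_lines("print(1 + 1)", "print(1 + 2)")

print(len(json_changes), "changed lines in the notebook JSON")
print(len(script_changes), "changed lines in the equivalent script")
```

The JSON diff also picks up the re-run counter and the stored output, so the reviewer has to dig for the one line that actually matters. Real notebooks with charts are far noisier, since images are stored inline as encoded text.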

In order to solve this problem, we scoured some communities, especially that of a project I really admire: “Serenata de Amor” (Love Serenade). There, we found a relatively efficient method: generate a .py file alongside each .ipynb. To do this automatically each time you save your notebook, simply add this code to the file ~/.jupyter/jupyter_notebook_config.py:

# Based off of https://github.com/jupyter/notebook/blob/master/docs/source/extending/savehooks.rst

import io
import os

from notebook.utils import to_api_path

_script_exporter = None


def script_post_save(model, os_path, contents_manager, **kwargs):
    """convert notebooks to Python script after save with nbconvert

    replaces `ipython notebook --script`
    """
    from nbconvert.exporters.script import ScriptExporter

    # only act on notebooks, not on other files saved through Jupyter
    if model['type'] != 'notebook':
        return

    # create the exporter once and reuse it on later saves
    global _script_exporter
    if _script_exporter is None:
        _script_exporter = ScriptExporter(parent=contents_manager)

    log = contents_manager.log

    # save .py file
    base, ext = os.path.splitext(os_path)
    script, resources = _script_exporter.from_filename(os_path)
    script_fname = base + resources.get('output_extension', '.txt')
    log.info("Saving script /%s", to_api_path(script_fname, contents_manager.root_dir))

    with io.open(script_fname, 'w', encoding='utf-8') as f:
        f.write(script)


c.FileContentsManager.post_save_hook = script_post_save

This automatically doubles the number of files created! However, it lets you view the rendered notebook while commenting on the .py file, at the exact lines of the cell you think should be changed.
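To give an idea of what ends up in that .py file, here is a rough standard-library sketch of the conversion. It is not nbconvert's actual implementation, only an approximation under the assumption of an nbformat 4 notebook: code cells are kept, Markdown cells become comments, and outputs and execution counters are dropped. The function name `notebook_to_script` and the example cells are hypothetical.

```python
import json


def notebook_to_script(notebook_json):
    """Join the cells of an nbformat-4 notebook into one script string.

    Rough stand-in for a script export: outputs and execution metadata
    are dropped, Markdown cells are turned into comments.
    """
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        # nbformat stores "source" either as one string or as a list of lines.
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)
        if cell.get("cell_type") == "code":
            chunks.append(source)
        elif cell.get("cell_type") == "markdown":
            chunks.append("\n".join("# " + line for line in source.splitlines()))
    return "\n\n".join(chunks) + "\n"


# Made-up notebook: a chart whose author forgot to label the x-axis.
example = json.dumps({
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["## Revenue per month"]},
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "outputs": [],
            "source": [
                "import matplotlib.pyplot as plt\n",
                "plt.plot(months, revenue)\n",
                "plt.ylabel('Revenue')",
            ],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 2,
})

script = notebook_to_script(example)
print(script)
```

In the resulting text, a reviewer can leave a comment right under the `plt.ylabel` line asking for the missing x-axis label, which is exactly the kind of feedback that is hard to give on the raw JSON.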

It’s worth mentioning that there are other ways to review notebooks in a Pull Request, especially if you are working with open source code. One of them is ReviewNB, which allows comments to be left directly on the notebook’s cells, but this solution is no longer free for private repositories and only works with GitHub (unfortunate for GitLab and Bitbucket users). You can also perform tests with notebooks using nbviewer.

Mood when your PR is approved after 888 comments.

Another good practice we follow is using a slightly modified version of Cookiecutter Data Science to organize our projects. We follow its rule that “Notebooks are for exploration and communication”: data extraction, feature engineering, and model tuning code is kept elsewhere, while the notebooks mainly serve for EDA (Exploratory Data Analysis) and evaluations. This greatly facilitates versioning and running that code.

Cookiecutter’s folder structure lets anyone on the team work on and contribute to the research of all of Creditas’ data scientists.
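As a small illustration of that split, the sketch below shows the kind of function that could live in a regular module (for example somewhere under the project's source folder; the exact path, the function name `add_ratio_feature`, and the sample data are all assumptions made up for this example) instead of inside a notebook cell. The notebook then only imports it and looks at the result.

```python
# Hypothetical module code: plain Python that can be versioned, reviewed in a
# normal PR diff, and reused by every notebook in the project.
def add_ratio_feature(records, numerator, denominator, name):
    """Return copies of `records` with records[name] = numerator / denominator.

    Rows whose denominator is zero or missing get None instead of raising.
    """
    enriched = []
    for record in records:
        value = record.get(denominator)
        ratio = record[numerator] / value if value else None
        enriched.append({**record, name: ratio})
    return enriched


# What a notebook cell would do: call the shared function and explore the output.
# The numbers below are invented for illustration only.
loans = [
    {"loan": 50_000, "collateral": 200_000},
    {"loan": 10_000, "collateral": 0},
]
enriched = add_ratio_feature(loans, "loan", "collateral", "loan_to_value")
for row in enriched:
    print(row)
```

Because the logic sits in an ordinary .py file, a change to it shows up as a readable diff, and the notebook itself stays short enough to review at a glance.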

Interested in working with us? We’re always looking for people passionate about technology to join our crew! You can check out our openings here.

Creditas is the leading online platform for secured loans in Brazil. Our purpose is to make people’s new achievements possible.