If I asked you to rate your satisfaction with the usability and performance of Jupyter Notebooks on a scale of 1 to 10, what would your answer be?
If your answer is below 5, it’s time to read this article.
Let’s dig into the Jupyter Notebooks first.
Jupyter Notebook is a web application that extends the console-based approach to interactive computing. It is mainly used by data scientists for data cleaning and transformation, numerical simulation, statistical modelling, data visualization and various machine learning tasks.
The Jupyter Notebook is popular as a tool for data exploration and prototyping.
- It is great for showcasing work, with both the code and its outputs in one place.
- We can split code into separate cells and execute each one independently.
- It supports inline documentation through Markdown cells.
However, going beyond prototyping and collaborating with others in notebooks remains hard if you are still in the stone age of Jupyter Notebook.
If you find notebooks terrifying, you might be using plain Jupyter Notebook (the notebook exactly as it comes after installation), or you might not know the tips for increasing efficiency that I am going to discuss in the second half of this article.
Come on! Let’s get out of the mess together.
Common Issues with Plain Jupyter Notebooks
Does Not Support Version Control
Version control platforms such as GitHub do not handle Jupyter Notebooks well, because GitHub’s built-in source code tools are not designed for notebooks. Some of the difficulties of using notebooks on GitHub are listed below.
- diffing and reviewing — Jupyter Notebooks are stored as large JSON files that mix the code with its outputs and metadata. Therefore, reading the code diffs (the differences in source code before and after a change) is very hard, and so is doing code reviews on GitHub.
- merging — Merging two versions of a notebook is very hard, because conflicts show up inside the JSON structure, and resolving them by hand can easily leave an invalid file.
- collaboration — There is no convenient way to share feedback and maintain a discussion on development work. If you need to collaborate with your team, plain Jupyter Notebooks can be the worst choice for your project.
- editing — Editing an .ipynb file directly on GitHub is impractical, because you would be editing raw JSON. The practical way is to clone the repository locally and edit the notebook there.
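To see why the diffs get noisy, here is a minimal sketch using only Python’s standard library. It hand-builds a simplified notebook file (a stripped-down version of the nbformat v4 JSON layout; real notebooks carry more metadata) and diffs two versions in which a single character of code changed and the cell was re-run. The helper name `make_notebook` is hypothetical.

```python
import difflib
import json


def make_notebook(source: str, output: str, execution_count: int) -> str:
    """Serialize a one-cell notebook in a simplified nbformat v4 layout (hypothetical helper)."""
    notebook = {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": execution_count,
                "metadata": {},
                "outputs": [
                    {"name": "stdout", "output_type": "stream", "text": [output]}
                ],
                "source": [source],
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }
    return json.dumps(notebook, indent=1)


# Version 1, and version 2 after changing one character and re-running the cell.
old = make_notebook("print(1 + 1)", "2\n", execution_count=1)
new = make_notebook("print(1 + 2)", "3\n", execution_count=7)

# A plain line-based diff of the raw JSON, as a generic diff tool would show it.
diff = list(
    difflib.unified_diff(
        old.splitlines(), new.splitlines(), "old.ipynb", "new.ipynb", lineterm=""
    )
)
print("\n".join(diff))
```

A one-character edit produces three changed lines (the source, the output and the execution count), each wrapped in JSON punctuation. In a real notebook, plots are embedded as base64-encoded images, so the noise is far worse.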
No Code Quality Enhancing Features
Maintaining high-quality code is a dream of every data scientist and software developer.
To improve the quality of Python code, we normally follow style guides such as PEP 8, and we use linters to enforce those guidelines and to detect defects and other problems in our code. Most code editors and IDEs can now run linters in the background as we type. This helps catch defects such as mistyped variable names, missing brackets, incorrect indentation and the wrong number of arguments passed to a function.
Unfortunately, plain Jupyter Notebooks do not support linters out of the box. However, there are some notebook extensions we can use to enhance code quality, and we will discuss them later in this article.
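Because a notebook is just JSON, a crude check is possible even without extensions. The sketch below is a stand-in for a real linter, not a replacement for one: it parses each code cell with Python’s `ast` module and only reports syntax errors such as a missing bracket (it cannot catch mistyped variable names). It blanks out lines starting with `%` or `!` (IPython magics and shell commands), which are not plain Python. The function name `check_notebook_syntax` is hypothetical.

```python
import ast
import json


def check_notebook_syntax(notebook_json: str) -> list[str]:
    """Report code cells that fail to parse as Python (hypothetical helper, syntax only)."""
    notebook = json.loads(notebook_json)
    problems = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        # nbformat allows "source" to be a single string or a list of lines.
        source = "".join(cell.get("source", []))
        # Blank out IPython magics and shell escapes so line numbers stay accurate.
        lines = [
            "" if line.lstrip().startswith(("%", "!")) else line
            for line in source.splitlines()
        ]
        try:
            ast.parse("\n".join(lines))
        except SyntaxError as err:
            problems.append(f"cell {index}, line {err.lineno}: {err.msg}")
    return problems


# A tiny notebook whose second cell has an unclosed bracket.
example = json.dumps(
    {
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": ["# Prices"]},
            {"cell_type": "code", "execution_count": None, "metadata": {},
             "outputs": [], "source": ["%matplotlib inline\n", "total = sum([1, 2]\n"]},
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }
)
for problem in check_notebook_syntax(example):
    print(problem)
```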
Hard to Test
Jupyter Notebooks are meant for prototyping. Although they are excellent for code experiments and exploration, if you are a fan of test-driven development, Jupyter Notebook is not your place!
The value of a test-driven environment lies in the ability to create unit tests and integration tests for each code segment before executing the whole program.
However, it is very hard to develop test scenarios in Jupyter Notebook.
Many developers fall back on print() statements to check outputs, because writing test scenarios is hard. However, printing is not a professional or standard method of testing, and it is not efficient either. Users of Jupyter Notebooks know how much time this method wastes: sometimes a bug only shows up at the very end of the code, and to catch it we have to wait for the whole notebook to execute.
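One way to soften this is to move logic out of loose cells into small functions and test them right inside the notebook with the standard library’s `unittest`. The sketch below assumes a hypothetical helper, `clean_prices`, standing in for a real cleaning step. It loads the test case explicitly and runs it with `TextTestRunner`, which avoids `unittest.main()` trying to parse the Jupyter kernel’s own command-line arguments.

```python
import unittest


def clean_prices(values):
    """Hypothetical cleaning step: drop missing entries and convert the rest to float."""
    return [float(value) for value in values if value not in (None, "")]


class TestCleanPrices(unittest.TestCase):
    def test_drops_missing_values(self):
        self.assertEqual(clean_prices(["1.5", None, "", "2"]), [1.5, 2.0])

    def test_empty_input(self):
        self.assertEqual(clean_prices([]), [])


# Load this one test case explicitly instead of calling unittest.main(),
# so the kernel's command-line arguments are never parsed.
suite = unittest.TestLoader().loadTestsFromTestCase(TestCleanPrices)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

Running this cell right after defining the function surfaces a failure immediately, instead of at the end of a long notebook run. As the project grows, the same tests can move into a .py file and run under a test runner such as pytest.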
Tips for Using Jupyter Notebooks Effectively
Let’s see how we can get the maximum benefit from Jupyter Notebooks while mitigating the issues discussed above.
First, we will see which supporting apps and tools we can use with Jupyter Notebook.
1. Notebook Extensions
You can use notebook extensions to extend the functionality of your notebooks. A popular extension set is jupyter_contrib_nbextensions (it targets the classic Jupyter Notebook interface), which is available on GitHub.
Some useful and popular extensions available in jupyter_contrib_nbextensions are listed below.
- Table of Contents — collects all the headers and provides links to each section of the notebook. This makes it easy to navigate through the notebook.
- Autopep8 — automatically formats the code according to PEP 8 guidelines.
Note: Autopep8 addresses one of the issues discussed above, namely that Jupyter Notebook lacks code quality enhancement features.
- Snippets — provides code snippets for loading common libraries and creating sample plots.
- Hinterland — provides code auto completion suggestions.
- Scratchpad — opens a temporary cell for quick calculations without adding a new cell to your notebook.
- Code Folding — helps to hide the code blocks when we do not need to read them.
2. Code Review Tools
As discussed earlier, reviewing, diffing (identifying the differences in source code before and after a change) and merging Jupyter Notebooks on GitHub is not an easy task.
Cheer up, there are some tools and apps to save you!
nbdime provides tools for diffing and merging Jupyter Notebooks.
- nbdiff compares notebooks in a terminal-friendly way.
- nbmerge performs a three-way merge of notebooks with automatic conflict resolution.
- nbdiff-web provides rich rendered diff of notebooks.
- nbmerge-web provides a web-based three-way merge tool for notebooks.
- nbshow presents a single notebook in a terminal-friendly way.
To compute a diff, we need to clone the repository and install nbdime locally. However, nbdime is not integrated with GitHub pull requests.
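The core idea behind a notebook-aware diff can be sketched with the standard library: compare only the cell sources and ignore outputs, execution counts and metadata. This is a toy illustration under that assumption, not how nbdime is implemented (nbdime diffs the full notebook structure and renders the results far more richly). The function names are hypothetical.

```python
import difflib
import json


def cell_sources(notebook_json: str) -> list[str]:
    """Flatten a notebook into source lines, with a marker line before each cell."""
    notebook = json.loads(notebook_json)
    lines = []
    for index, cell in enumerate(notebook["cells"]):
        lines.append(f"# --- cell {index} ({cell['cell_type']}) ---")
        lines.extend("".join(cell.get("source", [])).splitlines())
    return lines


def source_diff(old_json: str, new_json: str) -> list[str]:
    """Unified diff of cell sources only; outputs and execution counts are ignored."""
    return list(
        difflib.unified_diff(
            cell_sources(old_json), cell_sources(new_json), "old", "new", lineterm=""
        )
    )


def one_cell_notebook(code: str, output: str, count: int) -> str:
    """Build a simplified one-cell notebook for the demo (hypothetical helper)."""
    return json.dumps({
        "cells": [{
            "cell_type": "code", "execution_count": count, "metadata": {},
            "outputs": [{"name": "stdout", "output_type": "stream", "text": [output]}],
            "source": [code],
        }],
        "metadata": {}, "nbformat": 4, "nbformat_minor": 4,
    })


old = one_cell_notebook("total = 1 + 1\nprint(total)", "2\n", 1)
new = one_cell_notebook("total = 1 + 2\nprint(total)", "3\n", 5)
print("\n".join(source_diff(old, new)))
```

Although the output and execution count also changed, the diff shows only the one edited line of code, which is the part a reviewer actually cares about.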
Jupydiff is a GitHub Action that facilitates comparing changes made to Jupyter notebooks in GitHub repositories. Jupydiff is based on nbdime and builds on the official Python Docker image.
Unlike nbdime, Jupydiff works with regular commits and pull requests. It runs on the repository for each pushed commit or opened PR, compares the notebook changes introduced by the latest commit or PR, and posts those differences as a comment on it. Jupydiff works with both private and public GitHub repositories.
No more JSON mess!
ReviewNB is a GitHub App available on the GitHub marketplace. It communicates with GitHub APIs to fetch the notebook changes made in commits or PRs.
A noteworthy feature of ReviewNB is that it creates visual diffs for notebooks. It displays the code changes in a side-by-side format, which is much more readable than a messy JSON diff.
In addition, you can comment on GitHub commits or PRs to give feedback on your teammates’ work or to ask for clarification. Your teammate is notified of your comment.
ReviewNB resolves most of the problems with diffing, reviewing, and maintaining collaboration.
3. JupyterLab
JupyterLab is a newer product from Project Jupyter than Jupyter Notebook. It is a web-based interactive development environment that supports Jupyter Notebooks, various file types, text editors and terminals. In brief, JupyterLab embeds Jupyter Notebook in an IDE-like editor that runs in your browser.
We can consider JupyterLab an advanced version of Jupyter Notebook: it gives you the combined experience of an editor and a notebook.
Now let’s move to the second part of our tips: handy practices worth knowing when using Jupyter Notebooks.
1. New Variables
We can create variables, assign values to them and reassign new values anywhere in the code. However, instead of overwriting the same variable, using new variables where possible can make your work easier.
For instance, assume you load a huge .xlsx or .csv file into the notebook and store it in a variable. Then you need to filter out some columns and make a few modifications in several steps. You could store the modified data back in the same variable.
However, suppose your modifications go wrong and you do not get the expected outcome. To debug your code, you will have to run the notebook multiple times, and waiting for the giant file to load every single time quickly becomes painful.
As a solution, you can keep two variables: one to store the loaded file and the other to store the filtered data. Put the loading step in one cell and the modifying steps in another, so that you only need to re-run the second cell while debugging.
Although the first method uses memory more efficiently, every time you need to reset the data you must reload the original file. Therefore, give the second method a try when possible.
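The two-variable approach can be sketched as follows. To keep the example self-contained, it uses the standard library’s `csv` module on a tiny in-memory sample standing in for the huge file; with pandas, the same pattern would be `raw_df = pd.read_csv(path)` in one cell and `filtered_df = raw_df[raw_df["region"] == "EU"]` in the next. The variable names and columns are hypothetical.

```python
import csv
import io

# --- Cell 1: load once (stand-in for a slow read of a huge .csv file) ---
RAW_CSV = """name,region,price
lamp,EU,12.5
desk,US,80
chair,EU,45
"""
raw_rows = list(csv.DictReader(io.StringIO(RAW_CSV)))

# --- Cell 2: transform into NEW variables; raw_rows is never overwritten ---
# Re-running only this cell while debugging never requires reloading the file.
eu_rows = [row for row in raw_rows if row["region"] == "EU"]
eu_prices = [{"name": row["name"], "price": float(row["price"])} for row in eu_rows]

# Overwriting raw_rows with the filtered rows instead would force a reload
# whenever you need the original data back.
print(len(raw_rows), eu_prices)
```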
2. Keyboard Shortcuts
Keyboard shortcuts play a major role in increasing work efficiency on any platform. The more shortcuts you know, the faster you will complete your work.
You can see the list of keyboard shortcuts in Jupyter Notebook under the Help -> Keyboard Shortcuts menu.
You can edit the shortcuts from the Help -> Edit Keyboard Shortcuts menu.
3. Export Notebooks in Different Formats
Notebooks are generally the playground of data scientists. However, sometimes they have to send their notebooks to people from other fields, especially non-technical people.
If the receiver has no intention of running and testing the code, sending the notebook itself is an unnecessary burden, because the receiver would have to install Jupyter Notebook just to read it.
Jupyter has a marvelous feature for converting notebooks to different formats, so we can choose the format according to the receiver’s needs.
We can use nbconvert to convert and export the notebooks.
Some of the available formats are listed below.
- HTML
- LaTeX
- PDF
- Markdown
- reStructuredText
- Executable script (.py)
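To make the script format concrete, here is a toy sketch of what exporting to an executable script involves: code cells are kept and Markdown cells become comments. It works under simplifying assumptions (raw cells are skipped and IPython magics are left untouched) and is not nbconvert’s implementation; in practice you would run `jupyter nbconvert --to script your_notebook.ipynb`. The function name `notebook_to_script` is hypothetical.

```python
import json


def notebook_to_script(notebook_json: str) -> str:
    """Keep code cells as-is and turn Markdown cells into comments (toy export)."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook["cells"]:
        source = "".join(cell.get("source", []))
        if cell["cell_type"] == "code":
            chunks.append(source)
        elif cell["cell_type"] == "markdown":
            chunks.append("\n".join("# " + line for line in source.splitlines()))
        # Raw cells are skipped in this sketch.
    return "\n\n".join(chunks) + "\n"


example = json.dumps({
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["Add two numbers."]},
        {"cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [],
         "source": ["total = 1 + 1\n", "print(total)"]},
    ],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 4,
})
script = notebook_to_script(example)
print(script)
```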
Jupyter Notebook is a good tool for trying out a new piece of software and checking for yourself that it suits your needs and is worth using. If you were disappointed with Jupyter Notebooks, I invite you to give them another try with the tips you have learned here!