The Notebooks era — How notebooks are changing the way we develop code
How notebooks are driving the new coding era away from text editors and IDEs
The age of notebooks
With machine learning becoming more mainstream, there is keen interest in notebooks. Besides the open source Jupyter and Zeppelin notebooks, cloud providers have released their own versions: Amazon with EMR Notebooks, Google with Colab and Microsoft with Azure Notebooks. More traditional software vendors such as Oracle have jumped into the fray, and so have some startups such as Dataiku.
JupyterHub is a multi-user distribution of Jupyter that adds user authentication and authorization and makes it possible to host notebooks on a server rather than on a local machine.
JupyterLab is an evolution of Jupyter Notebook that is moving towards an IDE-like environment. With JupyterLab it is possible to edit multiple files at once, browse the files within a folder…
An extensible ecosystem
Alongside this flurry of solutions comes enhanced support for, and extensions to, some of these notebooks. Jupyter Notebooks are now supported in VS Code; Papermill allows notebooks to be parametrized; scrapbook is a library for managing Jupyter notebook outputs; ReviewNB is a code review system dedicated to notebooks; and the Binder project aims to set up your notebooks within reproducible environments.
This is without counting the numerous widgets, included in other libraries, that make use of notebooks’ rich output and interactivity.
Notebooks have also benefited from quite a few success stories from companies such as Netflix, which has written extensively about its use of notebooks, including for production use cases.
Pros and cons of leveraging notebooks
Notebooks make for quick prototyping and rich experiences
Notebooks combine the interactivity of a REPL with a UI that adds flexibility and better code editing. This gives quick feedback during development and lets you develop code faster than if it had to be run as one monolithic piece.
Notebooks organize code in blocks that can be run independently, so the code is not necessarily executed in a linear fashion. This can make it easier to experiment with and trial certain operations without having to re-run the full workflow.
Notebooks also provide a great way to share code in context: annotations make the code and analysis easier to interpret, and rich outputs make notebooks particularly well suited to analysis tasks that require both numerical and visual outputs. Extra widgets exist to quickly dive deeper into certain areas of the data.
These outputs can be further enriched by widget components such as entry forms, which make it possible to create simple applications.
However, they can raise issues when set up for production
A number of issues can arise when trying to leverage notebooks for production rather than for analytical use cases.
The interactivity of notebooks makes for quick prototyping, often without the care that is put into production code. Code written for quick prototyping is often not set up with the same abstractions that would be used in a normal development flow.
It is also easy to get tangled up by running code out of sequence while variables are still defined within the kernel. Developing with code run out of sequence can lead to erroneous or non-reproducible results. It takes care and diligence to re-run the whole notebook once the flow has been fully set up.
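To make the out-of-sequence problem concrete, here is a minimal sketch that imitates a kernel with a plain dictionary and runs "cells" with exec. The cell names and contents are made up for illustration, and a real kernel does far more, but the shared state behaves in a similar way: running one cell twice silently changes the result.

```python
# Hypothetical "cells": each one is a snippet of source code.
cells = {
    "load": "prices = [10, 20, 30]",
    "scale": "prices = [p * 2 for p in prices]",
    "total": "total = sum(prices)",
}


def run(order):
    """Execute the named cells in the given order against one shared namespace."""
    namespace = {}  # plays the role of the kernel's variables
    for name in order:
        exec(cells[name], namespace)
    return namespace["total"]


# Top-to-bottom execution, as a reader of the notebook would expect.
linear_total = run(["load", "scale", "total"])

# The same notebook, but the "scale" cell was run twice during exploration.
rerun_total = run(["load", "scale", "scale", "total"])

print(linear_total, rerun_total)
```

The two totals differ even though the notebook on screen looks identical in both cases, which is why a full restart and re-run is worth doing before trusting the results.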
Furthermore, the notebook environment isn’t well suited to larger code bases. When external files change, the full notebook usually needs to be re-run, rather than only the cells that depend on the change, and you end up having to use both an editor and the notebook.
Notebooks also don’t have the same level of support for code review that you find with normal .py files, nor an integrated way to manage the code versioning workflow within the code editor or notebook application.
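Part of the review difficulty comes from the file format: an .ipynb file is JSON that stores outputs and execution counters next to the code, so diffs get noisy. One common workaround is to clear the outputs before committing. The sketch below does this with the standard library only; the function name strip_outputs and the sample notebook are assumptions made for illustration, not part of any tool mentioned here.

```python
import json


def strip_outputs(notebook):
    """Return a copy of a notebook dict with code-cell outputs and counters cleared."""
    cleaned = json.loads(json.dumps(notebook))  # deep copy of JSON-compatible data
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# A tiny, hand-written notebook in the nbformat 4 layout, as it looks after execution.
executed = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# Sales check"},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 7,
            "source": "sum([1, 2, 3])",
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 7,
                    "metadata": {},
                    "data": {"text/plain": "6"},
                }
            ],
        },
    ],
}

clean = strip_outputs(executed)
print(json.dumps(clean["cells"][1], indent=1))
```

The code itself is untouched, so reviewers only see changes to the source of each cell.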
Notebooks benefit from a rich and diverse ecosystem
Jupyter Notebook, and to a lesser extent Zeppelin, benefit from a varied ecosystem of extensions, both official and unofficial. These extensions enhance certain functionality, making the notebook more interactive or making it easier to push notebooks to production. Besides these extensions, there are also a number of components and tools that extend notebooks’ functionality.
NbExtensions provides a collection of unofficial extensions for use with Jupyter Notebook. Some of the extensions provided allow the use of LaTeX cells, pushing to a GitHub gist, automatic code formatting …
NbExtensions offers a configuration UI that lets you easily enable or disable specific extensions. Some of the features of the extensions it contains are described in this article.
Jupyter/IPython support “magic commands”, which extend the functionality of the notebook beyond that of the interpreter. There are built-in magic commands such as
%%timeit, which outputs the time it takes to execute a particular cell.
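Magic commands only work inside IPython or a notebook, so the sketch below shows the cell magic as a comment and measures the same thing with the standard library timeit module, which runs anywhere. The function build_squares is a made-up workload for illustration.

```python
import timeit


def build_squares(n=10_000):
    """Toy workload: build a list of squares."""
    return [i * i for i in range(n)]


# In a notebook cell, the equivalent would be:
#   %%timeit
#   build_squares()

runs = 100
total_seconds = timeit.timeit(build_squares, number=runs)
print(f"{total_seconds / runs * 1e6:.1f} microseconds per call over {runs} runs")
```

The magic picks the number of repetitions for you and reports a mean and spread, whereas here the run count is set by hand.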
Jupyter Notebook also allows the creation of custom magic commands beyond those provided by default. Cython, for instance, offers the magic command
%%cython -a, which annotates the compiled cell to show where the code interacts with Python rather than running as plain C.
Papermill is an extension to Jupyter that allows notebooks to be parametrized, so that the code within the notebook can be re-run with different parameters in a systematic way. This setup makes it possible to push notebook code to production without having to export all of it to a Python file, while keeping the richness of the notebook’s output.
There are a couple of ways for Papermill to interact with notebooks: through Python code or using its CLI. There is also a dedicated Papermill operator within Airflow, which bridges some of the gap to setting up notebooks in production.
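As a rough sketch of the Python route: Papermill looks for a cell tagged "parameters" and overrides the defaults it finds there. The code below writes a tiny demo notebook with such a cell using only the standard library, then wraps the papermill.execute_notebook call in a helper. The file names, the parameter names (region, threshold) and the helper run_with_parameters are assumptions for illustration, and the actual run requires papermill and a Python kernel to be installed, so that call is left commented out.

```python
import json
import tempfile
from pathlib import Path


def make_demo_notebook(path):
    """Write a minimal nbformat 4 notebook whose first cell is tagged 'parameters'."""
    notebook = {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {
            "kernelspec": {"name": "python3", "display_name": "Python 3", "language": "python"}
        },
        "cells": [
            {
                "cell_type": "code",
                "metadata": {"tags": ["parameters"]},  # default values Papermill can override
                "source": "region = 'EU'\nthreshold = 0.5",
                "outputs": [],
                "execution_count": None,
            },
            {
                "cell_type": "code",
                "metadata": {},
                "source": "print(region, threshold)",
                "outputs": [],
                "execution_count": None,
            },
        ],
    }
    Path(path).write_text(json.dumps(notebook, indent=1))
    return notebook


def run_with_parameters(input_path, output_path, **parameters):
    """Execute the notebook with overridden parameters (needs papermill installed)."""
    import papermill as pm  # third-party, imported lazily so the sketch loads without it

    return pm.execute_notebook(str(input_path), str(output_path), parameters=parameters)


workdir = Path(tempfile.mkdtemp())
template = workdir / "report.ipynb"
make_demo_notebook(template)

# With papermill and a kernel available, one run per region could look like:
# run_with_parameters(template, workdir / "report_us.ipynb", region="US", threshold=0.8)
```

The CLI route is similar in spirit: something like `papermill report.ipynb report_us.ipynb -p region US` passes the parameter on the command line, and the executed copy keeps its outputs.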
Binder is a tool, leveraging Docker, that creates reproducible environments for notebooks. It supports Jupyter Notebook and JupyterLab, as well as a few other notebooks or notebook-like interfaces.
Widgets provide a way to add interactive functionality to notebooks. The ipywidgets library provides a set of default widgets, and some Python libraries such as Bokeh integrate some of this functionality.
These widgets let you create small interactive applications within your notebook, which let you explore a particular aspect of a dataset without having to write code for it.
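A minimal sketch of such a mini application, assuming ipywidgets is installed: a plain filtering function does the work, and ipywidgets' interact turns its threshold argument into a slider. The sample rows and the names rows_above and show_explorer are made up for illustration, and show_explorer is meant to be called from a notebook cell, since the slider only renders there.

```python
# Hypothetical sample data standing in for a real dataset.
SAMPLE_ROWS = [
    {"name": "alpha", "score": 35},
    {"name": "beta", "score": 62},
    {"name": "gamma", "score": 88},
]


def rows_above(rows, threshold):
    """Keep the rows whose score is at least the threshold."""
    return [row for row in rows if row["score"] >= threshold]


def show_explorer(rows=SAMPLE_ROWS):
    """Call from a notebook cell: shows a slider that re-runs the filter on each change."""
    from ipywidgets import interact  # third-party, imported lazily

    def preview(threshold=50):
        for row in rows_above(rows, threshold):
            print(row["name"], row["score"])

    # The (min, max, step) tuple is interact's shorthand for an integer slider.
    return interact(preview, threshold=(0, 100, 10))


# The filtering logic can be checked without any widget:
print(rows_above(SAMPLE_ROWS, 60))
```

Keeping the logic in a plain function and the widget wiring in a thin wrapper also makes the code easier to move out of the notebook later.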
ReviewNB is a paid application for performing code reviews on Jupyter Notebooks. It integrates with GitHub and allows the creation of conversation threads on individual notebook cells.
nbgitpuller is a utility that lets you distribute the content of a git repository to users without them having to understand git itself. It provides an automatic merging behavior that retains the changes made locally.
Commuter provides a way to explore both local and remote directories and to read the notebooks they contain.
Airbnb’s Knowledge Repo is a tool aimed at facilitating the sharing of notebooks and information within an organization. It lets you browse and read the notebooks that have been published within the organization, search for keywords or filter by tags.
Nbconvert is a utility that converts a notebook to a different file format, such as PDF. Nbconvert has, for instance, been used alongside Papermill for report generation.
Notebooks are a very powerful tool for analysis and prototyping, and they benefit from a large ecosystem of plugins and tools that enhance their functionality. They are not perfectly suited to certain types of work, such as working on larger codebases, but we are starting to see evolutions and tools that alleviate some of these issues.