Supporting Collaboration in Computational Notebooks with Cell Folding
This post summarizes a research paper on how cell-folding can make it easier to navigate, share, and reuse computational notebooks. The paper will be presented at the ACM Conference on Computer-Supported Cooperative Work and Social Computing on November 7th.
Computational notebooks are an increasingly popular means of tracking and sharing data analyses because they combine analysis code, visualizations, and text (which previously spanned separate script, result, and commentary files) in a single interactive document. Computational notebooks are first and foremost interactive computing environments that let analysts iteratively write and execute code. But analysts can also use notebooks to craft computational narratives that present complex analyses as richly annotated data stories.
There has been growing excitement about how the mix of code and narrative text in computational notebooks might support collaboration and reproducible research. One article in The Atlantic even declared the scientific paper “obsolete”, soon to be replaced by scientists sharing their discoveries via computational notebooks. However, several studies have shown that, by the end of an iterative analysis, most notebooks are long and messy collections of notes and scripts that are difficult to navigate or understand. Rather than share their results via these messy notebooks, many analysts copy key figures into other media, such as slides and word-processing documents, before sharing them, which limits their collaborators’ ability to reproduce the results. While computational notebooks help analysts perform iterative analyses, they tend to be seen as personal documents and have yet to be widely used for collaboration.
This recurring finding got us thinking: How might we design computational notebooks so analysts are more comfortable sharing them with collaborators and collaborators can navigate and use them more easily?
Making Long and Messy Notebooks Easier to Navigate, Share, and Reuse with Cell-Folding
One approach that we explored in this paper is to encourage lightweight organization and annotation that makes long and messy notebooks easier to navigate. In prior work and in design workshops with notebook users, many analysts said their notebooks quickly became too long and messy to navigate easily. That got us thinking: What if we applied the concept of code-folding, which has long been used in other development environments to aid navigation of long files, to notebooks? Might cell-folding benefit not only the original analyst, but later collaborators as well?
We designed an extension for Jupyter Notebook that enables analysts to fold cells (i.e., blocks of code that can be executed independently) into named sections that are initially hidden, but can be revealed in a sidebar. Based on prior work on code-folding, we expected cell-folding to make it easier to navigate long notebooks. But would it also make it easier to understand someone else’s notebook? And might it encourage analysts to clean up and share their entire notebook instead of extracting key results to share via other media?
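To make the idea of folding cells into named sections concrete, here is a minimal Python sketch of the concept only. It is not the extension's actual implementation: the `Cell` class, its `section` field, and the `folded_view` and `expand` functions are hypothetical names chosen for illustration. Each run of consecutive cells tagged with the same section name collapses to a single placeholder line, and a section can be expanded to reveal the cells it hides.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cell:
    source: str
    # Hypothetical field: name of the folded section this cell belongs to.
    # None means the cell is not folded and is always shown.
    section: Optional[str] = None


def folded_view(cells: List[Cell]) -> List[str]:
    """Return what a reader sees: unfolded cells are shown as-is, while each
    run of consecutive cells sharing a section name becomes one placeholder."""
    view: List[str] = []
    current: Optional[str] = None
    for cell in cells:
        if cell.section is None:
            current = None
            view.append(cell.source)
        elif cell.section != current:
            current = cell.section
            view.append(f"[+] {cell.section}")
        # Further cells in the same run stay hidden behind the placeholder.
    return view


def expand(cells: List[Cell], name: str) -> List[str]:
    """Unfold a section: return the sources of every cell tagged with `name`."""
    return [c.source for c in cells if c.section == name]


cells = [
    Cell("import pandas as pd", "Import Data"),
    Cell("df = pd.read_csv('homes.csv')", "Import Data"),
    Cell("df.head()"),
    Cell("df.plot()", "Plot Home Prices Over Time"),
]
print(folded_view(cells))
```

With these four example cells, the folded view has three lines instead of four: one placeholder per named section plus the single unfolded cell. The shrinkage grows as more cells are grouped under each name.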
Testing Cell-Folding with Data Science Students
To answer the first question, we ran a lab study with 32 undergraduate students in a data science class that used Jupyter Notebooks. We gave each student a starter notebook in which a “colleague” had begun to analyze housing data from five US cities. We then asked them to complete three tasks that would extend the analysis to include a sixth city. Each task could be completed by tweaking and reusing code already in the notebook. Half the students received a standard Jupyter Notebook without cell-folding, and half received a notebook in which every few cells had already been folded into a section with a name like “Import Data” or “Plot Home Prices Over Time”. The notebook with folded cells was initially about one-fifth the length of the expanded notebook, and each section had to be unfolded to reveal the underlying code.
While there was no difference in participants’ self-rated ease of navigating or understanding each notebook, participants in the cell-folding condition tended to complete their tasks more quickly, especially if they were novice programmers, though this effect was weak and variable.
In interviews, participants suggested that cell-folding was a welcome addition to the notebook. As one participant in the cell-folding condition said:
I like the hide cells thing. I never really explored that. If that’s not already a thing in Jupyter, I hope it’s a thing… When I’m trying to find a certain portion of the notebook it would be easier to just hide the portions I don’t need currently
Meanwhile, those in the standard notebook condition wished they could compress the notebook in some way to make it easier to navigate:
I wish there was some type of compressing tool. I did not have to see all these plots to be able to understand the next step. So if there is anything that hides the plots and expands it only when I want to see it. I feel tools like that would make it easier to navigate.
Testing Cell-Folding with Experienced Analysts
While the first study suggested that cell-folding could make it easier to navigate and reuse code in a collaborator’s notebook, we also wanted to see how analysts might use cell-folding outside the lab. In a second study, we asked three experienced academic data analysts to try our cell-folding extension for a few weeks as they conducted analyses in Jupyter Notebooks.
Our participants reported that, in addition to making it easier to navigate their own notebooks while analyzing data, cell-folding made it easier to present results directly from their notebooks at weekly lab meetings. One participant described presenting before having the cell-folding feature:
When I’ve presented versions of this notebook in our weekly meetings, I’m always scrolling through and it takes a while to scroll through something and I might think, “Oh I want to go back to this plot above”, and I scroll scroll, scroll, scroll and people are getting distracted by various plots.
It’s all slide decks now. We’ve tried… I haven’t had much success using notebooks as a presentation tool. I’ve just kind of given up on that.
However, using cell-folding, analysts could quickly tailor their notebooks for group presentations:
I can just kind of quickly scroll through [the notebook] and know that every cell that is still left is a cell that I wanted to show for some reason
One participant also liked being able to selectively hide results until he and his collaborators had discussed what they expected the result to show:
I feel like when you have these notebooks with all these figures it’s really tempting to just scroll down until the next figure and just look at it and scroll down… it’s a nice experience [with folded outputs] for me or a collaborator to say, “Okay, what is it that I’m looking for, what do I expect to see?”. [And then unfold them]
By aiding notebook navigation, cell-folding seems to support not only the original analysis but also later reuse of notebooks, both for group presentations and for having a collaborator extend the analysis. However, cell-folding involves tradeoffs, as it can encourage collaborators to overlook folded cells and dwell on exploring the names of folded sections. Future research should explore how computational notebooks and other forms of computational media can support richer forms of navigation and manipulation to help analysts and their collaborators engage deeply with complex data analyses.
Full citation: Adam Rule, Ian Drosos, Aurélien Tabard, and James D. Hollan. 2018. Aiding Collaborative Reuse of Computational Notebooks with Annotated Cell Folding. Proc. ACM Hum.-Comput. Interact. 1, CSCW, Article 150 (November 2018), 12 pages. DOI: https://doi.org/10.1145/3274419