QuantStack: 2022 in review
Let's look back at what we have accomplished in 2022!
2022 was an amazing year of innovation for the open-source developers at QuantStack. Developments range from major improvements to the Jupyter project to the packaging ecosystem and high-performance computing.
Here are some highlights of the 2022 achievements. Buckle up!
Project Jupyter
Collaborative Editing
In 2022, the QuantStack team focused on improving the collaborative editing features in JupyterLab. They were able to integrate significant enhancements to stability and usability into the JupyterLab code base.
Key improvements include:
- The creation of a stable document identifier allowing for collaborative sessions to continue working when files are moved or renamed.
- The integration of a CRDT peer in the backend, using a Rust implementation of the Yjs CRDT protocol to enable the persistence of collaborative sessions and simplify the loading and saving of documents from and to the file system.
- The removal of the legacy document models based on ModelDB, duplicating the state from the new collaborative models. ModelDB-based models were retained in JupyterLab 3.x for backward compatibility. Their removal significantly simplifies the JupyterLab codebase.
- The addition of a UI for tracking connected collaborators on documents.
These improvements result in a much more stable and pleasant collaborative editing experience compared to earlier versions, making it a first-class feature of JupyterLab 4.0.
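To give a feel for why a CRDT peer lets the frontend and backend hold the same document state without a central lock, here is a minimal sketch of a toy last-writer-wins map. It is not the Yjs algorithm, which handles text sequences and is far more sophisticated; the class name `LWWMap` and its methods are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LWWMap:
    """Toy last-writer-wins map (illustrative only, not Yjs)."""

    peer_id: str
    # key -> (timestamp, peer_id, value)
    entries: dict = field(default_factory=dict)

    def set(self, key, value, timestamp):
        self._apply(key, (timestamp, self.peer_id, value))

    def _apply(self, key, candidate):
        current = self.entries.get(key)
        # Compare (timestamp, peer_id) so that ties are broken the same way on every peer.
        if current is None or candidate[:2] > current[:2]:
            self.entries[key] = candidate

    def merge(self, other):
        """Fold another peer's state into this one; merge order does not matter."""
        for key, candidate in other.entries.items():
            self._apply(key, candidate)

    def value(self):
        return {key: entry[2] for key, entry in self.entries.items()}


frontend = LWWMap("frontend")
backend = LWWMap("backend")
frontend.set("title", "Draft", timestamp=1)
backend.set("title", "Final", timestamp=2)
backend.set("path", "a.ipynb", timestamp=1)

# Each peer merges the other's state; both end up with the same document.
frontend.merge(backend)
backend.merge(frontend)
print(frontend.value() == backend.value())
```

The property that matters is convergence: whichever peer merges first, both sides end up with identical state, which is what makes it possible to persist a session from a backend peer.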
The real-time collaboration work at QuantStack was a group effort by Carlos Herrero, Frédéric Collonval, David Brochart, Jeremy Tuloup, Trung Duc Le, and Martin Renou.
The collaborative editing effort at QuantStack was started thanks to grants from Schmidt Futures and the Alfred P. Sloan Foundation, which led to the release of the first collaborative feature in JupyterLab 3.1.
The next iteration was enabled thanks to funding from Two Sigma, which allowed the work on the backend state store, the removal of the legacy ModelDB-based models, and the UI for tracking connected collaborators.
Collaborative experience on richer content types
Collaborative editing is already a common feature in many digital platforms for creating textual content online, and we believe it will become essential for any online content creation interface, from photo editing to CAD. Users will expect a real-time collaborative experience for any type of content.
In 2022, we began developing a demonstrator for a collaborative CAD modeler called JupyterCAD, which is built on the same foundations as the collaborative editing features in JupyterLab for notebooks and text files.
Although still in its early stages, JupyterCAD illustrates the potential of such collaborative workflows. We can see how they can improve the efficiency of large organizations developing complex systems by enhancing collaboration among teams.
The JupyterCAD project was led by Trung Duc Le, working with Martin Renou, David Brochart, and Carlos Herrero.
JupyterLab performance
JupyterLab 4.0 will be significantly faster than previous versions. This was achieved both through systematic tracking of performance bugs and through significant upgrades to the Jupyter communication protocol and rendering mechanism for documents.
One key addition was the creation of a UI performance benchmarking tool, which can test any pull request against the main branch, checking for performance regression.
With the tooling in place, we were able to measure and fix a large number of performance pain points.
- Important performance improvements were obtained by migrating the JupyterLab UI from CodeMirror 5 to CodeMirror 6, making notebooks and text editors more than 3 times faster to render. (The above benchmark example is the result of the CodeMirror 6 migration.)
- Another speedup factor of five (in the sample of test notebooks used in the benchmark suite) was obtained by enabling the virtual rendering of notebooks, that is, rendering only what should be visible in the viewport.
Both these developments required deep refactoring of the JupyterLab codebase, over hundreds of source files.
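The idea behind virtual rendering can be sketched in a few lines: given the height of each cell and the scroll position, compute which cells intersect the viewport and render only those. This is a simplified illustration, assuming cell heights are already known and strictly positive; the function name `visible_cells` is hypothetical and does not come from the JupyterLab codebase.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate


def visible_cells(cell_heights, scroll_top, viewport_height):
    """Return the range of cell indices that intersect the viewport.

    The viewport covers pixels [scroll_top, scroll_top + viewport_height).
    """
    bottoms = list(accumulate(cell_heights))  # bottom edge of each cell
    tops = [0] + bottoms[:-1]                 # top edge of each cell
    # First cell whose bottom edge lies strictly below the top of the viewport.
    first = bisect_right(bottoms, scroll_top)
    # One past the last cell whose top edge lies above the bottom of the viewport.
    last = bisect_left(tops, scroll_top + viewport_height)
    return range(first, last)


heights = [100, 300, 50, 200, 400]
print(list(visible_cells(heights, scroll_top=350, viewport_height=300)))
```

With binary search over precomputed offsets, finding the visible window costs O(log n) per scroll event, regardless of how many cells the notebook contains.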
Beyond large documents, significant performance improvements were made in the Lumino package, which underlies the JupyterLab frontend, with the recent release of Lumino 2.0.
The user experience of JupyterLab will be a lot snappier in JupyterLab 4.0 than in earlier versions.
Another significant performance improvement coming with JupyterLab 4.0, unrelated to rendering, concerns the Jupyter server package. The Jupyter server relays messages between the frontend and kernels. The server ⇄ kernel communication is done over ZeroMQ sockets, while the server ⇄ client communication is done over WebSockets.
Unfortunately, until recently, the server ⇄ kernel (ZMQ) and server ⇄ client (WebSocket) protocols differed slightly, so the server had to parse and re-serialize every message in both directions. This processing cost is small for short messages such as most execution requests and replies. However, it can be very high for larger messages such as rich displays of tables or plots.
In Jupyter Server 2, the WebSocket connection supports a new “aligned” protocol in which messages can simply be copied over to and from ZeroMQ messages. Observed speedups on typical messages are about one order of magnitude.
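The difference between the two relay strategies can be illustrated with a small sketch. It assumes a message simplified to four JSON frames (header, parent header, metadata, content), and the length-prefixed layout used in `relay_by_copy` is an assumption made for illustration, not the actual wire format of the aligned protocol. All function names are hypothetical.

```python
import json
import struct


def relay_with_reserialization(frames):
    """Legacy-style relay: parse every JSON frame, then re-serialize one JSON document."""
    header, parent_header, metadata, content = (json.loads(frame) for frame in frames)
    return json.dumps(
        {
            "header": header,
            "parent_header": parent_header,
            "metadata": metadata,
            "content": content,
        }
    ).encode()


def relay_by_copy(frames):
    """Aligned-style relay: length-prefix each frame and copy the bytes untouched."""
    parts = [struct.pack("<I", len(frames))]
    for frame in frames:
        parts.append(struct.pack("<I", len(frame)))
        parts.append(frame)
    return b"".join(parts)


def split_copied(payload):
    """Recover the original frames on the receiving side, still without parsing JSON."""
    (count,) = struct.unpack_from("<I", payload, 0)
    position = 4
    frames = []
    for _ in range(count):
        (size,) = struct.unpack_from("<I", payload, position)
        position += 4
        frames.append(payload[position:position + size])
        position += size
    return frames


frames = [
    json.dumps(part).encode()
    for part in ({"msg_type": "execute_reply"}, {}, {}, {"status": "ok"})
]
print(split_copied(relay_by_copy(frames)) == frames)
```

In the copy-based variant the server never inspects the payload, so its cost grows only with the number of bytes moved, not with the complexity of the JSON inside.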
This major push on the JupyterLab performance was led by Frédéric Collonval, who worked on the Virtual Notebook rendering and the benchmarks. Johan Mabille completed the CodeMirror 6 migration, and David Brochart implemented the protocol alignment feature. Afshin Darian led the effort on the Lumino 2.0 release.
This effort on improving Jupyter performance was funded by Two Sigma.
For more details on the performance improvements, please check our article on the Jupyter blog.
Language server protocol
A major development in the JupyterLab codebase that will be released in the 4.0 version is the integration of the Language Server Protocol in core JupyterLab. A lot of these features were available in the JupyterLab-LSP extension. We iterated upon this codebase and upstreamed the main components dealing with:
- the lifetime management of LSP servers,
- a customizable completer for code editors,
- the native support of the notebook documents by the JupyterLab LSP integration.
Integrating the Language Server Protocol features in JupyterLab is key to bringing a fully-fledged IDE experience to JupyterLab.
This development was led by Trung Duc Le. This effort on upstreaming the LSP features in JupyterLab was funded by Bloomberg.
JupyterLab inspector
The JupyterLab inspector is one of the ways to enhance your coding experience. It is a UI panel that provides contextual help while you are typing.
Thanks to the recent work on the jupyterlab-pygments extension and docrepr, the inspector can render rich HTML output with code highlighting support:
This work is the result of the collaboration between C.A.M. Gerlach (from the Spyder project), Ahmed Fasih (from Bloomberg), and Martin Renou (from QuantStack).
It was funded by Bloomberg.
You can learn more about this by reading our article on the Jupyter blog.
A new event system for JupyterLab
Alongside the Jupyter team at AWS and others who contributed valuable advice, we designed and implemented a new core Jupyter service, all the way from the server level up through the stack and even into Lumino: a generic events service. The back-end began life in an earlier implementation within the Jupyter telemetry package, but now resides as a server message bus that broadcasts via a WebSocket to all connected clients. To offer extension authors and core developers a flexible API, we created a version of Lumino signals that can be consumed as modern JavaScript async iterators.
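The pattern of consuming a broadcast stream as an async iterator can be sketched as follows. The actual implementation lives in Lumino and is written in TypeScript; this is only a Python analogue of the idea, and the `EventStream` class and its methods are hypothetical names, not part of any Jupyter API.

```python
import asyncio

_CLOSED = object()  # sentinel telling subscribers that the stream has ended


class EventStream:
    """Toy broadcast stream whose subscriptions can be consumed with `async for`."""

    def __init__(self):
        self._subscribers = []

    def emit(self, event):
        # Broadcast: every subscriber receives its own copy of the event.
        for queue in self._subscribers:
            queue.put_nowait(event)

    def close(self):
        for queue in self._subscribers:
            queue.put_nowait(_CLOSED)

    def subscribe(self):
        # Register immediately so that no event emitted after this call is missed.
        queue = asyncio.Queue()
        self._subscribers.append(queue)
        return self._iterate(queue)

    async def _iterate(self, queue):
        try:
            while True:
                event = await queue.get()
                if event is _CLOSED:
                    return
                yield event
        finally:
            self._subscribers.remove(queue)


async def main():
    stream = EventStream()
    subscription = stream.subscribe()
    stream.emit({"schema": "example", "data": 1})
    stream.emit({"schema": "example", "data": 2})
    stream.close()
    return [event async for event in subscription]


received = asyncio.run(main())
print(received)
```

Compared with callback-based signals, an async iterator lets the consumer write ordinary sequential code and stop listening simply by leaving the loop.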
The events system was a group effort that involved Afshin Darian at QuantStack and Piyush Jain at AWS, with invaluable help from Frédéric Collonval at QuantStack and Zach Sailer at Apple.
Xeus 3 and streamlining of JupyterLite kernels
With the advent of JupyterLite and emscripten-forge, it was clear that xeus should be improved so that it could be used to build kernels for WebAssembly.
The initial support for WebAssembly in xeus was implemented in the xeus-python and xeus-lua kernels. However, these first examples showed that deeper changes in Xeus would be necessary to streamline the creation of Jupyter kernels that could run in the browser. This was the motivation for a major refactor that led to the release of Xeus 3.0, splitting Xeus into three parts:
- Xeus, the core native implementation of the Jupyter protocol
- Xeus-zmq, the middleware layer used by regular kernels communicating with the client through ZMQ
- Xeus-lite, the middleware layer used by WASM kernels communicating with clients over the JavaScript foreign function interface provided by Emscripten.
The addition of WebAssembly support in xeus kernels was developed by Thorsten Beier, Johan Mabille, and Martin Renou.
The work to support Xeus-python in JupyterLite was funded by Bloomberg.
New Xeus kernels
In 2022, a few new Xeus-based kernels joined the happy family!
A notable addition to the family is the xeus-octave kernel, a Jupyter kernel for the Octave programming language. Other new xeus kernels released in 2022 include xeus-wren, for the Wren programming language.
The xeus-octave project was created by Giulio Girardi, and recently moved to the jupyter-xeus organization. Antoine Prouvost made significant contributions on top of Giulio's work, setting up continuous integration and consolidating the code base. Xeus-wren was developed by Thorsten Beier.
The future of the Jupyter notebook
JupyterLab was long described as the successor to the "classic" Jupyter notebook user interface. While a big part of the community moved to this newer interface, many users were attached to the full-page notebook UI provided by the classic notebook.
In 2021, Jeremy Tuloup started the RetroLab project, a Jupyter frontend built from JupyterLab components, but with a very similar look and feel to the classic Jupyter notebook. Retrolab benefits from all the upstream developments in core JupyterLab, such as collaborative editing, visual regression testing, accessibility, internationalization, debugger support, and the broad ecosystem of JupyterLab extensions (mime renderers, widgets, language server protocol).
With the approval of the recent Jupyter Enhancement Proposal on Notebook v7, the Jupyter project decided to base the next major version of the notebook package on this codebase.
The Retrolab project was started by Jeremy Tuloup, who also spearheaded the move of notebook towards this new codebase. Several other members of the team contributed to the Notebook v7 project, including Nicolas Brichet and Afshin Darian.
The plan is to publish Notebook v7 shortly after the JupyterLab 4.0 release.
Upgrading nbgrader
One key promise of the Jupyter Enhancement Proposal about the future of the Jupyter notebook was that the most popular Jupyter extensions that had not been ported to the JupyterLab extension system should be properly ported before the final release of Notebook v7 is published.
The two listed extensions are nbgrader (an automatic grading tool for Jupyter) and Rise (a tool to turn notebooks into interactive slideshows).
Thankfully, we received a grant to work on the port of nbgrader to JupyterLab, and were able to release nbgrader 0.8, turning nbgrader into a JupyterLab extension.
This development also motivated significant upstream work in the notebook metadata editor, which will be integrated in JupyterLab 4.0.
The migration of nbgrader to JupyterLab was done by Nicolas Brichet. Work is ongoing to support the Notebook v7 user interface.
The nbgrader port to JupyterLab was partly funded by Université Paris Cité.
For more details on the nbgrader update, check out our recent blog post on upgrading nbgrader.
JupyterLite
JupyterLite is a JupyterLab distribution that runs entirely in the browser, built from the ground up using JupyterLab components and extensions. Most in-browser kernels rely on a WebAssembly build of the interpreter for the corresponding language. Several flavors of JupyterLite are available (console, notebook, and fully-fledged JupyterLab interface), addressing a broad variety of use cases.
The main advantages of JupyterLite are the ease of deployment and scalability.
Because of these advantages, several major open-source projects are now including JupyterLite deployments on their websites, including NumPy, SymPy, and PyMC. Some of these websites are visited by millions of people monthly.
JupyterLite has the potential to enable the next 10M Jupyter users. Large scale deployments on very cheap hardware are now possible, enabling institutions to create Jupyter-based education programs for a large number of users without depending on cloud infrastructure.
A mamba-based distribution for WebAssembly
The first Python kernel for JupyterLite was based on the Pyodide distribution. While it includes many scientific computing packages, its monolithic distribution model does not allow users to specify package versions, although versions of pure Python packages installed on top can be set.
Being able to pin down package versions in an environment is a strong requirement for software reproducibility.
A locked-down WebAssembly environment could be seen as a reproducibility time capsule. WebAssembly being a recognized web standard, it ought to be runnable for much longer than native binary packages.
This is why we developed a mamba-based distribution of WebAssembly packages built with Emscripten. This was done by adding support for a new emscripten-32 platform for the Mamba/Conda package manager and build tools, and developing the conda-forge-inspired CI architecture for building the WebAssembly packages.
The development of the new emscripten-forge distribution was spearheaded by Thorsten Beier.
This development was funded by Bloomberg.
For more details, you can check out our recent announcement blog post about the new developments.
Jupyter community workshop on JupyterLite
Jupyter Community Workshops are a series of community-organized events to tackle challenging development and design projects, growing the community of contributors, and strengthening collaborations. They are funded by the Jupyter project thanks to generous donations of sponsors.
As part of the last round, we organized a Jupyter Community Workshop on the JupyterLite project in Paris, which was graciously hosted by OVHcloud. We were able to gather key contributors and community members of the Jupyter, Pyodide, and Emscripten ecosystems to iterate on this stack.
Accessibility of the Jupyter notebook
Developments by the QuantStack team enabled major strides towards making Jupyter accessible.
The main upgrade has been the migration from CodeMirror 5 to CodeMirror 6, which is a complete rewrite of the code editor with a strong focus on accessibility and performance.
The Jupyter frontend that is the most impacted by this upgrade is the new notebook v7. The Axe accessibility auditing tools reported over a thousand issues before the migration, and fewer than 50 after the migration. We are now working towards bringing that number to zero, by addressing reported problems one by one.
Even when the Axe-reported error count is brought to zero, many more improvements will be required for it to be truly usable by everyone, but these encouraging results show that the Notebook v7 is the most likely Jupyter frontend to reach that goal.
Nbconvert
Nbconvert is a package of the Jupyter ecosystem used to convert notebooks to a variety of formats. It underlies the Voilà dashboarding tool.
The WebPDF exporter, which was included in nbconvert 6, relies on a headless browser (Chromium) to generate PDF files from notebooks, by first rendering them to HTML. This enables the PDF rendering of the rich output types of notebooks.
In 2022, we developed the new QtPDF exporter, which achieves the same result using QtWebview. The advantage of QtPDF is that the requirements of the QtWebview library are properly packaged for most package managers (unlike Chromium) and are a much smaller download. The QtPDF exporter was included in nbconvert 7.0, which was released this year.
Several security fixes were also included in the recent releases of nbconvert, following a security review by the GitHub security lab.
QuantStack team members who iterated on nbconvert in 2022 include David Brochart, Martin Renou, and Sylvain Corlay. Many community members made significant contributions to these releases.
The work of QuantStack on nbconvert was funded by Bloomberg.
Voilà dashboards
Voilà can turn any Jupyter notebook into a standalone web application. This means that the millions of notebooks shared on GitHub and other online venues are just as many potential interactive dashboards.
A major improvement to Voilà made in 2022 is the support for the new ipywidgets 8, now available in Voilà 0.4.0, which was released in October. Voilà 0.4.0 also includes the latest upgrades to nbconvert.
The upcoming Voilà 0.5.0 will allow Voilà to reuse JupyterLab components like mimetype renderers and dynamic theming. The final release is expected in 2023.
The migration of Voilà to ipywidgets 8 was started by a community member, Mario Buikhuizen, and completed by Jeremy Tuloup. The upgrade to use the JupyterLab extension system that will be released in Voilà 0.5 was started by Jeremy Tuloup, and continued by Martin Renou and Trung Duc Le.
The furthering of Voilà development in 2022 was funded by Bloomberg.
Jupyverse
Jupyverse is a reboot of the Jupyter-server project. Built upon FastAPI instead of Tornado, Jupyverse provides a modular server architecture that allows more flexible deployment scenarios for Jupyter, with enterprise deployments in mind.
The main benefits of Jupyverse over jupyter-server include:
- A clear distinction between user settings and server configuration, and the ability to keep user settings in a database on a different host than the server.
- Improved performance thanks to the adoption of FastAPI.
- The plugin-based architecture, which makes it possible to create “remixes” of the base and third-party replacement plugins for custom deployment scenarios, and allows for “microservice”-type deployments.
We are convinced that large-scale enterprise deployments of Jupyter will eventually be done using Jupyverse, especially with the more complex permissioning scenarios arising from collaborative editing (which is a paradigm shift from the single-user server model) and the new sharing models that it enables.
The jupyverse project is led by David Brochart.
Ipywidgets 8.0
Ipywidgets 8 has been in the making for a long time (the effort towards 8.0 started at the Jupyter Community Workshop on Jupyter widgets in January 2020). This is a major upgrade of the 7.x series. This new release was focused on consolidating the ipywidgets package. It includes numerous bug fixes and improvements. It also brings new widgets to the mix, such as DatetimePicker, TimePicker, TagsInput, and ColorsInput.
The ipywidgets work is a longstanding team effort including major contributions from the community. QuantStack also put a lot of effort into making this release possible. Many contributions came from QuantStack team members Jeremy Tuloup, Trung Duc Le, Martin Renou, Wolf Vollprecht, and David Brochart.
User testing and UX of Jupyter
We recently started a new endeavour to improve the user experience of Jupyter, and are working towards a more systematic approach to the design of the project. We have been conducting user research for JupyterLab using remote usability testing with cohorts of community members and new users of the project. This has enabled us to discover UX issues in the existing features, and to test design prototypes for new features in the pipeline.
The user design study was devised and conducted by Gabriela Vives, who joined the QuantStack team as a UX specialist in 2022.
Visual programming in JupyterLab
Block-based programming has become ubiquitous in school curricula for early computer science education. To provide a smooth ramp of complexity for learners, we designed a JupyterLab extension for Blockly so that Jupyter can be used from the very first steps of their learning journey. We also used Blockly’s code generation feature to create a more integrated experience with Jupyter.
JupyterLab-blockly is extensible with custom programming blocks. This feature was used to develop two add-ons:
- jupyterlab-niryo-one: a collection of blocks to control Niryo’s “One” and “Ned” six-axis arms within JupyterLab-blockly. These robot arms by Niryo are great options for robotics education, and for learning ROS.
- jupyterlab-lego-boost: a set of blocks to control the LEGO® Boost robot. LEGO® Boost is a very cheap toy often recommended for robotics education.
JupyterLab-blockly works in JupyterLite, but the Niryo and LEGO extensions are not available in this environment yet. We are currently working on enabling the LEGO extension in JupyterLite by using the experimental Web Bluetooth API. This will enable owners of this toy to play with their robots in JupyterLite with Blockly without installing anything on their machine.
JupyterLab-blockly was developed by Denisa Checiu and Carlos Herrero.
The work on JupyterLab-blockly was self-funded by QuantStack. We are seeking funding sources to further this work.
For more details about JupyterLab-blockly, you can check out our blog post on the package.
Jupyter and robots
On the robotics front, the QuantStack team has continued its endeavor to develop a more comprehensive suite of extensions for working with the ROS ecosystem in JupyterLab. The latest addition to the family is the JupyterLab-URDF package, which is a viewer and live editor of URDF (Universal Robot Description Format) for JupyterLab.
With this extension, users can modify the XML file containing the description of the robot and those changes are immediately reflected in the viewer. Mesh files can also be displayed when they are included in a standard robot description package. Additionally, the viewer provides the user with the ability to move the joints of the robot to their specified limits in the URDF.
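To make the link between the XML description and the joint controls concrete, here is a minimal sketch that reads joint limits from a small URDF document and clamps a requested joint position to them. The robot description is a made-up example, and the helper names `joint_limits` and `clamp_to_limits` are hypothetical; this is not the code of JupyterLab-URDF.

```python
import xml.etree.ElementTree as ET

# A made-up, minimal robot description used only for this illustration.
URDF = """
<robot name="demo_arm">
  <link name="base"/>
  <link name="forearm"/>
  <joint name="elbow" type="revolute">
    <parent link="base"/>
    <child link="forearm"/>
    <limit lower="-1.57" upper="1.57" effort="10" velocity="1.0"/>
  </joint>
</robot>
"""


def joint_limits(urdf_text):
    """Map each joint name to its (lower, upper) limits, as declared in the URDF."""
    root = ET.fromstring(urdf_text.strip())
    limits = {}
    for joint in root.iter("joint"):
        limit = joint.find("limit")
        if limit is not None:
            lower = float(limit.get("lower", "0"))
            upper = float(limit.get("upper", "0"))
            limits[joint.get("name")] = (lower, upper)
    return limits


def clamp_to_limits(position, lower, upper):
    """Keep a requested joint position inside its allowed range."""
    return max(lower, min(upper, position))


limits = joint_limits(URDF)
print(limits)
print(clamp_to_limits(2.0, *limits["elbow"]))
```

Editing the `lower` or `upper` attribute in the XML and re-parsing is enough to change the range a viewer would offer for that joint.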
JupyterLab-URDF is directly usable in JupyterLite. You can try it out in your browser.
The JupyterLab-URDF package was written by Isabel Paredes.
Furthermore, the Jupyter-ROS package, originally developed by Carlos Herrero and Wolf Vollprecht, has received a major update. New widgets have been added to the extension, including:
- the illustrious ROS turtle simulation,
- clients and services,
- action clients and action services.
Jupyter-ROS currently has limited support for ROS 2 distributions thanks to a collaboration with Luigi Dania. Further support for ROS 2 is in progress.
Moreover, the robotics team helped RWTH Aachen University to implement the first JupyterHub instance integrating ROS and multiple Jupyter robotics extensions in order to teach ROS at the graduate level in a robotics program.
Towards a JupyterLab visual theme editor
JupyterLab enables customizing the visual appearance of the interface through themes, which can be installed like any other JupyterLab plugin. However, creating a custom theme requires development skills.
In the past few months, we have been working on a parametric theme editor allowing end users to customize the colors and layout of their environment.
Allowing end users to fine-tune their color scheme, font sizes, or any other customizable parameter of a theme, may be an important usability feature for users having specific contrast requirements. We are working on generating appropriate palettes from a single color picked by the user. Stay tuned to future announcements on the theme editor!
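As a rough illustration of what deriving a palette from a single color might involve, the sketch below builds a lightness ramp around a base color and checks contrast with the WCAG 2 contrast ratio formula. The ramp strategy and all function names are assumptions made for this example and do not describe how the theme editor actually works; only the contrast formula comes from the WCAG definition.

```python
import colorsys


def hex_to_rgb(color):
    """Convert '#rrggbb' to three floats in [0, 1]."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) / 255 for i in (0, 2, 4))


def rgb_to_hex(rgb):
    return "#" + "".join(f"{round(channel * 255):02x}" for channel in rgb)


def lightness_ramp(base, steps=5):
    """Keep the hue and saturation of `base` and vary only the lightness."""
    hue, _, saturation = colorsys.rgb_to_hls(*hex_to_rgb(base))
    levels = [(i + 1) / (steps + 1) for i in range(steps)]
    return [rgb_to_hex(colorsys.hls_to_rgb(hue, level, saturation)) for level in levels]


def relative_luminance(color):
    """Relative luminance as defined by WCAG 2."""
    def linear(channel):
        if channel <= 0.03928:
            return channel / 12.92
        return ((channel + 0.055) / 1.055) ** 2.4

    red, green, blue = (linear(channel) for channel in hex_to_rgb(color))
    return 0.2126 * red + 0.7152 * green + 0.0722 * blue


def contrast_ratio(first, second):
    """WCAG contrast ratio, from 1 (identical) to 21 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(first), relative_luminance(second)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


palette = lightness_ramp("#1976d2")
for shade in palette:
    print(shade, round(contrast_ratio(shade, "#ffffff"), 2))
```

A theme editor could use a check like `contrast_ratio` to flag shades that fall below the contrast level a user needs.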
The work on the visual JupyterLab theme editor is led by Florence Haudin and Frédéric Collonval.
Package Management
A core component of our package management strategy has been the Mamba project. Mamba is a reboot of the conda package manager written in C++. It has been adopted at scale by conda-forge, Project Jupyter, Binder, and many other open-source projects.
The main benefits are:
- Speed. Mamba is significantly faster than conda for resolving satisfiability constraints for creating environments, which has been a longstanding pain point when dealing with large channels.
- Small runtime memory footprint. Mamba’s memory footprint improves upon conda, which can prevent running out of memory when solving an environment.
- Smaller download. The micromamba executable (a statically-linked executable that can create environments and install packages) weighs 5 MB compressed, which is a lot smaller than the ~50 MB minimal miniconda compressed tarball.
- Nice command line interface. Micromamba provides a user-friendly interface, with rich colorful console outputs.
Micromamba also includes an implementation of the TUF (The Update Framework) protocol for software supply chain security.
Mamba 1.0
Mamba was a major focus of the team in 2022. A significant effort went into consolidating the codebase in terms of concurrency and IO, and we are very confident in the stability of mamba. Mamba 1.0 was released earlier this year, and the transition to this new version was completely painless for the community.
The push towards Mamba 1.0 was led by Wolf Vollprecht, with major contributions by Joel Lamotte and Johan Mabille.
The work on the consolidation of mamba 1.0 was funded by Bloomberg, and QuantCo.
Better error messages for Mamba
A major pain point in the Conda ecosystem has been the troubleshooting of cases where the desired packages cannot be installed because no solution to the version constraints exists.
In 2022, we significantly improved error messages in case of unsolvable environments. This was achieved by inspecting the "proof of unsolvability" provided by the underlying SAT solver called libsolv.
The resulting error messages provide a more intelligible description of the conflicts, allowing end users to troubleshoot version conflicts more easily.
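The kind of explanation this enables can be illustrated with a deliberately simplified sketch. Real solving in libsolv works on a SAT formulation; the code below is not that algorithm. It only intersects version constraints per dependency and reports which requirements clash. The package names, data layout, and `explain_conflicts` function are all hypothetical.

```python
from collections import defaultdict

# Hypothetical package index and requirements, for illustration only.
AVAILABLE = {"libfoo": {"1.0", "2.0"}}
CONSTRAINTS = [
    # (requiring package, dependency, versions it accepts)
    ("app 1.0", "libfoo", {"2.0"}),
    ("libbar 1.0", "libfoo", {"1.0"}),
]


def explain_conflicts(available, constraints):
    """Report dependencies for which no single version satisfies every requirement."""
    by_dependency = defaultdict(list)
    for requirer, dependency, allowed in constraints:
        by_dependency[dependency].append((requirer, allowed))

    messages = []
    for dependency, requirements in sorted(by_dependency.items()):
        candidates = set(available.get(dependency, ()))
        for _, allowed in requirements:
            candidates &= allowed
        if not candidates:
            wanted = "; ".join(
                f"{requirer} needs {dependency} {sorted(allowed)}"
                for requirer, allowed in requirements
            )
            messages.append(f"no version of {dependency} satisfies: {wanted}")
    return messages


for message in explain_conflicts(AVAILABLE, CONSTRAINTS):
    print(message)
```

Grouping the clashing requirements by the package they disagree on is what turns a bare "unsatisfiable" result into something a user can act on, for example by relaxing one of the two pins.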
The research and development of the new error messages were done by Antoine Prouvost in collaboration with Claudia Rogoz from Palantir.
This work was made possible thanks to a grant from the Chan Zuckerberg Initiative awarded to the Conda-Forge project as part of the essential open-source software for science grant cycle 4 (EOSS-4).
For more details on the new Mamba error messages, check out the recent blog post on the subject.
Hosting conda packages on OCI registries
Another major ongoing effort on the mamba project is the ability to install conda packages hosted on OCI registries. This will be key to making major conda-based distributions truly scalable.
Instead of simply adding OCI support in mamba, the "downloading" part of the package was split into a utility package called "Powerloader" which handles every download-related aspect of mamba (parallel download, mirror selection).
Powerloader and its integration in the mamba codebase are code-complete and will be released in the next minor version of the package.
XSimd
Xsimd provides a unified C++ API for performing SIMD operations and supports a large number of SIMD instruction sets:
- For x86 architectures: SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, FMA3, AVX2, AVX512, SSE4A, FMA4, XOP.
- For ARM architectures: ARMv7, ARMv8, SVE.
Since it was split from Xtensor as an independent project, Xsimd has grown considerably in popularity, and major open-source projects such as Apache Arrow, Krita, and Pythran have adopted the package.
In 2022, we added support for the SVE instruction set. In parallel, we improved the API, and implemented a lot of new features: gather and scatter, swizzle, zip, reducers…
The development of xsimd is led by Johan Mabille and Serge Guelton. AmySpark and Yibo Cai from the Krita and Arrow projects respectively made significant contributions to xsimd.
QuantStack is seeking funding and help to enable new architectures on XSimd, such as POWER architectures.
JupyterCon 2023
After two years without an installment of JupyterCon, the global conference about Project Jupyter will be back in 2023. The conference will be held in Paris from May 10 to 12 at Cité des Sciences, the largest European science museum. Sylvain Corlay is taking on the role of general chair of the conference.
Changes in leadership at QuantStack
The QuantStack team has grown significantly in the past two years, from a small group of open-source developers to a much larger team.
Wolf Vollprecht left his position as the CTO of QuantStack to start a new venture on package management technologies called Prefix.dev. QuantStack is reverting to a collegial leadership by the technical directors.