Data Science in the Jupyter Era: Insights from JupyterCon 2023

Oblivious Devs
May 18, 2023

The recently concluded JupyterCon 2023, held at Europe’s largest science museum, the Cité des Sciences et de l’Industrie in Paris, showcased the breadth of possibilities the Jupyter platform opens up for data science.

The conference, held from May 10 to 12, offered valuable practical insight into how data scientists, business analysts, educators, and researchers can leverage the power of the Notebook and other Jupyter tools to collaborate effectively, streamline and extend their workflows, scale their projects, and ensure the reproducibility of their data science endeavours.

Across the three days, we attended keynotes, tutorials, code sprints, and talks, and marked the launch of our community-driven, open-source platform, Antigranular.

Keynotes

A summary of some keynotes and insights:

  • Alyssa Goodman, professor of Astronomy at Harvard University, shared her insights on how Jupyter, together with the glue project, opens up new opportunities for exploratory data analysis and visualisation in astronomy and beyond.
  • Paul Romer, Nobel laureate economist and former Chief Economist of the World Bank, emphasised the transformative power of Jupyter. He discussed how Jupyter Notebooks revolutionise the research process and the communication of research findings. Romer highlighted the unique advantage of Jupyter’s interactive nature, enabling researchers to seamlessly integrate code, data, visualisations, and explanatory text into a single dynamic document.
  • Finally, Craig Peters and Cory Gwin from GitHub presented the development and capabilities of GitHub Codespaces, specifically addressing how Jupyter integration enhances this robust platform for collaborative code development. GitHub Codespaces gives developers a powerful environment to work together effectively, and the integration of Jupyter Notebooks adds another layer of functionality and flexibility.

Read on for highlights from the JupyterCon tracks we found most interesting this year:

Community: Tools and Practices — The Multi-Stakeholder Approach 🌐

A key takeaway from this year’s conference was the focus on how the landscape of open-source communities has undergone radical change over the past two decades. Fernando Perez, founder of IPython and a co-founder of Project Jupyter, underscored the importance of fostering a multi-stakeholder community within the Jupyter context. He described in his talk how, right from their inception, IPython and Jupyter were collaborative endeavours that thrived on numerous individuals’ contributions, ideas, code, feedback, and dedication.

While he acknowledged that a sense of vision and direction is essential to the long-term success of any project, Perez argued that a ‘dictator’ is entirely the wrong metaphor to base that work on. He called attention to how Jupyter has transitioned away from that model, expressing the aspiration that more projects discover better ways to harness the collective energy of their communities.

The multi-stakeholder approach embraces community tooling, including frontends, kernels, extensions, and other tools in the Jupyter ecosystem. Several talks saw expert speakers navigating the Jupyter landscape in diverse ways.

Johan Mabille, Technical Director of QuantStack and co-author of xeus, gave a talk on recent advancements in the xeus stack and its flexible architecture, which enables kernels to run entirely in the browser. He explained how xeus simplifies the creation of new kernels, particularly for languages with a C or C++ API, allowing kernel authors to focus on language-specific aspects without having to handle the intricacies of the Jupyter messaging protocol.

This year’s conference also saw growing recognition of the value of interactivity in Jupyter Notebooks, with numerous open-source projects and widgets such as Thebe (which turns static HTML pages into interactive, Jupyter-backed ones), ipydagred3 (a Jupyter widget that visualises live data pipelines in JupyterLab), and IPytone (a Jupyter widget library for exploring data through sound). These extensions can be combined into a performant, intuitive interface that lets users explore, visualise, and manipulate data dynamically.

Enterprise Jupyter Infrastructure — Conscious Data Access 📈

The conference covered various aspects of deploying Jupyter and JupyterHub at scale in industry, government, high-performance computing, science, education, and other settings. Enterprises that integrate Jupyter into their data science workflows need to address challenges around security, scalability, collaboration, and optimising infrastructure for high-performance use cases.

A few examples of enterprise Jupyter infrastructure from this year’s summit:

  1. Self-hosted infrastructure: Some organisations prefer to run their Jupyter infrastructure on their own servers or data centres. Sarah Gibson, an open-source infrastructure engineer at 2i2c, shared insights into transforming 2i2c’s tooling for managing JupyterHubs and Kubernetes clusters, with the goal of supporting multiple JupyterHub deployments and Kubernetes clusters from a single infrastructure repository. Preserving each JupyterHub community’s autonomy is a crucial focus: communities can extract their configuration from 2i2c’s system and independently deploy it elsewhere. This concept, the community’s Right to Replicate its infrastructure, holds significant importance.
  2. Containerised environments: Docker has gained popularity for managing Jupyter environments. Enterprises can create Docker images with Jupyter and the necessary libraries, dependencies, and extensions. Tétras Lab, for example, is an open-source platform that serves Notebooks as web applications from a Docker stack of multiple containers. These containers can be deployed and scaled with Kubernetes, making it easier to manage and distribute Jupyter environments across a cluster of machines.
  3. Data science platforms: Some enterprises use data science platforms such as Databricks, which provide a collaborative environment for data scientists and analysts. Jason Grout, staff software engineer at Databricks, spoke about how the platform recently adopted Jupyter standards and software to power several features. He discussed how Databricks-specific visualisations are encoded in exported Jupyter Notebook files in a way that stays compatible with other Jupyter tooling. He also pointed out that in Databricks the document state lives on the server, which changes how Jupyter kernel messages are processed.
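As a minimal illustration of the containerised approach, an image for a shared team environment might be built on the community-maintained Jupyter Docker Stacks. The sketch below is illustrative only; the package list and notebook path are our own assumptions, not a specific deployment from any of the talks:

```dockerfile
# Start from the community-maintained Jupyter Docker Stacks base image
FROM jupyter/base-notebook:latest

# Add the team's shared libraries and extensions (illustrative list)
RUN pip install --no-cache-dir pandas matplotlib ipywidgets

# Copy shared notebooks into the default working directory,
# owned by the notebook user that the base image defines
COPY --chown=${NB_UID}:${NB_GID} notebooks/ /home/jovyan/work/
```

An image like this can then be launched per user by JupyterHub on a Kubernetes cluster, so every member of the team starts from an identical, reproducible environment.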

Jupyter in Education — Computational thinking in practice 📖

Dr. Dhavide Aruliah, Director of Education at Quansight LLC, dedicated his talk, 10 Years of Teaching with Jupyter: Reflections from Industry and Academia, to his experience wrestling with the many nuts-and-bolts technological obstacles to promoting computational thinking in learners with Jupyter.

For example:

  • Deploying robust, “99% invisible” computational environments so learners can start quickly (i.e., without painful software installation or configuration);
  • Accessing data smoothly (especially when balancing privacy/security concerns with teaching goals); and
  • Managing Notebook versions for collaborating instructors.

He discussed several technological and non-technological approaches to tackling such challenges. These include Nebari, an open-source platform for distributing JupyterHub; RISE, an extension for creating interactive slideshows in Jupyter; Anaconda Project, which supports the reproducibility and sharing of data science projects; and EtherPad for real-time collaborative editing. Jupyter Notebooks are intentionally designed to provide immediate feedback, allowing learners to modify code interactively and deepen their understanding autonomously, which is Aruliah’s ultimate goal when teaching.

The Jupyter in Education talks pave the way for capturing educational experience and pedagogical support in reusable Jupyter Notebooks and Books.

Our work at Antigranular 🚀

Our team at Oblivious had the opportunity to present Antigranular, our eyes-off data science platform, to the Jupyter community. By leveraging privacy-enhancing technologies, we strive to cultivate trust and establish a data-availability advantage.

We focus on privacy-centric use cases, enabling data scientists and builders to connect and extract insights from sensitive data while ensuring privacy through secure enclaves and differential privacy techniques. When users connect to Antigranular, we provide a dedicated space and memory for program execution, securely manage their data on servers, and track code execution and session information using kernels for seamless collaboration and transparency. This arrangement ensures that the sensitive data remains protected while being accessible for analysis.

How to get involved? 🌎

Join our global community of privacy-minded data scientists and engineers from over 33 countries as we work together to shape the future of privacy in machine learning.

Our platform is the ideal space to share your projects and ideas within the community; it opens the door to valuable feedback and support from fellow members who share your passion for privacy-focused data science! Start by joining our Discord.

And don’t miss our upcoming Eyes-Off Data Summit in Dublin this July! Whether you prefer to attend in person or virtually, you can register now to secure your spot. Join us for engaging panel discussions and fireside talks featuring our expert speaker lineup, and get hands-on experience with privacy-enhancing technologies (PETs) through our tutorials, workshops, and live hackathon.
