Jupyter meets the Earth: EarthCube Community Meeting

Published in pangeo on Aug 19, 2020

By: Lindsey Heagy, Fernando Pérez, Joe Hamman and the Jupyter meets the Earth team (cross-posted on the Jupyter blog)

As part of the 2020 EarthCube annual meeting, we held a Jupyter meets the Earth community discussion session on July 27. The Jupyter meets the Earth project is an EarthCube-funded effort that combines research use cases in the geosciences with technical developments in the Jupyter and Pangeo ecosystems. In this model of equal partners, scientific questions help drive software infrastructure development, and new technologies expand the horizons of viable research. This online workshop was an opportunity to gather members of the community, welcome newcomers, provide updates on the Jupyter and Pangeo ecosystems, and leave time for discussion.

The goals for the meeting were to:

Provide an overview of the Jupyter & Pangeo ecosystems for researchers from the EarthCube community.
Outline avenues for getting involved.
Gather input on which advancements would best serve your research needs.

Over 100 participants registered, and we had contributions from 9 speakers. The meeting was a mix of presentations and Q&A from the community. The full recording of the meeting is available on YouTube, and we encourage continued discussion on the associated Discourse post.

Presentations (Google Drive folder)

Fernando Pérez (slides) started off the meeting by introducing the Jupyter meets the Earth project, an effort aimed at driving forward technological developments in the Jupyter and Pangeo ecosystems in partnership with researchers in the geosciences. The motivation is to advance both research and the software that supports it by combining domain expertise with data science methods and with software and data engineering practices. He provided an overview of Project Jupyter, highlighting the interplay between software and content, services, standards, community, and governance that is necessary for scientific open source projects with broad impact. He presented the extensible JupyterLab platform, which can be adapted to domain-specific needs, as illustrated by the FlyBrainLab and Cloud Robotics Command Station efforts. The Jupyter meets the Earth team aims to similarly develop tools and extensions that support interactive computing workflows in the geosciences.

Overview of Jupyter and Jupyter meets the Earth from Fernando Pérez

Next up, Scott Henderson (slides) provided an overview of Pangeo and associated community events, including Hackweeks. “Pangeo is first and foremost a community promoting open, reproducible, and scalable science.” On the technology side, this means developing fully open source tools that can be deployed on shared computational infrastructure, such as HPC centers or the cloud; on the community side, Pangeo hosts several forums to foster communication between scientists and software developers. Software is an important avenue for connection, but it is the overarching goals that serve as a rallying point for the community. A critical mass of enthusiastic people has been key to the success of the Pangeo model.

Overview of Pangeo and Hackweeks from Scott Henderson

The Pangeo model is intended for use both in the cloud and on High Performance Computing (HPC) infrastructure. Kevin Paul gave the third talk, on Pangeo on HPC (slides, notebook). HPC and cloud computing environments differ technically in usage patterns, file access, and resource allocation; however, the goals of Jupyter and Pangeo are the same in both cases: to enable interactive computing and to simplify the user experience. Tools such as Dask, together with dask-kubernetes (for Kubernetes clusters in the cloud) and dask-jobqueue (for HPC job schedulers such as PBS and Slurm), enable parallel computing on both kinds of infrastructure.

Kevin Paul giving us a demo of Pangeo on the Cheyenne supercomputer

After a Q&A session that included questions on the computational cost of running Pangeo infrastructure and on the efficient use of tools such as Zarr, we moved on to a series of lightning talks.

Lightning Talks

Six speakers presented short lightning talks on aspects of the Jupyter and Pangeo ecosystems ranging from technologies to scientific applications to opportunities to engage with the Pangeo community.

Anderson Banihirwe (notebook and details in intake-esm) kicked off the lightning talks with a demo and overview of Intake, a project to streamline loading and sharing of data. His demo used both Optimum Interpolation Sea Surface Temperature (OISST) data and data from the Coupled Model Intercomparison Project (CMIP), running interactively on Cheyenne, the NCAR supercomputer. Dask distributed the workload across nodes, and Dask’s JupyterLab extension provided real-time diagnostics of the distributed computation.

Demo from Anderson Banihirwe using Intake to access OISST and CMIP data
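
For readers new to Intake-ESM, the core idea is a catalog: a table with one row per data asset and columns describing it (model, experiment, variable, and so on) that can be filtered before any data is loaded. The Python sketch below mimics that search step using only the standard library. The catalog contents and the `load_catalog` and `search` helpers are hypothetical illustrations, not Intake-ESM's API; in Intake-ESM itself, a catalog is opened with `intake.open_esm_datastore` and filtered with its `search` method, after which the matching assets can be loaded as xarray datasets.

```python
import csv
import io

# A tiny, made-up catalog in the spirit of an Intake-ESM CSV file: one row
# per data asset. Column names follow CMIP facets; values and paths are
# hypothetical.
CATALOG_CSV = """source_id,experiment_id,variable_id,path
ModelA,historical,tos,store/ModelA/historical/tos.zarr
ModelA,ssp585,tos,store/ModelA/ssp585/tos.zarr
ModelB,historical,tos,store/ModelB/historical/tos.zarr
ModelB,historical,pr,store/ModelB/historical/pr.zarr
"""


def load_catalog(text):
    """Parse the CSV catalog into a list of dicts, one per asset."""
    return list(csv.DictReader(io.StringIO(text)))


def search(rows, **query):
    """Return the rows whose columns match every query value.

    Each query value may be a single string or a list of accepted strings.
    """
    def matches(row):
        for column, wanted in query.items():
            accepted = wanted if isinstance(wanted, list) else [wanted]
            if row[column] not in accepted:
                return False
        return True

    return [row for row in rows if matches(row)]


rows = load_catalog(CATALOG_CSV)
hits = search(rows, experiment_id="historical", variable_id="tos")
print([row["path"] for row in hits])
# → ['store/ModelA/historical/tos.zarr', 'store/ModelB/historical/tos.zarr']
```

Filtering a lightweight table first means only the matching assets (here, two Zarr stores) ever need to be opened, which is what makes catalog-driven access convenient for large collections such as CMIP.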

Next up, Scott Dale Peckham (notebook) demonstrated the use of ipywidgets and ipyleaflet to create an interactive interface for fast access to geoscience data on servers that support the OPeNDAP protocol. The project he champions is called BALTO, the Brokered Alignment of Long Tail Observations (also the name of a famous Siberian Husky sled dog). These graphical interface elements can be used within a programmatic workflow such as a Jupyter notebook, and they conveniently encapsulate many details of accessing the data and resources provided by BALTO. This allows scientists to focus on their research questions without having to break their workflow to access data with external tools.
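
Part of what makes OPeNDAP access fast is that subsetting happens on the server: a client appends a constraint expression to the dataset URL and receives only the requested slice rather than the whole file. The Python sketch below builds such a request URL, assuming a server that speaks DAP2, whose hyperslab syntax is `[start:stride:stop]` with an inclusive stop index. The dataset URL, the variable name, and the `opendap_subset_url` helper are hypothetical, and this is not BALTO's code; in practice, libraries such as xarray can open an OPeNDAP URL directly and issue these requests for you.

```python
def opendap_subset_url(dataset_url, variable, slices, response="ascii"):
    """Build a DAP2 URL requesting a subset of one variable.

    slices: one (start, stride, stop) index triple per dimension; DAP2
        treats the stop index as inclusive.
    response: DAP2 response suffix, e.g. "ascii" for plain text or
        "dods" for the binary data response.
    """
    hyperslab = "".join(
        f"[{start}:{stride}:{stop}]" for start, stride, stop in slices
    )
    return f"{dataset_url}.{response}?{variable}{hyperslab}"


# Hypothetical server and variable: the first time step, every second row
# from 100 to 110, and columns 200 through 205.
url = opendap_subset_url(
    "https://example.org/opendap/sst.nc",
    "sst",
    [(0, 1, 0), (100, 2, 110), (200, 1, 205)],
)
print(url)
# → https://example.org/opendap/sst.nc.ascii?sst[0:1:0][100:2:110][200:1:205]
```

Some HTTP clients treat the square brackets specially (curl, for example, interprets them as URL globbing unless given `-g`), so they may need to be percent-encoded before the request is sent.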

We then had a talk from Edom Moges (slides), who presented work he is conducting with Laurel Larsen’s hydrology research group as one of the use cases in the Jupyter meets the Earth project. The presentation focused on data synthesis work that aims to build a Jupyter-based interactive platform that transforms raw hydrometeorological data into gap-filled, ready-to-use data for several intensively monitored watersheds across the US. The platform will serve as a basis for future community initiatives to benchmark data processing approaches and to support comparative hydrological studies and comprehensive data-driven forecasts.

Lightning talk from Edom Moges and Laurel Larsen on the hydrology use-case in the Jupyter meets the Earth project
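
The talk did not go into the platform's gap-filling methods, so the following is a generic illustration only: one of the simplest baselines for filling short gaps in a regularly sampled record is linear interpolation between the nearest valid neighbors. The Python sketch below (the `fill_gaps_linear` helper and its `max_gap` cutoff are hypothetical) fills interior runs of missing values and deliberately leaves gaps at either end of the record, or gaps longer than `max_gap`, unfilled, since interpolating across long outages would invent data.

```python
def fill_gaps_linear(values, max_gap=None):
    """Fill runs of None in a regularly sampled series by linear interpolation.

    Runs touching either end of the series (no valid neighbor on one side)
    are left as None, as are runs longer than max_gap when it is given.
    """
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        # Find the end (exclusive) of this run of missing values.
        j = i
        while j < n and filled[j] is None:
            j += 1
        run_length = j - i
        has_both_neighbors = i > 0 and j < n
        if has_both_neighbors and (max_gap is None or run_length <= max_gap):
            left, right = filled[i - 1], filled[j]
            step = (right - left) / (run_length + 1)
            for k in range(run_length):
                filled[i + k] = left + step * (k + 1)
        i = j
    return filled


# Hypothetical hourly readings with a two-hour gap and a missing final value.
print(fill_gaps_linear([10.0, None, None, 13.0, None]))
# → [10.0, 11.0, 12.0, 13.0, None]
```

Real hydrometeorological records often call for more sophisticated methods than this baseline; comparing such approaches is exactly the kind of benchmarking the platform is meant to support.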

Georgiana Dolocan (video) impressed us next with an animated video, accompanied by her narration, explaining JupyterHub, its components (authenticator, spawner, and proxy), and its deployment options. The Littlest JupyterHub (TLJH) is designed to make it simple to deploy multi-user Jupyter infrastructure on a single machine, while the more sophisticated Zero to JupyterHub with Kubernetes (Z2JH) option is meant to scale to many users and large computational needs.

Animations from Georgiana Dolocan on JupyterHub

Presenting from the perspective of an enthusiastic user, Erik Sundell (slides) gave us an overview of Jupyter Book, a tool for quickly creating beautiful websites from notebooks and Markdown files. He walked through how to host these sites online for free with little effort, and highlighted features including connections to Binder, which let readers run the content interactively.

Overview of Jupyter Book from Erik Sundell

Joe Hamman (slides) finished off our lightning talk session by outlining avenues for connecting with the Pangeo community. These include day-to-day communication on GitHub, Gitter, Discourse, and Twitter, as well as the more recent coffee breaks. Depending on your interests, you can also join working groups on topics including data, machine learning, education, and cloud computing, or suggest your own!

Connecting with the Pangeo community — an overview from Joe Hamman

Follow up and further discussion

To continue the discussion afterwards, we posed (slides) a few questions to help us learn about the community’s needs, such as:

What does your interactive computing workflow look like today? What do you envision it will be in 5 years?
How would you like to publish and share your computational research and where can improvements be made?
How do you stay up to date with the evolving open-source ecosystem? How would you like to keep up to date?

We are looking for your input and ideas! Please add your thoughts to the Discourse post.

Thanks

Thank you to the participants, speakers, and especially Lynne Schreiber and Ouida Meier from the EarthCube office for all of their support and work (even with very last-minute requests!).

This work is part of the Jupyter meets the Earth project, supported by the NSF EarthCube program under awards 1928406 and 1928374.

Written by Lindsey Heagy