Announcing the next LinkedEarth chapter: PaleoCube

Deborah Khider
Published in CyberPaleo
4 min read · Sep 21, 2021

TL;DR: With another round of funding from EarthCube, LinkedEarth will now bring paleoscientists (and their data) to the cloud to run their analyses, create a gallery of curated Jupyter Notebooks for reproducibility and reuse, and provide several training opportunities. The idea is to make all paleoclimate data (model outputs and proxy data) available in only a few clicks (well, keystrokes).

LinkedEarth and data standards

As Julien Emile-Geay explained in the inaugural blog post on Medium, LinkedEarth was originally created to help paleoscientists curate their data. To do so, the LinkedEarth team, with support from the community, promoted two standards: (1) the Linked Paleo Data format (LiPD), and (2) a reporting standard (PaCTS).

Armed with these two standards, it is now possible to write code that automates most of the data cleaning, runs analyses in a few lines, and generates publication-ready plots. We have two packages that already enable users to do just that: (1) GeoChronR in R, and (2) Pyleoclim in Python.
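To make the benefit of a standard concrete, here is a minimal sketch in plain Python. The record below is a hypothetical, heavily simplified stand-in for a standardized dataset: the field names and values are assumptions for illustration, not the actual LiPD schema, and the helper functions are not part of GeoChronR or Pyleoclim. The point is only that once every dataset exposes its values and units in the same place, cleaning and labeling can be written once and reused.

```python
import math

# Hypothetical, simplified record. Real LiPD files carry far richer metadata;
# these field names and values are illustrative assumptions only.
record = {
    "dataSetName": "ExampleLake",
    "archiveType": "lake sediment",
    "time": {"variableName": "age", "units": "yr BP",
             "values": [300, 100, 200, 400, 500]},
    "proxy": {"variableName": "d18O", "units": "permil",
              "values": [-1.2, -0.8, float("nan"), -1.5, None]},
}


def clean(record):
    """Return (ages, values) sorted by age, with missing values dropped."""
    pairs = zip(record["time"]["values"], record["proxy"]["values"])
    # Treat both None and NaN as missing.
    kept = [(t, v) for t, v in pairs if v is not None and not math.isnan(v)]
    kept.sort(key=lambda pair: pair[0])
    ages = [t for t, _ in kept]
    values = [v for _, v in kept]
    return ages, values


def axis_label(variable):
    """Build a plot label from the standardized name and units."""
    return f"{variable['variableName']} ({variable['units']})"


ages, values = clean(record)
print(axis_label(record["time"]), ages)
print(axis_label(record["proxy"]), values)
```

Because the functions only rely on where the standard puts things, the same two calls would work on any record laid out the same way.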

The logical next step is to bring these standards and analysis code to the cloud, following the Pangeo example, and link paleoclimate observations with model outputs. This is what PaleoCube is set to do with funding from the National Science Foundation EarthCube program.

Concretely, this means that anyone will be able to access data and code from a JupyterHub without downloading or installing data or software on their computers. The space will be yours, and you will be able to add your own data as well without sharing it with the entire community.

What is PaleoCube?

PaleoCube is a new initiative from LinkedEarth that links together data, analysis, and education. It significantly expands on previous activities as shown below:

Diagram illustrating all LinkedEarth activities.

PaleoCube is headed by your friendly LinkedEarth leadership team: Julien Emile-Geay, Nick McKay, and myself.

What should I be looking forward to in the next three years?

Analysis:

  • Improved capabilities for the LiPD utilities and Pyleoclim. For the former, we plan to expand them to directly ingest data from NOAA and Pangaea (with some minor human help).
  • Computation in the cloud: following the Pangeo model, all software and data will be cloud-accessible. No need to download/install anything locally. With a JupyterHub, you’ll get a private space in the cloud where you can play around with all the available tools. The Hub will be managed by 2i2c to ensure compatibility with Pangeo.
  • Better integration with the existing scientific Python ecosystem (namely Pandas and Xarray). It may not sound exciting at first, but it is necessary to ensure a seamless experience when working with climate model output and proxy data.
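As an illustration of why that integration matters, comparing a proxy record with model output usually starts by putting both on a common time axis. The sketch below does this with linear interpolation in plain Python so that it runs without dependencies; Pandas and Xarray offer interpolation methods for the same job. All numbers and names here are made up for illustration and do not come from any real dataset or from Pyleoclim.

```python
from bisect import bisect_left


def interp_linear(x_new, x, y):
    """Linearly interpolate y(x) at x_new; x must be strictly increasing."""
    if not x[0] <= x_new <= x[-1]:
        raise ValueError("x_new lies outside the range of x")
    i = bisect_left(x, x_new)
    if x[i] == x_new:
        return y[i]
    x0, x1 = x[i - 1], x[i]
    y0, y1 = y[i - 1], y[i]
    return y0 + (y1 - y0) * (x_new - x0) / (x1 - x0)


# Hypothetical model output on a regular time axis (year CE, degrees C).
model_years = [1000, 1100, 1200, 1300]
model_temp = [14.0, 14.5, 14.0, 13.0]

# Hypothetical proxy-based reconstruction on its own, irregular time axis.
proxy_years = [1050, 1200, 1275]
proxy_temp = [14.1, 14.2, 13.4]

# Evaluate the model at the proxy sampling times, then compare.
model_at_proxy = [interp_linear(t, model_years, model_temp) for t in proxy_years]
differences = [p - m for p, m in zip(proxy_temp, model_at_proxy)]

for year, diff in zip(proxy_years, differences):
    print(f"{year}: proxy minus model = {diff:+.2f} degrees C")
```

Doing this by hand for every dataset is tedious and error-prone, which is the motivation for leaning on libraries that already handle labeled time axes.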

Science:

  • Working in collaboration with Pangeo, many PMIP simulations will be made available to the community in a cloud-ready format (Zarr).
  • Computational narratives in the form of Jupyter Notebooks will be made available through a gallery for reproducibility and reuse. All notebooks will be executable through the Hub and mybinder. In a separate blog post, Julien will detail plans for these narratives.

Training: In addition to the gallery above, we will offer several training opportunities for the community to get familiar with our tools.

  • Virtual hackathons: To be held four times a year, these hackathons will range from beginner functionality (e.g. basic Pyleoclim) to more advanced topics (dashboard creation, developing science notebooks). We have partnered with Project Pythia for an introduction to Python relevant to geoscience problems.
  • Tutorials in the form of YouTube videos and blogs on this Medium account.
  • Office hours: we will offer virtual office hours for questions related to the use of LinkedEarth tools.

Community: Above all else, LinkedEarth is a community of paleogeoscientists. We will catalyze discussion around topics that interest us the most on several platforms:

  • Discourse: In the fall, we will launch a LinkedEarth Discourse forum to talk about everything paleo: data, code, opportunities (job/grants), education…
  • Slack: In the spring of 2020, we launched a LinkedEarth Slack workspace for discussion. If you’d like to join, email us.
  • Open community meetings: In the spring of 2021, all of our meetings (community, code development) will become open to the community. We want your feedback on future directions!

All you need to join in the fun is a GitHub account, which you will be able to use to comment directly on our GitHub account (and associated repositories) and to access the JupyterHub and Discourse, and of course Zoom for virtual meetings.

What should I be looking forward to in the next few months?

  • Our second PaleoHack will be held virtually Oct 28–29, 2021. Head here to register. Note that space is limited to 50 participants, but you will get another chance in the spring!
  • Discourse forum: We will set up the Discourse forum in the next month or so! The idea is to create a space where we can discuss everything paleo, from opportunities (jobs/grants/talks) to data and code.
  • AGU Town Hall: LinkedEarth will be at AGU 2021 in the form of a town hall. We will discuss future directions for LinkedEarth, give demonstrations of GeoChronR and Pyleoclim, and give you a chance to tell us what you want. Schedule to be announced in October.

I hope you’re as excited as we are about the (cyber)future of paleoclimate, and I hope to see you at one of our virtual meetings or in person at AGU.

Deborah,

On behalf of the LinkedEarth team.

Deborah Khider
CyberPaleo

Research Scientist at the USC Information Sciences Institute - Data Science, AI, and paleoclimatology