Arctic sea ice reached its annual minimum extent Sept. 19, and then again on Sept. 23, 2018. Credits: NASA’s Goddard Space Flight Center/Kathryn Mersmann

Polar deployment of Pangeo

Nic Wayand
Nov 15, 2018 · 5 min read

tl;dr: A new deployment of Pangeo is available, targeting the Polar research community.

Pangeo is a community platform for Big Data geoscience that has recently been federating into domain-specific deployments (Atmospheric Science, Oceanography, Hydrology, Astronomy, Neuroscience, and Polar Sciences). This community-driven decision arose from a need to further customize cloud environments for individual communities of practice. Customizations can be as simple as including new Python packages (e.g. Cartopy, Astropy) or configuring a cluster’s computational resources (e.g. memory available). Finally, domain-specific deployments should help lower the barrier for new communities of scientists to incorporate Pangeo into their workflow by providing more relevant data sets and example scripts.

Why the Polar Science Community?

It is a time of rapid and uncertain change in the Arctic and Antarctic. As new remote and in-situ observations fill up disk space, Pangeo is uniquely situated to provide the tools required to answer Polar science questions, which increasingly require the use of multiple observations and multiple models. One persistent challenge when migrating to a cloud-based workflow is the lack of Polar-relevant datasets in cloud-optimized formats, which has likely kept scientists from adopting Pangeo into their workflow.

In this blog post I will give a brief overview of:

  • new Sea ice datasets available in the Cloud
  • new interactive Jupyter notebooks
  • how we set up the Polar deployment

What new data sets are there?

The datasets below focus on sea ice in the Arctic. The first two were created as part of the Sea Ice Prediction Network Phase II (SIPN2), a community effort to improve sea ice prediction in the Arctic. They have been converted from their native format (NetCDF or GRIB) to Zarr, a cloud-optimized format. With time we plan to add in-situ observations and additional remote sensing data.

New Data sets:

  • Arctic sea ice concentration (SIC) forecasts from 20 models (Jan. 2018 to present, up to 1 year lead time, 2.6 GB)
  • Arctic sea ice thickness forecasts from 4 models and IceBridge observations (Jan. 2018 to present, up to 1 year lead time, 188 MB)
  • Observations of sea ice concentration (NASA Team, Bootstrap, Near-Real Time, 5 GB)

New Example Notebook:

A new example notebook (Example_plot_SIPN2_data.ipynb) is now available that plots forecasts and observations of sea ice concentration, probability, and anomalies. With intake, loading in the SIC forecast dataset is as simple as:

Below is an example plot at a 1-week lead time valid for the second week of August 2018. Large differences can be seen between the forecasts, due to model resolution, number of ensemble members, and methods of sea ice initialization (among other reasons). What makes this dataset unique is that it is updated daily, providing an up-to-date large ensemble of forecasts for users in the Arctic, as well as rapid feedback to model developers.
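To make the "probability" quantity a bit more concrete, here is a minimal sketch. It assumes sea ice probability is defined as the fraction of ensemble members whose concentration is at least 15% in a grid cell. The 15% threshold, the function name, and the tiny 2x2 grids are assumptions for illustration and are not taken from the example notebook. Plain Python lists are used so the snippet runs without any dependencies.

```python
from typing import List, Optional

# A grid is a list of rows; None marks land or missing data.
Grid = List[List[Optional[float]]]


def sea_ice_probability(members: List[Grid], threshold: float = 0.15) -> Grid:
    """Fraction of valid ensemble members with SIC >= threshold in each cell."""
    if not members:
        raise ValueError("need at least one ensemble member")
    n_rows, n_cols = len(members[0]), len(members[0][0])
    prob: Grid = []
    for i in range(n_rows):
        row: List[Optional[float]] = []
        for j in range(n_cols):
            # Ignore members that have no data at this cell.
            valid = [m[i][j] for m in members if m[i][j] is not None]
            if not valid:
                row.append(None)
            else:
                icy = sum(1 for v in valid if v >= threshold)
                row.append(icy / len(valid))
        prob.append(row)
    return prob


# Three hypothetical ensemble members on a 2x2 grid (SIC as a fraction, 0 to 1).
members = [
    [[0.90, 0.10], [0.20, None]],
    [[0.80, 0.05], [0.10, None]],
    [[0.95, 0.30], [0.16, None]],
]
prob = sea_ice_probability(members)
print(prob)
```

With gridded forecasts in Xarray, the same idea would normally be a thresholded mean over the ensemble dimension rather than explicit loops.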

The same dataset is also used to generate the figures below on the SIPN2 website.

How this was done*:

  1. Clone
  2. Start a new Kubernetes cluster on Google Cloud Platform
  3. Set up a CircleCI job to automatically update the new cluster image whenever changes are made (i.e. the steps below)
  4. Edit the environmental.yaml file to include Python packages specific to polar research
  5. Finally, add example notebooks by submitting Pull Requests here.
  6. Go to the new deployment and sign in with GitHub

*Plus lots of help from members of Pangeo in getting this running

How to get involved!

  • We are very interested in building this community and encourage polar scientists to get in touch with us
  • Request or add new datasets by submitting an Issue or emailing me at (Instructions for uploading new data here!)
  • Provide a new notebook example for your Polar sub-field

Conclusion: The Polar deployment has the potential to rapidly advance Polar Sciences by bringing the data, compute power, and (most importantly) the researchers together in one shared platform. At the moment, I am the only one using it for research (AGU 2018 Poster). The transition from working on a local machine to the cloud did not happen overnight, and I still split my work roughly 50:50 between the two. This transition was relatively painless because 1) my code lives on GitHub and 2) Xarray makes converting my existing NetCDF datasets to Zarr easy. A remaining challenge is to add non-gridded Polar datasets (e.g. buoys, upward-looking sonar, airborne lidar, ICESat-2 retrievals) in cloud-optimized formats that are useful for all Polar researchers.


A big thank you to Joe Hamman, Derek Ludwig, Chris Marsh, and Ryan Abernathey for help getting this set up and reviewing this post!

