Arctic sea ice reached its annual minimum extent Sept. 19, and then again on Sept. 23, 2018. Credits: NASA’s Goddard Space Flight Center/Kathryn Mersmann

Polar deployment of Pangeo

tl;dr: There is a new deployment of Pangeo targeting the Polar research community.

Pangeo is a community platform for Big Data geoscience that has recently been federating into domain-specific deployments (Atmospheric Science, Oceanography, Hydrology, Astronomy, Neuroscience, and Polar Sciences). This community-driven decision arose from the need to further customize cloud environments for individual communities of practice. Customizations can be as simple as including new Python packages (e.g. Cartopy, Astropy) or configuring a cluster’s computational resources (e.g. memory available). Domain-specific deployments should also help lower the barrier for new communities of scientists to incorporate Pangeo into their workflows by providing more relevant datasets and example scripts.

Why the Polar Science Community?

It is a time of rapid and uncertain change in the Arctic and Antarctic. As new remote and in-situ observations fill up disk space, Pangeo is uniquely situated to provide the tools needed to answer Polar science questions, which increasingly require combining multiple observations and multiple models. One persistent challenge when migrating to a Cloud-based workflow is the lack of Polar-relevant datasets in cloud-optimized formats, which has likely kept scientists from adopting Pangeo into their workflows.

In this blog post I will give a brief overview of:

  • new sea ice datasets available in the Cloud
  • new interactive Jupyter notebooks
  • how we set up the Polar deployment

What new data sets are there?

The datasets below focus on sea ice in the Arctic. The first two were created as part of the Sea Ice Prediction Network Phase II (SIPN2), a community effort to improve sea ice prediction in the Arctic. They have been converted from their native format (NetCDF or GRIB) to Zarr, a cloud-optimized format. Over time we plan to add in-situ observations and additional remote sensing data.

New Data sets:

  • Arctic sea ice concentration (SIC) forecasts from 20 models (Jan. 2018 to present, up to 1 year lead time, 2.6 GB)
  • Arctic sea ice thickness forecasts from 4 models and IceBridge observations (Jan. 2018 to present, up to 1 year lead time, 188 MB)
  • Observations of sea ice concentration (NASA Team, Bootstrap, Near-Real Time, 5 GB)

New Example Notebook:

A new example notebook (Example_plot_SIPN2_data.ipynb) is now available that plots forecasts and observations of sea ice concentration, probability, and anomalies. With intake, loading in the SIC forecast dataset is as simple as:

    import intake

    # catalog_url must point to the deployment's intake catalog
    ds_sic = intake.Catalog(catalog_url).SIPN2_SIC.to_dask()
    ds_sic

    <xarray.Dataset>
    Dimensions:    (fore_time: 16, init_end: 45, model: 21, x: 304, y: 448)
    Coordinates:
      * fore_time  (fore_time) timedelta64[ns] 0 days 7 days ...
      * init_end   (init_end) datetime64[ns] 2018-01-07 2018-01-21 ...
        lat        (x, y) float64 dask.array<shape=(304, 448)...
        lon        (x, y) float64 dask.array<shape=(304, 448)...
      * model      (model) object 'Observed' 'awispin' ...
      * x          (x) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 ...
      * y          (y) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 ...
    Data variables:
        SIP        (init_end, model, fore_time, y, x) float64...
        anomaly    (init_end, model, fore_time, y, x) float64...
        mean       (init_end, model, fore_time, y, x) float64...

Below is an example plot at a 1-week lead time valid for the second week of August 2018. Large differences can be seen between the forecasts, due to model resolution, number of ensemble members, and methods of sea ice initialization (among other reasons). What makes this dataset unique is that it is updated daily, providing an up-to-date large ensemble of forecasts for users in the Arctic, as well as rapid feedback to model developers.

Observed and Modeled probability of sea ice presence over the Arctic. Forecasts are at a 1-week lead time valid for the second week of August 2018. Greyed out models did not have forecasts available for this time period.
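For readers unfamiliar with the quantity in the figure, a probability of sea ice presence can be computed from an ensemble as the fraction of members that have ice in each grid cell. The sketch below assumes the common convention that a cell counts as ice-covered when its concentration is at least 15%; the function name and threshold are illustrative and may not match how the SIP variable in this dataset is actually built.

```python
import numpy as np

# Assumption: a cell is "ice-covered" when concentration >= 15 %.
ICE_THRESHOLD = 0.15


def sea_ice_probability(sic_members, threshold=ICE_THRESHOLD):
    """Fraction of ensemble members with ice present in each grid cell.

    sic_members has shape (member, y, x) with concentrations in [0, 1];
    NaN marks cells without data (e.g. land).
    """
    sic_members = np.asarray(sic_members, dtype=float)
    valid = ~np.isnan(sic_members)
    n_valid = valid.sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        # NaN compares as False, so missing members never count as ice.
        ice_present = sic_members >= threshold
        prob = ice_present.sum(axis=0) / n_valid
    # Cells where no member has data stay NaN rather than 0.
    return np.where(n_valid > 0, prob, np.nan)


# Toy ensemble: 4 members on a 2 x 2 grid, with one land cell (NaN).
members = np.array([
    [[0.90, 0.2], [0.00, np.nan]],
    [[0.80, 0.1], [0.00, np.nan]],
    [[0.95, 0.3], [0.05, np.nan]],
    [[0.70, 0.0], [0.20, np.nan]],
])
sip = sea_ice_probability(members)
```

In the toy ensemble, the top-left cell has ice in every member, the top-right in half of them, and the bottom-left in one of four; the land cell stays undefined.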

The same dataset is also used to generate the figures below on the SIPN2 website.

Arctic sea ice extent, forecasted and observed (purple line) (source: SIPN2 website)
Arctic sea ice concentration (fraction of grid cell with ice) from remote observations and model forecasts (source: SIPN2 website)
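The two quantities in these figures are closely related: extent is usually obtained from concentration by adding up the area of all grid cells that count as ice-covered. The sketch below is a simplified illustration, not the SIPN2 processing code. It assumes a 15% concentration threshold and a nominal 25 km x 25 km cell; on a real polar stereographic grid the true cell area varies with latitude, so a per-cell area array would be needed.

```python
import numpy as np

# Assumption: every cell has a nominal area of 25 km x 25 km.
CELL_AREA_KM2 = 25.0 * 25.0


def sea_ice_extent(sic, cell_area_km2=CELL_AREA_KM2, threshold=0.15):
    """Total area (km^2) of cells whose concentration is >= threshold.

    sic is a 2-D array of concentrations in [0, 1]; NaN cells (e.g. land)
    are treated as ice-free.
    """
    sic = np.asarray(sic, dtype=float)
    ice_covered = np.nan_to_num(sic, nan=0.0) >= threshold
    return float(ice_covered.sum() * cell_area_km2)


# Toy 2 x 3 concentration field; three cells meet the threshold.
sic_field = np.array([
    [1.00, 0.5, 0.1],
    [0.15, np.nan, 0.0],
])
extent_km2 = sea_ice_extent(sic_field)
```

Note that extent counts the full area of every cell above the threshold, whereas sea ice area would weight each cell by its concentration.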

How this was done*:

  1. Clone
  2. Start a new Kubernetes cluster on Google Cloud Platform
  3. Set up a CircleCI job to automatically update the new cluster image whenever updates are made (i.e. the steps below)
  4. Edit the environmental.yaml file to include Python packages specific to polar research
  5. Finally, add example notebooks by submitting Pull Requests here.
  6. Go to the new deployment and sign in with GitHub

*Plus lots of help from members of Pangeo to get this running

How to get involved!

  • We are very interested in building this community and encourage scientists in the polar community to get in touch with us
  • Request or add new datasets by submitting an Issue or emailing me (Instructions for uploading new data here!)
  • Provide a new notebook example for your Polar subfield

Conclusion: The Polar deployment of Pangeo has the potential to rapidly advance Polar Sciences by bringing the data, compute power, and (most importantly) the researchers together on one shared platform. At the moment, I am the only one using it for research (AGU 2018 Poster). The transition from working on a local machine to the cloud did not happen overnight, and I still split my work about 50:50 between the two. This transition was relatively painless because 1) my code lives on GitHub and 2) Xarray makes converting my existing NetCDF datasets to Zarr easy. A remaining challenge is to add non-gridded Polar datasets (e.g. buoys, upward-looking sonar, airborne lidar, ICESat-2 retrievals) in cloud-optimized formats that are useful for all Polar researchers.


A big thank you to Joe Hamman, Derek Ludwig, Chris Marsh, and Ryan Abernathey for help getting this set up and reviewing this post!