Xpublish is a new Xarray extension that makes it easy to publish datasets via a Zarr-compatible REST API. You can test drive Xpublish now in this Binder or install it from Conda or PyPI:

$ conda install -c conda-forge xpublish
# or
$ pip install xpublish

Xpublish enables sharing of Xarray datasets via a web application. The data in the Xarray datasets (on the server side) can be backed by Dask to facilitate on-demand computation of derived datasets. The basic usage is as follows:

Server-side: datasets are published using the serve() method on an Xarray Dataset accessor (rest):

>>> ds.rest.serve(host="0.0.0.0"…
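
To make "Zarr-compatible" a little more concrete: in the Zarr v2 layout, each chunk of an array lives under a key formed by joining its chunk-grid indices with dots (for example `0.0`). A REST API that exposes such keys as URL paths can be read by a Zarr-aware client. The sketch below only computes those keys and paths using the standard library. The base URL, the variable name `air`, and both helper functions are hypothetical, and the exact routes that Xpublish serves are not shown in this excerpt, so treat the URL shape as an assumption.

```python
from typing import Sequence


def chunk_key(index: Sequence[int], chunks: Sequence[int]) -> str:
    """Return the Zarr v2 chunk key for the chunk that holds the element at `index`.

    `chunks` is the chunk shape; the key is the dot-joined chunk-grid position.
    """
    if len(index) != len(chunks):
        raise ValueError("index and chunks must have the same number of dimensions")
    return ".".join(str(i // c) for i, c in zip(index, chunks))


def chunk_url(base_url: str, variable: str,
              index: Sequence[int], chunks: Sequence[int]) -> str:
    """Build a hypothetical URL for one chunk (assumed layout: <base>/<variable>/<key>)."""
    return f"{base_url.rstrip('/')}/{variable}/{chunk_key(index, chunks)}"


if __name__ == "__main__":
    # Element (5, 12) of an array chunked as (4, 10) falls in chunk-grid cell (1, 1).
    print(chunk_url("http://example.org", "air", (5, 12), (4, 10)))
```

Because the client only ever asks for keys like these, the server is free to compute the bytes behind them on demand, which is what makes a Dask-backed dataset a natural fit.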


We’re excited to introduce a new Intake driver: Intake-stac. We think this tool will make it much easier to explore SpatioTemporal Asset Catalogs (STAC) and enable interactive data analysis and visualization in a Python environment.

Intake-stac provides Intake drivers that support opening STAC Catalogs, Collections, Items and ItemCollections. By combining Intake and sat-stac, Intake-stac provides a simple toolkit for working with STAC catalogs and for loading STAC assets as Xarray objects. Intake-stac can be installed via pip or conda-forge:

$ pip install intake-stac
# or
$ conda install -c conda-forge intake-stac

STAC

The STAC specification provides a common, machine-readable (JSON) format…


Simulation models continue on their path to the exascale — motivating the search for a new paradigm for model data analysis. We explore the use of Zarr (a data format for chunked, compressed, N-dimensional arrays) and Redis (a streaming database) to support streaming data from simulation models to analysis environments.
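
As a rough illustration of the idea (not the implementation from the post), the sketch below has a toy "model" write one chunk per time step into a key-value store under Zarr-style chunk keys, while an "analysis" side decodes whichever chunks have arrived so far. A plain Python dict stands in for Redis here, array metadata is omitted, and the variable name, chunk length, and helper functions are all hypothetical.

```python
import struct
from typing import Dict, List

CHUNK_LEN = 4  # values per time step (hypothetical chunk shape: (1, CHUNK_LEN))


def write_step(store: Dict[str, bytes], name: str, step: int,
               values: List[float]) -> None:
    """Model side: store one time step as little-endian float64 bytes.

    The key follows the Zarr v2 pattern '<name>/<time-chunk>.<space-chunk>'.
    """
    if len(values) != CHUNK_LEN:
        raise ValueError(f"expected {CHUNK_LEN} values, got {len(values)}")
    store[f"{name}/{step}.0"] = struct.pack(f"<{CHUNK_LEN}d", *values)


def read_available(store: Dict[str, bytes], name: str) -> Dict[int, List[float]]:
    """Analysis side: decode every chunk of `name` currently in the store."""
    prefix = f"{name}/"
    found: Dict[int, List[float]] = {}
    for key, raw in store.items():
        if key.startswith(prefix):
            step = int(key[len(prefix):].split(".")[0])
            found[step] = list(struct.unpack(f"<{CHUNK_LEN}d", raw))
    return dict(sorted(found.items()))


def running_mean(steps: Dict[int, List[float]]) -> float:
    """Mean over all values received so far."""
    values = [v for chunk in steps.values() for v in chunk]
    return sum(values) / len(values)


if __name__ == "__main__":
    store: Dict[str, bytes] = {}  # stand-in for a Redis connection
    for t in range(3):
        write_step(store, "temperature", t, [float(t)] * CHUNK_LEN)
        print(t, running_mean(read_available(store, "temperature")))
```

The point of the pattern is that the analysis side never waits for a finished file: each chunk is usable as soon as its key appears in the store.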

Weather and climate simulation models are producing data at an ever increasing rate. These models are traditionally run in large HPC computing facilities and make extensive use of fast parallel file systems to support archiving their model history. As models have continued to grow in size (higher resolution, more simulated…


Headed to the 2019 AGU Fall Meeting in San Francisco in December? Here’s your Pangeo guide to tutorials, presentations, and meetups for this year’s meeting. See this Discourse post for up-to-date discussion on this year’s AGU happenings.

Tutorials

Pangeo: Hands on with JupyterHub and Open-Source Python Tools for Scalable Analysis of Big Data in the Geosciences (SCIWS12)

Scott Henderson, Amanda Tan Lehr, Joe Hamman, and Jessica Scheick will be running a 4-hour Pangeo Tutorial on Sunday, December 8th, from 13:40 to 18:00. Details are below. …


Written by Joe Hamman, Scott Henderson, Rob Fatland, and Amanda Tan Lehr.

TL;DR: Pangeo’s public cloud-hosted JupyterHubs are not enterprise-grade platforms. We recommend not storing access keys, passwords, or private data on these hubs.

In previous posts we’ve discussed our design of Pangeo deployments on Kubernetes and some rough estimates of how much it costs to operate one in production. In this post, we provide some thoughts and discussion on the subject of security. We should note that none of us are experts in this area. …


Ideas included in this post are based on numerous conversations that have occurred across the Pangeo community. Ryan Abernathey, Guillaume Eynard Bontemps, Joe Hamman, Chris Holdgraf, Fernando Perez, Niall Robinson, Matthew Rocklin, Richard Signell, and Amanda Tan Lehr (alphabetized) made specific contributions to the development of these thoughts.

We kicked off the Pangeo Project three years ago this week. Our initial meeting was held in New York at Columbia University — there we, a small group of scientists and software developers, gathered to discuss how software projects like Xarray and Dask could provide the foundational elements for a new approach…


Written by Joe Hamman, posted on behalf of the NCAR Science at Scale Team.

The Science at Scale Team at the National Center for Atmospheric Research (NCAR) is excited to announce the release of the Community Earth System Model (CESM) Large Ensemble Numerical Simulation (LENS) dataset published in the Amazon Public Dataset Program (link to dataset). In this blog post, we give a brief overview of 1) the LENS dataset, 2) how you can access the data, and 3) a Binder-ready Jupyter Notebook that reproduces a few key analyses of the LENS dataset — originally presented in the Kay et…


In my last post, we looked into Pangeo’s cloud costs and discussed what it would look like to budget for a typical Pangeo research cluster. In this post, I’ll present a technical and opinionated design for a typical Pangeo Kubernetes cluster. I’ll focus on the design features that impact scaling, cost, and scheduling and discuss some recent improvements to JupyterHub, BinderHub, and Dask-Kubernetes that were implemented to improve behavior in these areas.

To review, we’re interested in deploying a Kubernetes cluster with this basic node-pool configuration:

  1. Core-pool: This is where we run things like the JupyterHub and other persistent system services…


When we talk to people about Pangeo on the Cloud, we’re often presented with questions about costs. How much does it cost? Who pays for it? What’s the long-term plan for Pangeo on the Cloud?

Before diving into the details, I think it is important that we revisit why we’re so excited about Pangeo on the Cloud. I’ll argue the most compelling aspects of the cloud fall into four categories.

  • Scale: As we move fully into the Big Data era, the cloud offers a compelling combination of large scale storage and compute. …


Satellite observations are fundamental to how we understand the Earth. For example, this composite image shows active wildfires observed from space on August 22, 2018. Learn more here: https://www.nasa.gov/image-feature/goddard/2018/a-world-on-fire

Today, a new project begins that expands Pangeo’s capability to utilize remote sensing datasets for geoscientific research. Our focus will be on cloud-native data analysis tools and approaches with the lofty goal of changing the way everyday scientists interact with satellite observations of Earth.

A team of researchers from the University of Washington eScience Institute, in collaboration with scientists and engineers from the National Center for Atmospheric Research, Anaconda, and Element84, has been awarded a $1.5 million grant from the National Aeronautics and Space Administration (NASA) through the Advancing Collaborative Connections for Earth System Science (ACCESS) program.

Project Summary

Data intensive…

Joe Hamman

Tech director at @carbonplan and climate scientist at @NCAR. @xarray_dev / @pangeo_data dev.
