Data for the People!

Development Seed partners with NASA and AWS to disseminate Earth Observation data in user-friendly formats.

Aimee Barciauskas
Development Seed
5 min read · Jun 15, 2020


Normalized difference vegetation index (NDVI) from the MODIS instrument over the United States.

Traditionally, Earth Observation (EO) data is heterogeneous and complex, and its volumes are unwieldy. These characteristics make EO data challenging for many potential users. Working with NASA and the Amazon Web Services (AWS) Public Datasets (PDS) Program, Development Seed generates cloud-optimized, analysis-ready datasets, broadening accessibility to NASA’s vast and rich EO archives.

A big thanks to the NASA Science Team at Goddard Space Flight Center, which supported this effort: Binita KC, Sudipta Sarka and Sadashiva Devadiga.

Why Earth Observation (EO) data?

Human activity causes changes to our planet’s systems, such as changes in atmospheric conditions due to mobility patterns altered by the COVID-19 pandemic. Earth system changes are observable using data collected from satellites and other remote sensing platforms, but parsing signal from EO data has conventionally required access to data archival centers such as NASA’s DAACs, an extensive science background, and access to private high-performance computing clusters.

Why cloud- and analysis-optimized data formats?

EO data collection, by data providers such as government agencies (e.g. NASA, NOAA) and commercial satellite companies (e.g. Planet), is changing in 2 important ways:

  1. Scale: EO data volumes are growing by an order of magnitude. The oft-cited NISAR mission alone will produce 140PB of data over its 3-year lifespan, compared with NASA’s total EO archive of 22PB at the beginning of 2017 (source: ASF’s Getting Ready for NISAR). This mission is just one example of an expanding field of data collection from both private and public enterprises.
  2. Heterogeneity: Users are excited by the release of data from new non-optical sensors, such as SAR and LiDAR, produced by missions such as BIOMASS and GEDI. Data collected from these missions overcomes long-standing limitations of conventional optical data sources, such as cloud cover, but presents a new challenge: existing tools and libraries must be adapted, or new ones created.
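
To put the scale figures above in perspective, here is a quick back-of-the-envelope calculation. It uses only the two numbers quoted in the list (140PB over 3 years, 22PB archive in 2017). The unit conversion (1 PB = 1000 TB) and the assumption that data arrives evenly over a 365-day year are simplifications of mine, not mission specifications.

```python
# Figures quoted in the text above
nisar_pb = 140          # projected NISAR volume over the mission, in PB
mission_years = 3       # NISAR lifespan in years
archive_pb_2017 = 22    # NASA's total EO archive at the start of 2017, in PB

# How many times larger is one mission than the whole 2017 archive?
ratio = nisar_pb / archive_pb_2017

# Average production rate, assuming data arrives evenly (a simplification)
pb_per_year = nisar_pb / mission_years
tb_per_day = pb_per_year * 1000 / 365   # assumes 1 PB = 1000 TB

print(f"NISAR vs. 2017 archive: {ratio:.1f}x")
print(f"Average rate: {pb_per_year:.1f} PB/year, about {tb_per_day:.0f} TB/day")
```

Under these assumptions, a single mission produces more than six times the entire 2017 archive, at an average of well over 100TB per day.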

For a thorough explanation of the challenges and opportunities of the evolution of EO data, I highly recommend Jeff de La Beaujardière’s article in EOS magazine: “A Geodata Fabric for the 21st Century”.

New formats empower both new and practiced EO data users!

EO data collection is not cheap. We will not realize a return on this investment unless we come up with new methods of handling large volumes of EO data. There are 2 ways to improve accessibility: producing analysis-ready data products and optimizing for cloud access.

  1. Analysis-ready datasets reduce the amount of scientific expertise and pre-processing necessary to do analysis. They are produced from lower-level data products, include structured metadata, and have been pre-processed to perform aggregations or apply quality filters.
  2. Optimizing for cloud access means making data available over common network protocols, such as HTTP. Cloud-optimized data enables analysis at unprecedented scale by relying on structured metadata to provide windowed reads into the data. Users request just the variables they want and can scope their requests to the temporal and spatial extents of interest. Because they query only for the data of interest rather than downloading whole files, users minimize the data transferred across the network, which is critical in places with low network bandwidth.
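
To make the idea of a windowed read concrete, here is a minimal sketch in plain Python. It assumes a raster stored as a grid of square internal tiles and works out which tiles overlap a requested pixel window, then builds the HTTP `Range` header a client would send to fetch a span of bytes. The image size, tile size, window, and byte count are hypothetical values chosen for illustration. Real readers such as GDAL and rasterio look up each tile's byte offset in the file's metadata rather than taking it as an argument.

```python
def tiles_for_window(col_off, row_off, width, height, tile_size=512):
    """Return (tile_col, tile_row) indices of the tiles overlapping a pixel window."""
    first_col = col_off // tile_size
    last_col = (col_off + width - 1) // tile_size
    first_row = row_off // tile_size
    last_row = (row_off + height - 1) // tile_size
    return [
        (col, row)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


def range_header(offset, length):
    """Build an HTTP Range header value for `length` bytes starting at `offset`."""
    return f"bytes={offset}-{offset + length - 1}"


# Hypothetical 10,000 x 10,000 pixel image split into 512-pixel tiles
tiles_per_side = -(-10_000 // 512)       # ceiling division
total_tiles = tiles_per_side ** 2

# Hypothetical area of interest: 600 x 300 pixels starting at column 1000, row 2000
needed = tiles_for_window(col_off=1000, row_off=2000, width=600, height=300)

print(f"Tiles needed: {len(needed)} of {total_tiles}")
print("Range header for the first 16 KiB of the file:", range_header(0, 16_384))
```

In this made-up case the client needs only a handful of the image's tiles, which is why a windowed read can transfer a small fraction of the bytes a full download would.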

Enter the Cloud Optimized GeoTIFF

One cloud- and analysis-optimized format that has gained popularity is the Cloud Optimized GeoTIFF (COG). Development Seed works with champions of cloud-optimized data from NASA’s Marshall Space Flight Center (MSFC) IMPACT team, Goddard Space Flight Center (GSFC) and the Amazon Web Services (AWS) Public Datasets program. Through this collaboration, we have delivered 3 new COG datasets to the AWS Public Datasets program.

What datasets did we make available?

You can learn more about these datasets on the NASA announcements page: NASA Datasets Available in Cloud Optimized GeoTIFFs.

While these datasets were produced specifically to support NASA’s Space Apps COVID-19 Challenge, they will continue to be publicly available through the AWS Public Datasets Program for at least 6 months. Development Seed is fortunate to collaborate with AWS and NASA, who share our commitment to open data. That commitment is manifest in their formal agreement, the Space Act Agreement (SAA), to support the delivery and maintenance of publicly available Earth Observation data on the cloud.

What can you do with this data?

I’ll be the first to admit I’m not a geoscientist, so I’ll leave the scientific interpretation to the professionals. But! I can share a few handy tools for working with Cloud Optimized GeoTIFFs, and specifically for accessing these data in the cloud. Here are a few links:

  • rio-cogeo: Cloud Optimized GeoTIFF creation and validation plugin for rasterio
  • titiler: Development Seed’s new open source Cloud Optimized GeoTIFF tile server
  • cogeo.xyz: A visual map preview for COGs

You can get started with using some of these tools by visiting this repository and clicking the “launch binder” link:

https://github.com/abarciauskas-bgse/explore-cogs

Aerosol Optical Depth (AOD) from MODIS instrument over San Francisco

COGs are powerful, but they are not the only analysis-ready or cloud-optimized option for EO data out there. We like COGs because the format has been widely adopted across a variety of use cases and its community is vibrant. However, we believe in the paradigm of making data more accessible, not in one particular format or tool. We would love to know what you think about our approach, so please reach out on Twitter to @_aimeeb with feedback or questions!

Data formats for all creatures with opposable thumbs!

Other ways to learn about COGs!
