Cloud Native Geospatial Outreach Day Recap
It’s been just over three weeks since the Cloud Native Geospatial Outreach Day. Everyone I’ve talked to felt it was an incredible event, and I definitely concur. Thankfully we managed to record almost all of it, so if you missed it you can still catch the content on YouTube!
I wanted to give a recap of the event and share some of my favorite parts.
Welcome & Overview
We opened with a welcome from Bruno Sánchez-Andrade Nuño and me, representing Microsoft and Planet, the convening sponsors. Then Hamed, the new Executive Director of Radiant Earth, introduced the Data Labeling Contest (which was a great success). I then attempted to give an overview of ‘Cloud Native Geospatial’, to help explain it to new people and to chart our progress. It’s always fun to pull these together, as each time I get to survey all the new things happening, and there seems to always be great new stuff. This time I also tried to explain more of the ‘why’. I don’t think I made the strongest argument, but I hope to try again and go deeper in a blog post.
The two sessions of lightning talks truly demonstrated how the Cloud Native Geospatial movement is crystalizing in a big way. When I put out the call for lightning talks I knew there was some cool content out there, but I was blown away by the diversity and sheer awesomeness that came in. If you are to watch any of the recorded videos from outreach day then Lightning Talks Round 1 and Round 2 are definitely the place to start.
We heard how a number of different organizations and datasets are embracing STAC & COG, from commercial satellite providers like Maxar and Planet, to startups like Arturo, SparkGeo, and Astraea, to the huge public data catalogs of ESA in their FedEO Portal and NASA in their Common Metadata Repository, and particular open data sets like CBERS and Sentinel 5P (those links all go directly to the lightning talks). And all three major public cloud providers are moving towards the formats. AWS shared all about their Registry of Open Data, and Google showed how Earth Engine can read and write COGs, their STAC interface to their Earth Engine Data Catalog, and their plans to embrace STAC more. And though Microsoft hasn’t embraced it quite as quickly, they have the potential to be a real emerging force in providing Cloud Native Geospatial datasets, as part of their AI for Earth and Planetary Computer initiatives. Their Azure Open Datasets are already quite interesting, and they are building a team that looks set to really embrace COG and STAC as a differentiator. And I always love those who differentiate by being the best at open standards.
I was also really excited to see two international development organizations, the World Bank and Digital Earth Africa, join the movement. Both sponsored the event, but more importantly, both are seeing the potential of this new approach. Years ago I did a fellowship in Zambia, so it is now always in the back of my mind how the technology I build can work in less mature technological environments. Storing data as COG and STAC on the cloud takes much less ‘capacity building’ than previous geospatial approaches. Cloud storage regions in Africa let the data live close to its users, and moving the compute to the cloud obviates the need to download massive datasets on slow connections to do analysis. So it was awesome to learn all about Digital Earth Africa from Fang Yuan. They’ve actually put up the first fully Cloud Native Geospatial Sentinel 2 dataset, converting the level 2 surface reflectance products from JPEG2000 to Cloud Optimized GeoTIFF and putting a STAC interface on top of it, stored in the Africa AWS region.
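For readers who have not seen what ‘a STAC interface on top of COGs’ boils down to, here is a minimal sketch of a single STAC Item: a GeoJSON Feature whose assets point at a COG file. It is not taken from Digital Earth Africa’s catalog. The item id, the URL, the band name `B04`, the timestamp, and the bounding box are all hypothetical, and the `stac_version` value assumes the 1.0.0 release of the spec.

```python
import json


def make_stac_item(item_id, bbox, datetime_utc, cog_href):
    """Build a minimal STAC Item (a GeoJSON Feature) pointing at one COG."""
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "stac_version": "1.0.0",  # assumption: the 1.0.0 spec
        "id": item_id,
        "bbox": [west, south, east, north],
        "geometry": {
            "type": "Polygon",
            # A closed ring: the first and last positions are the same.
            "coordinates": [[
                [west, south], [east, south], [east, north],
                [west, north], [west, south],
            ]],
        },
        "properties": {"datetime": datetime_utc},
        "links": [],
        "assets": {
            "B04": {  # hypothetical asset key for a red band
                "href": cog_href,
                "type": "image/tiff; application=geotiff; profile=cloud-optimized",
                "roles": ["data"],
            }
        },
    }


item = make_stac_item(
    "example-s2-scene",                 # hypothetical id
    [28.0, -16.0, 29.0, -15.0],         # a rough box in Zambia
    "2020-09-08T08:00:00Z",             # hypothetical acquisition time
    "https://example.com/s2/B04.tif",   # hypothetical COG location
)
print(json.dumps(item, indent=2))
```

The point of the sketch is how little is needed: a JSON file next to the imagery is enough for a client to find the scene by space and time and then read the COG it links to.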
Unfortunately, the timing didn’t work for the World Bank to give a lightning talk, but they’ve been working on a release of the NOAA nighttime lights archive in COG & STAC, and their Open Cities AI challenge is a great example of using the Label extension for STAC.
Anyone can get a browser interface to their STAC catalog in seconds by just submitting the public URL, and the index should grow as the central place to go to find interesting data in STAC. And it also lists all the tools available in the ecosystem. The lightning talks also highlighted more STAC and COG tools, like DotNetStac, GDAL’s COG support, Intake-STAC, and ESRI crawling STAC with their GeoPortal Harvester.
I loved Robin’s talk on how STAC is being applied to planetary datasets at the USGS’s Astrogeology Science Center. My favorite moment in the evolution of an open project is when the work is used in a way that you never even dreamed of. I’ve had lots of big aspirations for STAC & COG, but it didn’t occur to me that they would be useful for work with Mars or Venus. So seeing her talk was definitely that moment for me, and it feels like many more are likely to come with STAC.
There were a couple of other datasets presented that similarly pushed beyond STAC’s core use case of satellite imagery. Perhaps my favorite talk title was Radio Occultation & STAC: a match made in the ionosphere, describing Development Seed’s work for NASA to work with data from Spire and how they managed to bring it into STAC, even though it didn’t fit quite as seamlessly as other data. And then it was awesome to hear from Pixel8 about their work on point clouds and STAC. They have a really compelling vision to take terrestrial point clouds captured from camera phone photos and combine them with overhead reference data to create a single harmonized model of the world that is more accurate than either one alone.
One of the big reasons we expanded from just STAC to be Cloud Native Geospatial was to include Cloud Optimized GeoTIFF (COG) as well as other emerging formats. COG demos tend to be a bit more visual and flashy than STAC demos, and a few of the presenters delivered in spades.
Fabian from EOX showed COG Explorer, which has been one of my favorite projects for a while, proving that browsers can talk directly to COGs, with no tile server needed. He also showed more advanced visualization of COGs for real analysis of CORINE Landcover. And then Daniel explored even more visualization of COGs directly in the browser. Duck from Planet shared a slightly different approach, with a COG tile server that sends full band and full bit-depth information to the browser.
He actually took an approach similar to Duck’s, using a tile server on top of COGs. They used slightly different approaches, but are now actively working together to get an interoperable format.
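Both the browser clients and the tile servers lean on the same trick: rather than downloading a whole GeoTIFF, the client asks for small byte ranges over HTTP, starting with the header at the front of the file. A rough sketch of that first step is below. The URL is hypothetical, the request is only prepared and never sent, and the header bytes are synthesized in memory so the example runs offline. It is not how any of the tools above are actually implemented.

```python
import struct
import urllib.request


def range_request(url, start, length):
    """Prepare (but do not send) an HTTP request for `length` bytes at `start`."""
    end = start + length - 1  # the end of an HTTP byte range is inclusive
    return urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})


def parse_tiff_header(first_bytes):
    """Read byte order, TIFF variant, and first IFD offset from the file start."""
    order = {b"II": "<", b"MM": ">"}.get(first_bytes[:2])
    if order is None:
        raise ValueError("not a TIFF file")
    (magic,) = struct.unpack(order + "H", first_bytes[2:4])
    if magic == 42:    # classic TIFF: a 4-byte offset follows the magic number
        (ifd_offset,) = struct.unpack(order + "I", first_bytes[4:8])
    elif magic == 43:  # BigTIFF: an 8-byte offset starts at byte 8
        (ifd_offset,) = struct.unpack(order + "Q", first_bytes[8:16])
    else:
        raise ValueError("unknown TIFF magic number")
    return ("little" if order == "<" else "big", magic, ifd_offset)


# Hypothetical COG location; 16 bytes covers both header variants.
req = range_request("https://example.com/scene.tif", 0, 16)
print(req.get_header("Range"))

# A synthesized little-endian classic TIFF header, standing in for a response.
header = parse_tiff_header(b"II*\x00\x08\x00\x00\x00")
print(header)
```

From the first IFD offset a client can go on to read the tile offsets and fetch only the tiles, at only the overview level, that the current map view needs.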
We also heard about the new emerging Cloud Native Geospatial formats. Zarr, a cloud native format for NetCDF-type data, excels at multidimensional data. Anderson gave a good overview, Aimee showed off some interesting public Zarr data, and then she also gave a deeper dive into Zarr in her intro session. And then Norman shared all about TileDB in his lightning talk and intro session. TileDB is a great new ‘universal data engine’, but the core of it is an array format that is truly cloud-native. And Javi from Carto gave a great articulation of the potential for cloud-based data warehouses (think Snowflake or BigQuery) to transform our industry, with a pitch for using a geospatial Avro for a cloud native vector format, as they’ve done in their new data observatory.
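To give a feel for why a chunked array format suits cloud storage, here is a small sketch of the bookkeeping involved. It assumes the Zarr version 2 layout with its default separator, where each chunk is stored as its own object under a key made of chunk indices joined by dots. The array shape, chunk shape, and selection are made up for illustration, and the function is a hypothetical helper, not part of the Zarr library.

```python
from itertools import product


def chunk_keys(selection, chunk_shape):
    """Return keys of the chunks touched by a selection.

    `selection` holds one (start, stop) pair per dimension, stop exclusive.
    """
    ranges = []
    for (start, stop), size in zip(selection, chunk_shape):
        first = start // size       # chunk holding the first selected index
        last = (stop - 1) // size   # chunk holding the last selected index
        ranges.append(range(first, last + 1))
    return [".".join(map(str, idx)) for idx in product(*ranges)]


# One time step, rows 100-299, columns 0-249, with chunks of 1 x 256 x 256.
keys = chunk_keys([(0, 1), (100, 300), (0, 250)], (1, 256, 256))
print(keys)
```

Only the objects named by those keys need to be fetched from cloud storage, which is the multidimensional counterpart of reading just a few tiles from a COG.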
One of our main goals for the event was to try to expand our community, as Cloud Native Geospatial has been rapidly maturing and it’s time to bring more people in. And we wanted to welcome not just people who are new to STAC and COG, but also those who may be new to geospatial. The geospatial world can still be opaque for new people, and a flurry of talks is not the best format for really understanding it. So we decided to make ‘intro sessions’ that would be up to 40 minutes long, with enough time to take things slow and encourage questions and learning. My favorites of these were aimed at true beginners, aiming to explain many of the core concepts that experienced geospatial practitioners take for granted.
For those who are new to geospatial, I’d recommend starting with a pair of talks from a couple of awesome Planeteers: Sara’s ‘Intro to Geospatial Raster Data’ and Ash’s ‘Packing Your Geospatial Data Science Toolkit’. The first gives a great overview of what geospatial is all about, and the second provides a general approach for how to tackle (geospatial) problems and presents an array of great tools to help you do so. From there, the ‘Machine Learning and Satellite Imagery overview’ from Dave introduces one of the most interesting new trends in geospatial, explaining what ‘Machine Learning’ is and how it applies to imagery. And Data Labeling with Groundwork by Joe and Niki was focused on the Data Labeling Contest that was run, but provides a great introduction to a really interesting tool that is used to create ‘labeled data’ that powers the type of machine learning Dave talks about.
And the final set of truly introductory talks brings people in deeper to some awesome tools. The QGIS session shows how to use the leading open source desktop GIS tool with STAC, and the Sentinel Hub session introduced their awesome cloud GIS, and showed how it works with Cloud Optimized GeoTIFFs. Alex also shared a practical introduction to a variety of tools working with Digital Earth Africa’s COG+STAC data.
Other intro session topics included Radiant Earth MLHub, Intake-STAC, PySTAC, Cirrus, TiTiler & Arturo STAC, Franklin, and Multi-Scale Ultra High Resolution (MUR) Sea Surface Temperature (SST) Zarr. Matthias and I each hosted intro + Q&A sessions on STAC (his is better than mine). And Phil did a really great introduction to STAC API, using the Astraea STAC implementation.
The Future of Geospatial is Cloud Native
To me, the most inspiring aspect of the day was seeing how all these individually cool projects all come together into a real movement. I don’t think doing geospatial natively on the cloud is in any way ‘new’, but I believe there will be a tipping point when geospatial is done primarily on the cloud. To me, the outreach day demonstrated that we are fast approaching the tipping point. The next step is to get the majority of the world’s geospatial data in COG, STAC, and other new cloud native formats, and to shift everyone to a norm of first publishing geospatial data to the cloud. It should open up a new level of global analysis that can actually handle this tsunami of data being generated, at a critical juncture for humanity.
I’ve still got to write up all that happened in the sprint portion of the event and to recognize the best contributions. So look for that soon. And if you’d like to see another Cloud Native Geospatial Outreach event happen and are up to help out please get in touch. I’d love to help make it happen and be a part of it, but I think next time I need to distribute the load so I don’t have two weeks of my life (including part of my vacation) taken over by organizing.
But this outreach event and the overall sprint were a great step forward for this emerging movement, so it was definitely worth it. It was great to make it about more than just ‘STAC’ and recognize all the related pieces coming together. I think we can be confident that STAC is ready to release 1.0.0 soon (with a few more minor evolutions), which was my personal goal for this event. We remain a mostly volunteer effort, so if you’re interested in helping out (with time or funds) don’t hesitate to get in touch.
And thanks once again to everyone who presented. The event would truly be nothing without you.