How the STAC tool ecosystem is growing towards supporting truly open spatial data
The STAC specification is transforming the way data providers and consumers think about working with open spatial data. It provides a clear, common language for spatial data that is flexible and developed by an open community. But what is a language without the tools that speak it? The open-source STAC tooling ecosystem is a critical part of making spatial data open and accessible for use. In this post, I’ll talk about why the access that tooling provides is so important, and describe two instances of collaboration I experienced in the recent STAC and Cloud Native Geospatial sprints that, to me, exemplify the growth of STAC tooling.
When is open data truly open?
The first few months of my Technical Fellowship at Radiant Earth have been focused on helping develop open-source tools in order to support the STAC ecosystem. This has given me a reason to reflect on why open source tooling is so important for this effort, and also how important it is for open data in general.
The phrase “open data” often refers to the license associated with data, indicating that it has an open license, one that gives people the right to use it under relatively few conditions. But truly open data is about more than just making data available for legal use. Access is crucial. Data that is free to use but cannot be accessed is not truly open.
Access is about more than allowing data to be downloaded at some URI, whether it’s free or for a fee. Just because data is available doesn’t mean it’s accessible. A file without proper metadata is inaccessible to all but those with the most intimate knowledge of that data. A sea of well-documented data without a way to search for the specific information relevant to a user may as well be inaccessible.
Access is about a user's ability to utilize data for their purpose, whatever that may be. And because every user and purpose is unique, openness is not a binary — it’s a spectrum. On one end of that spectrum, data is accessible to a select few, and on the other end, anyone who wants to is able to effectively utilize data. It’s a property that can be measured by the number of people who have real access to the data — meaning they are able to use that data to answer questions and solve problems.
The goal of opening data is to increase the number of people who have real access to data. Hosting properly licensed data is the first step. Ensuring the data is properly documented with STAC enables access for a whole new set of users, whose numbers rise with the development of more and better tooling.
As more datasets defined by STAC come online, it’s important that the tools surrounding STAC continue to provide better access to that data. Luckily, the STAC tooling ecosystem already has many useful tools, has momentum, and is growing. I saw that growth on display at the recent STAC and Cloud Native Geospatial sprints, hosted virtually in August and September of 2020. Participants across the globe were developing and using tools that work with STAC. There was too much progress to cover in this post; you can read a progress report from the tooling sprint for a more complete rundown. Instead, I will describe two instances of collaboration I was involved with that, to me, highlighted the way the STAC tooling community is growing.
A key component of the tooling ecosystem is the set of base libraries that enable reading, writing, and basic manipulation of STAC. There are several such libraries available now for STAC, including PySTAC and stac-pydantic for Python, stac4s for Scala, and DotNetSTAC for .NET languages. These form a core of building blocks for other tooling, and the growing maturity in how these libraries are used is an indication that the ecosystem is becoming more cohesive.
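To make "reading and writing STAC" a little more concrete, here is a minimal sketch of the kind of JSON document these libraries handle. It deliberately uses only Python’s standard library rather than PySTAC or any other library named above, so it should not be read as any library’s API. The `make_item` helper, the item ID, the coordinates, and the example.com URL are all made up for illustration.

```python
import json


def make_item(item_id, bbox, datetime_utc, asset_href):
    """Build a minimal STAC Item (a GeoJSON Feature) as a plain dict.

    Hypothetical helper for illustration only; not part of any STAC library.
    """
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "stac_version": "1.0.0-beta.2",
        "id": item_id,
        "bbox": [west, south, east, north],
        # The footprint is the bounding box drawn as a closed polygon ring.
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [west, south],
                [east, south],
                [east, north],
                [west, north],
                [west, south],
            ]],
        },
        "properties": {"datetime": datetime_utc},
        "links": [],
        "assets": {
            "data": {
                "href": asset_href,
                "type": "image/tiff; application=geotiff",
            }
        },
    }


# All values below are invented example data.
item = make_item(
    "example-scene",
    [-75.3, 39.8, -74.9, 40.2],
    "2020-09-01T00:00:00Z",
    "https://example.com/example-scene.tif",
)

# "Writing" and "reading" STAC is, at its simplest, a JSON round trip.
text = json.dumps(item, indent=2)
round_tripped = json.loads(text)
print(round_tripped["id"], round_tripped["bbox"])
```

The base libraries add a great deal on top of this, such as object models and link handling, which is exactly why other tools can build on them instead of on raw dictionaries.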
I’ve been a core developer on PySTAC, adding features and keeping it up to date with the STAC spec through my work at Azavea and my fellowship. Recently, functionality was added to PySTAC to enable validation of metadata based on the community-maintained set of JSON Schemas. This was a feature highly requested by users, so that they could ensure the STACs they were reading and writing with PySTAC were valid according to the spec.
SparkGeo had developed the stac-validator project for a similar purpose, giving a command line interface to STAC catalog validation. At the STAC sprint, I talked with the developers of stac-validator and we determined that a needed update to stac-validator could utilize PySTAC’s new functionality. This new validation functionality is available in the latest release of stac-validator.
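For readers who have not seen validation in practice, the sketch below gives a rough feel for the idea. It is a heavily simplified stand-in: the real tools check documents against the community-maintained JSON Schemas, while this standard-library example only looks for a handful of top-level Item fields and simplifies the spec’s rules. The `find_item_problems` function and the sample items are hypothetical, and nothing here is PySTAC’s or stac-validator’s API.

```python
# Simplified stand-in for schema validation; not PySTAC's or stac-validator's API.
REQUIRED_ITEM_FIELDS = (
    "type", "stac_version", "id", "geometry", "bbox",
    "properties", "links", "assets",
)


def find_item_problems(item):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = [
        f"missing required field: {name}"
        for name in REQUIRED_ITEM_FIELDS
        if name not in item
    ]
    # A STAC Item is a GeoJSON Feature, so a present "type" must say so.
    if "type" in item and item["type"] != "Feature":
        problems.append('type must be "Feature"')
    # This sketch insists on a "datetime" key inside "properties".
    properties = item.get("properties")
    if isinstance(properties, dict) and "datetime" not in properties:
        problems.append("missing required field: properties.datetime")
    return problems


# Invented example data: one item that passes and one that does not.
good_item = {
    "type": "Feature",
    "stac_version": "1.0.0-beta.2",
    "id": "good",
    "geometry": None,
    "bbox": None,
    "properties": {"datetime": "2020-09-01T00:00:00Z"},
    "links": [],
    "assets": {},
}
broken_item = {"type": "FeatureCollection", "id": "broken", "properties": {}}

print(find_item_problems(good_item))
for problem in find_item_problems(broken_item):
    print(problem)
```

Returning a list of problems rather than raising on the first one is a design choice that suits a command line tool, where a user usually wants to see everything that is wrong in a single run.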
Collaboration in action!
Command line access opens the door for a broad audience of users. A fully featured command line interface is needed in order to make working with STAC accessible to users who aren’t as comfortable writing code and scripts that use these base libraries. The sprint hosted discussions about the future of command line utilities for STAC. There was agreement that we need a command line tool that goes beyond just validating STACs, and allows for the creation and manipulation of STACs from the command line. We’ll continue to collaborate towards making that a reality, and I’m excited to see what developments will happen by the next STAC sprint.
Let them be browsed
As part of my fellowship I’ve also been working on improving stac-browser, a tool for visualizing STAC Catalogs, Collections, and Items in a web browser. During that time I updated stac-browser to work with the latest version of STAC (v1.0.0-beta.2), which helped multiple users of stac-browser to upgrade their STACs to the latest and greatest. I’ve also been slowly working through bugs and features, like adding collection asset and summary support.
These updates enabled Matthias Mohr to utilize stac-browser directly in stacindex.org, which is one of my favorite projects to come out of the STAC sprints. STAC Index allows anyone to contribute their openly accessible STAC to the index and generates a stac-browser site for it. It also allows the community to submit descriptions of open source tools that work with STAC. This lets people who are interested in using STAC in their language of choice, or who are looking for different implementations of tools like STAC API servers, go to one place and get help understanding the landscape. It will be great to watch both of these lists grow over time!
The stac-browser and STAC Index projects will continue to push toward a long-standing vision for STAC, where each open spatial asset has its own human-readable page with descriptions and visualizations. These pages will eventually allow search engines to index the data and let it be queried just like the rest of the web. Providing searchable, browsable pages for all STAC catalogs will connect the vast set of open spatial data in a way that many can consume and understand, unlocking access for anyone with an internet connection and a browser.
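As a toy illustration of that "one human-readable page per item" idea, the sketch below turns a STAC Item dictionary into a small static HTML page using only Python’s standard library. This is not how stac-browser works internally; the `render_item_page` function and the sample item are assumptions made up for this example.

```python
from html import escape


def render_item_page(item):
    """Render a STAC Item dict as a small HTML page (hypothetical helper)."""
    # Escape every value so metadata cannot break the markup.
    title = escape(str(item.get("id", "untitled")))
    when = escape(str(item.get("properties", {}).get("datetime", "unknown")))
    asset_rows = "\n".join(
        f'      <li><a href="{escape(asset["href"])}">{escape(key)}</a></li>'
        for key, asset in sorted(item.get("assets", {}).items())
    )
    return "\n".join([
        "<!DOCTYPE html>",
        "<html>",
        f"  <head><title>{title}</title></head>",
        "  <body>",
        f"    <h1>{title}</h1>",
        f"    <p>Captured: {when}</p>",
        "    <ul>",
        asset_rows,
        "    </ul>",
        "  </body>",
        "</html>",
    ])


# Invented example data, including characters that need escaping.
sample_item = {
    "id": "scene <1>",
    "properties": {"datetime": "2020-09-01T00:00:00Z"},
    "assets": {"data": {"href": "https://example.com/a.tif?x=1&y=2"}},
}

page = render_item_page(sample_item)
print(page)
```

A page like this is plain HTML with ordinary links, which is the property that would let a search engine crawl and index it like any other part of the web.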
For data to be truly open, we need tools that allow data to be utilized to its maximum potential.
The future of open spatial data is one in which integrated, analysis ready data is instantly available to users in a format that allows them to answer specific questions about our planet at any scale. Moving as quickly as possible into this future is the best way we can enable our current and future generations to utilize spatial data in combating the world’s biggest problems, like our changing climate.
We’ve made strides, but are still only at the start of this journey. The STAC community is helping build the foundational roads and bridges supporting a connected global community of data providers, data consumers, and technology builders that will ultimately realize the full potential of open spatial data. It will take a lot of effort and collaboration, but as the recent STAC and Cloud Native Geospatial sprints show, individuals and organizations are willing to contribute time, expertise, data, compute power, and sponsorship to help make spatial data more truly open. Many thanks to all of you who contribute to these efforts. To those who haven’t joined us yet: come pitch in, there’s a lot of work to do and we could use your help.