Geospatial Models Now Available in Radiant MLHub

The models include metadata based on the STAC ML Model Extension to enable easy sharing and retrieval.

Radiant Earth
Radiant Earth Insights
Dec 16, 2021


Radiant MLHub has been a source of high-quality, open geospatial training data for machine learning (ML) algorithms since 2019. Today, we’re excited to announce the addition of a model repository that gives Radiant MLHub users access to both geospatial training data and ML models. The geospatial models catalog includes metadata describing the training data associated with each model and the model architecture used to generate predictions.

The first model available for discovery estimates the wind speed of tropical storms from satellite data, and users can access it through the Radiant MLHub API. Radiant Earth will add more models to the repository in the coming months. Trained ML models can be listed, fetched by ID if the ID is known, or queried using standard SpatioTemporal Asset Catalog (STAC) API search methods.
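The three access patterns mentioned above (list, fetch by ID, search) map onto generic STAC API requests. The sketch below only builds the request URLs and the JSON body for a STAC API Item Search; it does not call any server. The API root `https://stac.example.com`, the collection ID `demo-models`, and the item ID `model-1` are hypothetical placeholders, not the actual Radiant MLHub endpoint or identifiers, and the helper function names are our own.

```python
import json
from urllib.parse import quote

# Hypothetical STAC API root used only for illustration; this is NOT the
# actual Radiant MLHub endpoint.
API_ROOT = "https://stac.example.com"


def list_items_url(collection_id: str) -> str:
    """URL that lists the items in one collection: GET /collections/{id}/items."""
    return f"{API_ROOT}/collections/{quote(collection_id, safe='')}/items"


def item_by_id_url(collection_id: str, item_id: str) -> str:
    """URL that fetches a single item whose ID is already known."""
    return f"{list_items_url(collection_id)}/{quote(item_id, safe='')}"


def search_body(collections, bbox=None, time_range=None, limit=10) -> dict:
    """JSON body for an Item Search request: POST /search."""
    body = {"collections": list(collections), "limit": limit}
    if bbox is not None:
        # [min_lon, min_lat, max_lon, max_lat] in WGS 84 degrees
        body["bbox"] = list(bbox)
    if time_range is not None:
        # A single RFC 3339 instant or an interval such as "start/end"
        body["datetime"] = time_range
    return body


if __name__ == "__main__":
    # "demo-models" and "model-1" are hypothetical IDs.
    print(list_items_url("demo-models"))
    print(item_by_id_url("demo-models", "model-1"))
    print(json.dumps(search_body(["demo-models"], bbox=[-80.0, 10.0, -60.0, 30.0]), indent=2))
```

Keeping the request construction separate from any HTTP client makes it easy to send the same body with whichever library a project already uses, or to inspect it before querying a real STAC API.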

The metadata describing geospatial models is based on the STAC ML Model Extension, which provides a way to catalog ML models that are applied to Earth observation (EO) data, complementing the STAC Label Extension used to describe training data.

The first model available in the repository is based on the Tropical Cyclone Wind Estimation Competition training dataset, created to produce targeted wind speed estimates of tropical storms.

Why the STAC ML Model Extension

Just as the STAC specification makes geospatial assets more openly searchable, the STAC ML Model Extension standardizes information about models that use geospatial data, making them more discoverable. STAC and its Label Extension already provide a mechanism for cataloging geospatial ML training data. In addition, the STAC specification defines metadata and conventions for describing Earth observation data, and the STAC API offers powerful search capabilities. It therefore made sense to structure the ML Model specification as a STAC extension rather than as a separate standard.

These geospatial metadata properties are critical when reproducing ML experiments or building on previous work. Imagine, for example, a data scientist at an environmental nonprofit who wants a model that can identify tree species. With the STAC ML Model Extension, this person could search for models by architecture and by the input data they require, then apply a suitable model to their own data to make predictions for a specific area. This workflow would save both the individual and the organization time and money.

Radiant Earth Foundation initiated the development of this specification through support from a cooperative agreement with NASA’s Advancing Collaborative Connections for Earth System Science (ACCESS) Program. Following initial conversations with multiple collaborators and partners, we published a call for interest in the community. In September, a meeting of interested developers and practitioners discussed how to make real-world geospatial models more discoverable and reproducible. One outcome of these in-depth discussions was agreement on the need to structure the ML model specification as a STAC extension. The participants represented nine institutions across two continents and have built tools to run inference and visualize predictions on topical sustainable development issues.

Radiant Earth Foundation has taken the lead in implementing the STAC ML Model Extension specification to provide ML practitioners and data scientists with benchmark baseline models they can use to evaluate performance metrics. However, this is an initial version, and more work is needed to turn the STAC ML Model Extension into a stable standard. We therefore invite interested developers to catalog their models using the specification and to provide feedback in the open STAC ML Model Extension repository.

The STAC ML Model Extension In-Depth

The STAC ML Model Extension captures essential metadata about the training data and the environment used to develop the model. This includes the learning approach (e.g., supervised or semi-supervised), the type of prediction the model makes (e.g., classification or regression), the model architecture (e.g., Region-based Convolutional Neural Networks, R-CNN), and the training environment (e.g., the operating system the model was trained on).
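To make the fields above concrete, here is a minimal sketch of a STAC Item, written as a plain Python dict, that carries ML Model Extension properties, plus a small filter that finds models by those properties. The schema URI and the property names (`ml-model:learning_approach`, `ml-model:prediction_type`, `ml-model:architecture`, `ml-model:training-processor-type`, `ml-model:training-os`) reflect our reading of the extension's v1.0.0 schema and should be checked against the published version. The item IDs, extents, timestamps, and architecture values are hypothetical, and `make_model_item` and `find_models` are illustrative helpers, not part of any library.

```python
# Schema URI assumed from the extension's v1.0.0 release; check the
# published schema before relying on it.
ML_MODEL_SCHEMA = "https://stac-extensions.github.io/ml-model/v1.0.0/schema.json"


def make_model_item(item_id, bbox, timestamp, learning_approach,
                    prediction_type, architecture, training_os,
                    processor_type="gpu"):
    """Build a minimal STAC Item (as a dict) carrying ml-model:* properties."""
    min_lon, min_lat, max_lon, max_lat = bbox
    ring = [[min_lon, min_lat], [max_lon, min_lat], [max_lon, max_lat],
            [min_lon, max_lat], [min_lon, min_lat]]  # closed polygon ring
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "stac_extensions": [ML_MODEL_SCHEMA],
        "id": item_id,
        "bbox": list(bbox),
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {
            "datetime": timestamp,
            "ml-model:type": "ml-model",
            "ml-model:learning_approach": learning_approach,
            "ml-model:prediction_type": prediction_type,
            "ml-model:architecture": architecture,
            "ml-model:training-processor-type": processor_type,
            "ml-model:training-os": training_os,  # assumed free-text OS name
        },
        "links": [],
        "assets": {},
    }


def find_models(items, **criteria):
    """Keep items whose ml-model:<key> property equals each given value."""
    return [
        item for item in items
        if all(item["properties"].get(f"ml-model:{key}") == value
               for key, value in criteria.items())
    ]


# Two hypothetical catalog entries (IDs, extents and architectures are made up).
catalog = [
    make_model_item("hypothetical-wind-model", (-100.0, 0.0, -10.0, 40.0),
                    "2021-12-16T00:00:00Z", "supervised", "regression",
                    "ResNet-50", "ubuntu"),
    make_model_item("hypothetical-tree-model", (30.0, -5.0, 40.0, 5.0),
                    "2021-12-16T00:00:00Z", "supervised", "classification",
                    "ResNet-50", "ubuntu"),
]

matches = find_models(catalog, prediction_type="regression", architecture="ResNet-50")
print([item["id"] for item in matches])  # ['hypothetical-wind-model']
```

Because the extension's fields live under a common `ml-model:` prefix inside ordinary STAC `properties`, the same filtering could in principle be pushed to a STAC API that supports property queries instead of being done client-side as shown here.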

The STAC ML workflow can support other use cases as well, but the ultimate goal of this work is to let individuals and organizations easily discover, use, and reuse geospatial ML models. This practice could be a game-changer for global development and humanitarian organizations that may lack the incentives to scale technological approaches or the funds to create advanced models.

What’s next for the STAC ML Model Extension

The STAC ML Model Extension is open, and we invite all community members to contribute cataloged models or run tests. The contributing guide explains how to submit input.

We also invite you to join the “geo-ml-model-catalog” channel on the Radiant MLHub Slack to take part in discussions or crowdsource ideas.

For the STAC ML Model Extension to become standard practice, ML practitioners and data scientists need to make their data and code publicly available. Making training data and models public will enable the search and discovery of geospatial ML models and facilitate the development of more accurate and complex models. Perhaps more importantly, it would give researchers’ work greater visibility, leading to collaborations and the scaling of projects.
