Enabling Agricultural Dataflows in Radiant MLHub for Geospatial Machine Learning Analytics

How Radiant MLHub strengthens the data collection to analytics pipeline for agriculture projects.

Radiant Earth
Radiant Earth Insights
Apr 25, 2022

By Hamed Alemohammad, Executive Director and Chief Data Scientist at Radiant Earth Foundation

Through Radiant MLHub, Radiant Earth Foundation is strengthening geospatial machine learning (ML) workflows for organizations working on agriculture projects by streamlining the process from ground reference data collection to analytics.

Radiant MLHub is an open-access library dedicated to geospatial training data and ML models. Since its inception in 2019, Radiant has focused on developing and aggregating geo-diverse benchmark data that practitioners can use to build applications and enable data-driven policies that impact lives worldwide. The datasets include aerial and satellite imagery with labels for crop and land cover types, roads and buildings, clouds, marine debris, and floods. The geospatial ML model library, introduced last year, contains metadata describing each model's architecture, geographical coverage, and system requirements. Both datasets and models are accessible through the Radiant MLHub website, a SpatioTemporal Asset Catalog (STAC) API, and a Python client.
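To illustrate what STAC API access looks like, the sketch below builds the JSON body for a STAC API Item Search request (POST /search) using the standard collections, bbox, datetime, and limit parameters. The endpoint URL, the collection ID, and the helper names are hypothetical placeholders, not Radiant MLHub's actual values. The request object is only constructed, never sent, and a real service may also require an API key.

```python
import json
import urllib.request
from datetime import date

# Hypothetical endpoint; substitute a real STAC API root before use.
API_ROOT = "https://stac.example.org"


def build_item_search(collections, bbox, start, end, limit=10):
    """Build a STAC API Item Search body for items in `collections`
    that intersect `bbox` and fall between `start` and `end` (inclusive)."""
    west, south, east, north = bbox
    # Longitudes may wrap (west > east) across the antimeridian, so only
    # their range is checked; latitudes must satisfy south <= north.
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitude out of range")
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitude out of range or south > north")
    return {
        "collections": list(collections),
        "bbox": [west, south, east, north],
        # RFC 3339 closed interval written as "start/end"
        "datetime": f"{start.isoformat()}T00:00:00Z/{end.isoformat()}T23:59:59Z",
        "limit": limit,
    }


def make_search_request(body):
    """Wrap the body in a POST request object (nothing is sent here)."""
    return urllib.request.Request(
        f"{API_ROOT}/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical collection ID and a rough bounding box around Kenya.
    body = build_item_search(["example_crop_labels"], (33.9, -4.7, 41.9, 5.0),
                             date(2021, 1, 1), date(2021, 12, 31))
    req = make_search_request(body)
    print(req.get_method(), req.full_url)
    print(json.dumps(body, indent=2))
```

Filtering by bounding box and time interval is how a STAC API lets a client narrow a large catalog down to the training data for one region and season.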

Geographical distribution of the existing 33 benchmark training datasets available on Radiant MLHub.

Almost half of the 33 training datasets on Radiant MLHub are suitable for agriculture-based applications. This milestone began with the publication of Radiant's first training datasets for primary crops in Kenya, Tanzania, and Uganda. We deliberately chose to launch Radiant MLHub with these three datasets to signal our commitment to improving the geo-diversity of training data and to advancing Sustainable Development Goals such as Zero Hunger through Earth observation (EO) and ML technologies. Compiling these crop type training datasets also taught us important lessons, mainly about the limitations of ground data collection: practitioners found it challenging to repurpose data from existing collection efforts to create a geospatial training dataset. As a result, in 2020 Radiant published a best practices guide for Ground Reference Data Collection and Cataloging to encourage the community to capture the in-field data components needed for ML modeling. We recommend this guide for new data collection activities.

To further support the global EO, ML, and agricultural community, we began working with organizations that collect or fund the collection of agricultural data. Our goal is to streamline ground reference data collection workflows so that the resulting data are easy to publish and share, particularly on Radiant MLHub.

The following sections expand on these data collection integrations with Radiant MLHub.

Publishing Training Data on Radiant MLHub using the Enabling Crop Analytics at Scale (ECAAS) Field Mapper Toolkit

Radiant developed a data collection toolkit, the Enabling Crop Analytics at Scale (ECAAS) Field Mapper, in partnership with Tetra Tech and Open Data Kit (ODK), a mobile data collection platform. The toolkit helps ensure that collected data are high quality and accurate and carry the metadata needed to train ML models. In essence, its survey form standardizes how ground reference data are collected and published.
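The ECAAS form's own validation rules are not described here, but the kind of check a collection workflow can run before publication is easy to sketch. The function below flags a single ground reference record with missing or out-of-range coordinates, a crop label outside a controlled vocabulary, a missing or future collection date, or coarse GPS accuracy. The field names, crop list, and 10 m accuracy threshold are hypothetical choices for illustration, not the toolkit's actual schema.

```python
from datetime import date

# Hypothetical controlled vocabulary and GPS threshold, for illustration only.
ALLOWED_CROPS = {"maize", "cassava", "beans", "sorghum", "rice"}
MAX_GPS_ACCURACY_M = 10.0


def check_record(record, today=None):
    """Return a list of quality problems found in one ground reference record.

    An empty list means the record passed every check.
    """
    today = today or date.today()
    problems = []

    lat, lon = record.get("latitude"), record.get("longitude")
    if lat is None or lon is None:
        problems.append("missing coordinates")
    elif not (-90 <= lat <= 90 and -180 <= lon <= 180):
        problems.append("coordinates out of range")

    # Normalize case and whitespace before checking the crop label.
    crop = (record.get("crop_type") or "").strip().lower()
    if crop not in ALLOWED_CROPS:
        problems.append(f"unknown crop type: {crop!r}")

    try:
        observed = date.fromisoformat(record.get("collection_date") or "")
        if observed > today:
            problems.append("collection date is in the future")
    except ValueError:
        problems.append("missing or malformed collection date")

    accuracy = record.get("gps_accuracy_m")
    if accuracy is None or accuracy > MAX_GPS_ACCURACY_M:
        problems.append("GPS accuracy missing or too coarse")

    return problems


if __name__ == "__main__":
    record = {"latitude": -1.29, "longitude": 36.82, "crop_type": "Maize",
              "collection_date": "2021-06-15", "gps_accuracy_m": 4.0}
    print(check_record(record))  # -> []
```

Returning every problem at once, rather than stopping at the first, lets a surveyor fix a record in a single pass while still in the field.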

The toolkit enables ODK users to collect crop data in a format compliant with the STAC specification. Users can also choose to share the data publicly with the broader community by submitting it to Radiant MLHub. Publishing training data on Radiant MLHub lets practitioners search for, discover, and access available training data quickly.
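To make "STAC-compliant" concrete, here is a minimal sketch that turns one point observation into a STAC Item: a GeoJSON Feature with the required type, stac_version, id, geometry, bbox, properties (including datetime), links, and assets fields. The input keys (instance_id, latitude, longitude, collection_date, crop_type) and the crop_type property are hypothetical, not the ECAAS toolkit's real field names, and a published dataset would normally also link each item to a STAC Collection.

```python
import json

STAC_VERSION = "1.0.0"


def submission_to_stac_item(sub):
    """Convert one (hypothetical) point submission into a STAC Item dict."""
    lon, lat = float(sub["longitude"]), float(sub["latitude"])
    return {
        "type": "Feature",
        "stac_version": STAC_VERSION,
        "id": sub["instance_id"],
        # GeoJSON positions are ordered [longitude, latitude].
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        # For a single point the bbox collapses to that point.
        "bbox": [lon, lat, lon, lat],
        "properties": {
            # STAC requires an RFC 3339 datetime; midnight UTC is assumed here.
            "datetime": f"{sub['collection_date']}T00:00:00Z",
            "crop_type": sub["crop_type"],  # hypothetical property name
        },
        "links": [],
        "assets": {},
    }


if __name__ == "__main__":
    item = submission_to_stac_item({
        "instance_id": "uuid:example-0001",
        "latitude": "-1.29", "longitude": "36.82",
        "collection_date": "2021-06-15", "crop_type": "maize",
    })
    print(json.dumps(item, indent=2))
```

Converting coordinates with float() guards against form exports that store numbers as strings, which would otherwise produce invalid GeoJSON.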

Many humanitarian and global development organizations already use ODK to collect data. Integrating the data collection toolkit with Radiant MLHub therefore taps into an existing user base that can accelerate the adoption of digital and data-driven innovation for small-scale farms, especially the use of satellite data and ML for agricultural analytics.

ODK is one of the most widely used platforms for mobile data collection. However, some organizations work with platforms built on ODK, such as SurveyCTO. We are currently working on joint projects with IDinsight, the Government of Vanuatu's National Statistics Office (VNSO), and the Helmets Labeling Crops project, all of which use SurveyCTO, to develop workflows for publishing SurveyCTO data on Radiant MLHub.

Publishing Training Data on Radiant MLHub using SurveyCTO

Radiant is applying the lessons learned from building the ECAAS toolkit to design a crop mapping survey form on SurveyCTO for data collection in India, in support of our work with IDinsight. Built on ODK, the SurveyCTO form accounts for the constraints and characteristics of local pipelines, from data collection to analytics, in agricultural projects.

IDinsight collects data on crop types and field boundaries in four Indian states using SurveyCTO. Our team will use these data to generate a training dataset and ingest it for publication on Radiant MLHub.
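Field boundaries captured on ODK-based forms usually arrive as a geoshape string rather than as GeoJSON. The sketch below assumes the ODK convention of semicolon-separated points, each written as "latitude longitude altitude accuracy", and assumes SurveyCTO exports use the same layout; verify this against an actual export. It swaps each point into GeoJSON's [longitude, latitude] order, closes the ring as GeoJSON requires, and computes the bbox a STAC Item needs. The function name is hypothetical.

```python
def geoshape_to_polygon(value):
    """Parse an ODK-style geoshape string into a GeoJSON Polygon and its bbox.

    Assumes points are separated by ';' and each point is written as
    'latitude longitude [altitude accuracy]'.
    """
    ring = []
    for point in value.split(";"):
        parts = point.split()
        if not parts:
            continue  # skip empty segments, e.g. from a trailing ';'
        if len(parts) < 2:
            raise ValueError(f"malformed point: {point!r}")
        lat, lon = float(parts[0]), float(parts[1])
        ring.append([lon, lat])  # GeoJSON order is [longitude, latitude]

    # GeoJSON linear rings must end where they start.
    if ring and ring[0] != ring[-1]:
        ring.append(list(ring[0]))
    if len(ring) < 4:
        raise ValueError("a polygon ring needs at least four positions")

    lons = [p[0] for p in ring]
    lats = [p[1] for p in ring]
    bbox = [min(lons), min(lats), max(lons), max(lats)]
    return {"type": "Polygon", "coordinates": [ring]}, bbox


if __name__ == "__main__":
    shape = "-1.0 36.0 1500 5;-1.0 36.1 1500 5;-1.1 36.1 1500 5;"
    polygon, bbox = geoshape_to_polygon(shape)
    print(polygon)
    print(bbox)  # -> [36.0, -1.1, 36.1, -1.0]
```

This sketch does not enforce the counterclockwise winding order that RFC 7946 recommends for exterior rings, and its simple min/max bbox does not handle fields that cross the antimeridian; neither matters for typical farm plots, but both are worth checking before publication.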

In Vanuatu, Radiant is working with the National Statistics Office to produce crop type maps at scale. This work includes designing a ground reference data collection form with SurveyCTO that the VNSO team lead and partners in-country will use to collect in-field agriculture data with the proper metadata and properties.

With the Helmets Labeling Crops project, Radiant supports multiple organizations working to create labeled datasets covering Kenya, Mali, Rwanda, Tanzania, and Uganda, key food-producing countries in East and West Africa. The team is developing a rapid in-field agricultural data collection method using “Helmets.” In at least one country, surveyors collect crop type labels using SurveyCTO and ODK. The forms will include the metadata needed to publish an ML-ready dataset on Radiant MLHub.

Integrating ODK and SurveyCTO with Radiant MLHub establishes a seamless pipeline from data collection to publication, which is critical for increasing the uptake, and therefore the value, of these data. Our goal is to reduce the time and effort practitioners spend on data preparation so they can build and improve analytical applications that support data-driven policymaking.

For updates on these projects, subscribe to our quarterly newsletter.
