Radiant.Earth Launches New Technical Working Group on Machine Learning for Global Development

By Hamed Alemohammad, Lead Geospatial Data Scientist, Radiant.Earth

The new Technical Working Group on Machine Learning for Global Development, representing 23 institutions from various sectors.

As part of a recent Radiant.Earth workshop, 30 leading international experts participated in the launch of a new Technical Working Group on Machine Learning for Global Development.

The group includes Earth observation (EO), machine learning (ML), and land cover (LC) classification experts. They are working together to develop three things: a community standard of best practices for using ML with EO, a commons for labeled training data catalogues, and a hierarchical schema for global LC classification.

Radiant.Earth is developing open source datasets of labeled satellite images, which will be hosted on MLHub.Earth under a Creative Commons license. These datasets will lead to a living open image library for ML and EO. Our goal is to create a sustained, community-wide effort to capture image labels that will enable major innovations and drive new, more targeted and timely insights. These insights will support progress in agriculture, food security, conservation, health, land rights, urban planning, water resources, and other areas relevant to global development and humanitarian response.

The first such dataset that Radiant.Earth will generate will consist of global LC labeled imagery from Sentinel-2 satellites at 10 m spatial resolution. This will enable fully automated and dynamic LC classification algorithms that use open source satellite imagery. Radiant.Earth will label these images using a combination of ML and crowdsourcing to generate a human-verified training dataset.

Existing training datasets for LC classification have limitations that do not support development of a global EO-based LC classification algorithm at fine spatial resolutions with high accuracy. These datasets are either generated for specific regions of the world (and therefore lack geo-diversity) or are based on imagery that is not freely available at the global scale (and therefore are not open source). Moreover, in many cases very few labeled images are available for a specific class within the dataset, which limits an ML algorithm's ability to learn the particular features of that class.
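To make the last point concrete, here is a minimal sketch of how one might check a labeled dataset for under-represented classes before training. The class names, counts, and the 10% threshold are made up for illustration; they do not come from any Radiant.Earth dataset.

```python
from collections import Counter


def rare_classes(labels, min_share):
    """Return the classes whose share of all labels falls below `min_share`."""
    counts = Counter(labels)
    total = sum(counts.values())
    return sorted(c for c, n in counts.items() if n / total < min_share)


# Hypothetical label list: one class label per training image.
labels = ["cropland"] * 60 + ["forest"] * 35 + ["wetland"] * 5

# Flag any class that makes up less than 10% of the labels (arbitrary threshold).
print(rare_classes(labels, min_share=0.10))
```

A class flagged this way is a candidate for targeted labeling, since a model sees too few examples of it to learn its features reliably.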

Budhendra Bhaduri, Corporate Research Fellow at Oak Ridge National Lab, shares his perspective on using machine learning and high performance computing for LC classification.

Key topics of the Technical Working Group

Radiant.Earth formed the Technical Working Group on Machine Learning for Global Development for two purposes: to define the specification of such a global dataset so that it meets the requirements of end-user applications, and to standardize best practices that increase the interoperability of different datasets and algorithms. The group members are experts from commercial, government, non-profit and academic organizations with subject matter knowledge related to this topic. Existing and future activities of the group are documented in this GitHub repository.

The first meeting of the working group, held on June 14–15, 2018 in Washington, D.C., focused on the topic of “Machine Learning for Global Land Cover Classification.” Thirty experts representing 23 institutions gathered and presented their latest advancements in the use of ML for LC classification. Presenters also shared their thoughts on the challenges and remaining barriers to improving the accuracy of global LC maps. To facilitate further discussion and examination of key topics, each expert participated in one of three groups, which are summarized below:

Group 1 focused on developing a hierarchical LC schema that includes all major LC classes at the global scale and enables inter-comparison and cross-validation of different LC products that use satellite imagery at different spatial resolutions. Highlighting the importance of distinguishing between LC and land use, the group developed a hierarchical LC schema combined with a set of attributes. The schema is translatable, so that refined details can be added to each class later on. It is designed for a global LC product and assumes that the LC definitions will be updated annually. Details of the schema are provided here.
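As a rough illustration of why a hierarchy helps with inter-comparison, the sketch below collapses fine-grained labels to a shared parent level before comparing two products. The class names and the two-level tree are hypothetical placeholders, not the schema the group developed.

```python
# Hypothetical hierarchy: each class maps to its parent (None marks a top-level class).
PARENT = {
    "vegetation": None,
    "forest": "vegetation",
    "cropland": "vegetation",
    "water": None,
    "built_up": None,
}


def path_from_root(label):
    """Return the chain of classes from the top level down to `label`."""
    path = []
    while label is not None:
        path.append(label)
        label = PARENT[label]
    return path[::-1]


def collapse(label, depth):
    """Map `label` to its ancestor at `depth` (1 = top level), or itself if shallower."""
    path = path_from_root(label)
    return path[min(depth, len(path)) - 1]


def agreement(fine_labels, coarse_labels, depth=1):
    """Share of pixels on which two products agree once both are collapsed to `depth`."""
    pairs = list(zip(fine_labels, coarse_labels))
    matches = sum(collapse(a, depth) == collapse(b, depth) for a, b in pairs)
    return matches / len(pairs)


# Two made-up products covering the same four pixels at different levels of detail.
fine = ["forest", "cropland", "water", "built_up"]
coarse = ["vegetation", "vegetation", "built_up", "built_up"]
print(agreement(fine, coarse))  # 3 of 4 pixels agree at the top level
```

Comparing at the shared parent level means a product that only resolves "vegetation" is not penalized for lacking the finer "forest" and "cropland" distinction.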

Group 2 reviewed the challenges and ad-hoc choices involved in using ML with EO data. After two days of discussions, they generated a set of best practices for this application. Their recommendations focus on four topics: (1) accuracy of training data labels, (2) achieving higher accuracies within and between LC classes, (3) maintaining labeled training datasets and (4) best practices for a global LC algorithm using Sentinel-2 imagery. Their detailed recommendations, which cover all aspects of these four topics, are included in the notes from the meeting (available here).

Group 3 examined current standards for storing and distributing labeled satellite imagery and the caveats related to each of them. They also developed a training data architecture using the SpatioTemporal Asset Catalog (STAC) specification. This training data specification enables combining raw imagery and label information in one standard catalogue that is adaptable to a wide range of labeled imagery, which should accelerate adoption and use of these data in ML algorithms. The label asset in the catalogue allows the labels to be “tile classification,” “object detection,” or “segmentation of pixels.” The draft version of this spec is published in this GitHub repository along with a sample GeoJSON file from the SpaceNet challenge.
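To give a feel for the idea, the sketch below builds a single GeoJSON Feature that pairs an image asset with a label asset and records which of the three label types applies. This is only an illustration of the pattern: the field names (`label_type`, the `image` and `labels` asset keys), the identifiers for the label types, and the example URLs are assumptions and are not taken from the draft spec.

```python
import json

# The three label types named in the text; these identifiers are hypothetical.
LABEL_TYPES = {"tile_classification", "object_detection", "segmentation"}


def make_labeled_item(item_id, bbox, datetime_utc, image_href, label_href, label_type):
    """Build a GeoJSON Feature pairing one image asset with one label asset."""
    if label_type not in LABEL_TYPES:
        raise ValueError(f"unknown label type: {label_type!r}")
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "id": item_id,
        "bbox": [west, south, east, north],
        "geometry": {
            "type": "Polygon",
            # Closed ring: the first and last positions are identical.
            "coordinates": [[
                [west, south], [east, south], [east, north],
                [west, north], [west, south],
            ]],
        },
        "properties": {"datetime": datetime_utc},
        "assets": {
            "image": {"href": image_href},
            "labels": {"href": label_href, "label_type": label_type},
        },
    }


# Placeholder values; the URLs do not point to real data.
item = make_labeled_item(
    item_id="example-tile-001",
    bbox=(36.80, -1.30, 36.81, -1.29),
    datetime_utc="2018-06-14T00:00:00Z",
    image_href="https://example.com/imagery/example-tile-001.tif",
    label_href="https://example.com/labels/example-tile-001.geojson",
    label_type="segmentation",
)
print(json.dumps(item, indent=2))
```

Keeping the image and its labels in one catalogue entry means a training pipeline can discover both from a single record, whatever the label type.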

Notes from the group discussions are also available here.

White Paper on Machine Learning for Global Land Cover Classification

The results of discussions from all three groups, currently being synthesized and documented, will be published as a white paper. Radiant.Earth is also working with other groups that have generated or are generating labeled imagery for implementation of these specifications and standards. This is an ongoing effort and the specifications will evolve in the next couple of months to reach an adoptable and operational level. We believe in collaborative innovations and will invest in similar gatherings to facilitate adoption and dissemination of ML techniques applied to EO in the future.

Finally, I would like to thank Schmidt Futures for sponsoring this project and workshop, as well as our wonderful Radiant.Earth team that worked tirelessly to make this workshop a great success.