Creating the Planet’s Digital Ecosystem
How Radiant MLHub Contributes to Global Action Towards a Sustainable Earth
In The Promise and Peril of a Digital Ecosystem for the Planet, authors Jillian Campbell and David Jensen from the United Nations Environment Programme (UNEP) published an urgent call to action for the world: create a shared vision that leverages new technologies to manage humanity’s footprint, or risk perishing as a consequence of the climate and nature crises. The authors expand on the discussion paper, The Case for a Digital Ecosystem for the Environment: Bringing together data, algorithms, and insights for sustainable development, which was developed through a participatory process led by the UN Science Policy-Business Forum.
Given Radiant Earth Foundation’s extensive work in this area, one readily sees the game-changing value of the digital ecosystem blueprint outlined by Campbell and Jensen. Their prevailing call to action involves public and private stakeholders working together to capture this vision for the public good.
The digital ecosystem framework specifically calls for:
- High-quality data that is open (and if we might add, trusted), including a validation process for citizens, metadata standards, and analysis-ready data;
- Baseline infrastructure that allows open access to data and information, and the ability to manage and process the data;
- Public algorithms that can be validated and shared among practitioners; and,
- Real-world applications, co-designed with users and related institutions, that provide more in-depth and real-time insights into persistent problems.
This framework, the authors argue, has the potential to reverse our current trajectory towards an unsustainable future.
Focusing on the mechanisms of data collection and sharing
The high-level requirements listed above are achievable, especially as early results from frontier digital technologies such as Earth observation (EO) and artificial intelligence (AI) point to their potential for significant societal change. That said, the governance framework would greatly benefit from a mechanism for collecting and sharing high-quality data, making it readily and broadly available to researchers and practitioners worldwide.
Citizen science is an excellent example. Organizations such as National Geographic have successfully demonstrated its power. The challenge, however, is not just data collection but the quality of the data and the interpretation of the science built on them. In other words, while the infrastructure in place for citizen science allows data collection to scale, the data can be noisy and uncertain. Developing guidelines for data collection and engaging citizen scientists more closely are key to ensuring that the data meet specific quality standards and ultimately contribute to science and decision making.
On the flip side, researchers who collect high-quality data often lack the direction, incentives, or resources to catalog and document their data and share it in a permanent repository. Sharing and benchmarking data is key to reproducibility in science. It also increases the return on the investment that goes into these data collection efforts, which in many cases come from federal and philanthropic grants.
In addition, efforts must be made to further leverage open data. As the authors of the Nature article Open is not enough argue, open data needs to be accompanied by sample code and documentation of data analysis workflows to ensure its usability. When we consider that technologies such as AI and machine learning are nothing without data, the importance of open data sharing in accelerating innovation becomes even clearer.
Radiant Earth is actively working with partners throughout the world to strengthen data sharing mechanisms, which are vital to achieving the digital ecosystem for the planet.
Radiant MLHub as a Digital Public Good
Some of the impediments to fully leveraging AI and EO as part of the digital ecosystem for the planet are lack of access to useful, high-quality training data, lack of geo-diversity in the data, and challenges in discovering existing data. These limitations affect all aspects of the proposed digital ecosystem: data, infrastructure, algorithms, and applications.
To address these problems, Radiant Earth established Radiant MLHub, an open-source digital library that allows anyone to discover and access Earth observation training datasets and AI models. Radiant MLHub’s contribution to the proposed digital ecosystem includes generating high-quality and geographically diverse training data, followed by sharing and exposing the data via a standardized API that anyone with connectivity can access.
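To make the discovery pattern concrete, here is a minimal sketch of parsing a response from a STAC-style collections listing. The response body below is a trimmed, illustrative sample, not actual API output; the collection IDs and descriptions are placeholders.

```python
import json

# A trimmed, hypothetical response from a STAC-compliant API's
# /collections endpoint. The real Radiant MLHub API returns the same
# general shape, but the field values here are illustrative only.
sample_response = json.loads("""
{
  "collections": [
    {"id": "example_african_crops_labels", "license": "CC-BY-4.0",
     "description": "Illustrative crop type training labels"},
    {"id": "example_landcover_labels", "license": "CC-BY-4.0",
     "description": "Illustrative land cover training labels"}
  ]
}
""")

# Discover datasets by listing each collection's ID and license.
for collection in sample_response["collections"]:
    print(f'{collection["id"]}: {collection["license"]}')
```

Because the response is standard JSON with a predictable shape, any HTTP client can page through collections this way without a bespoke SDK.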
In the next few months, we will release a crop type training dataset for major crops in Africa, followed by a global land cover training dataset. Both datasets are based on multispectral imagery from the Sentinel-2 mission: the crop type dataset includes time-series data across the growing season, and the land cover dataset across the year 2018.
Data are stored in public cloud repositories under a Creative Commons license (CC BY 4.0). In practice, this means that any organization or individual can use these diverse training datasets to train and validate their algorithms for better accuracy. The process allows data scientists to test how well their models predict outcomes. In addition, Radiant MLHub will house these trained ML models for further exposure to users.
To increase reproducibility and benchmarking, all models will include metadata describing the training dataset used to develop them, as well as the constraints and limitations of each model.
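A provenance record of this kind might look like the sketch below. The field names and values are hypothetical illustrations of the idea, not a published Radiant MLHub schema.

```python
# Hypothetical metadata record for a trained model hosted alongside its
# training data. Field names and values are illustrative, not a real schema.
model_metadata = {
    "model_id": "example-crop-type-classifier-v1",
    "training_data": {
        "collection": "example_african_crops_labels",  # dataset used to train
        "license": "CC-BY-4.0",
        "temporal_extent": ["2018-03-01", "2018-10-31"],
    },
    "constraints": [
        "Trained only on Sentinel-2 imagery; not validated on other sensors",
        "Labels cover selected African regions; accuracy elsewhere is unknown",
    ],
}

# A downstream user can check provenance and license before reusing the model.
assert model_metadata["training_data"]["license"].startswith("CC-BY")
```

Recording the dataset, license, and known limitations next to the model is what makes benchmarking and reuse possible for someone who was not involved in training it.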
Finally, Radiant MLHub is designed to be a community repository for discovering AI training data and tools on EO. All interested organizations, practitioners, and researchers can register and share their data on Radiant MLHub, no matter where they physically host it. To ensure Radiant MLHub is interoperable, we regularly convene the community and invest in the development of an open community standard, the SpatioTemporal Asset Catalog (STAC). Such open standards are essential to the design and usability of public good data infrastructure.
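To show why a shared standard matters, here is a minimal STAC Item following the public SpatioTemporal Asset Catalog specification. The ID, coordinates, and asset URL are illustrative placeholders, but the required field structure is taken from the spec itself.

```python
# A minimal STAC Item per the SpatioTemporal Asset Catalog specification.
# IDs, coordinates, and the asset URL are placeholders for illustration.
stac_item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-labels-tile-001",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[36.0, -1.0], [36.1, -1.0],
                         [36.1, -0.9], [36.0, -0.9], [36.0, -1.0]]],
    },
    "bbox": [36.0, -1.0, 36.1, -0.9],
    "properties": {"datetime": "2018-06-15T00:00:00Z"},
    "links": [],
    "assets": {
        "labels": {
            "href": "https://example.com/labels/tile-001.geojson",
            "type": "application/geo+json",
        }
    },
}

# Any STAC-aware tool can rely on these required fields being present.
required = {"type", "stac_version", "id", "geometry", "properties",
            "links", "assets"}
assert required <= stac_item.keys()
```

Because every catalog that follows STAC exposes the same fields, data hosted anywhere can be indexed and searched through one interface.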
Practitioners often ask how we can get policymakers and other decision-makers to trust algorithms. We believe that openness, transparency, and collaboration are paramount for trust to develop.
Extending the call to action
Public good datasets benefit everyone and can have a high social impact. However, sharing data, algorithms, and applications along with building digital infrastructure to achieve the Sustainable Development Goals will also require financial and technical cooperation. As Campbell and Jensen ask: Who will pay for this digital ecosystem?
One way to enable this digital ecosystem is to build on individual and organizational interests and establish a data commons cooperative. Such an initiative would incentivize different sectors of the economy to join and benefit from its services, which could be data, infrastructure, algorithms and/or applications. Most importantly, this could also ensure this public good is financially sustainable.
There is only one planet, but the issues we face as the Earth’s inhabitants are many. As scientists and technologists, we play a special role in studying our world, identifying changes, and providing insights for possible solutions. This is why we do what we do. Working together, we will not only openly share valuable data products, algorithms, applications, and training data, we will capture the vision of a digital ecosystem for the planet and open the door to a more sustainable future.