Radiant MLHub Spotlight Q&A: Macroecology and Society Lab

Building application-ready tools and data for policymakers, resource managers, and other scientists to understand global dynamics in human-environment systems.

Radiant Earth
Radiant Earth Insights
10 min read · Jul 13, 2021

Our Community Voices for this quarter are Dr. Carsten Meyer, Mr. Ruben Remelgado, Dr. Steffen Ehrmann, and Ms. Caterina Barrasso from the German Centre for Integrative Biodiversity Research (iDiv) Macroecology and Society Lab. They are working on several projects to detect and understand global dynamics in human-environment systems, focusing on human land use, its underlying societal drivers, and its ecological consequences. The research team uses numerous datasets from Radiant MLHub to model crop suitability layers, which will inform the systematic downscaling of crop statistics into pixel-scale crop type classifications.

Dr. Carsten Meyer, the research group lead since 2016, is an interdisciplinary researcher working at the interface of geography, ecology, and informatics. In 2015, he completed his Ph.D. in “Biodiversity, Macroecology, and Biogeography” at the University of Göttingen in Germany. He then did postdoctoral research at iDiv’s Synthesis Center and in the Ecology and Evolutionary Biology Department at Yale University in the United States.

Mr. Ruben Remelgado, the team’s Geo-computation specialist, develops tools and workflows to map and describe global biodiversity patterns using high-performance computing and large, multi-temporal, multi-resolution data. He holds a master’s degree in geography from the University of Lisbon.

Dr. Steffen Ehrmann is a postdoc at the Macroecology and Society Lab. He obtained his Ph.D. in “Biodiversity and Ecosystem Services” at the University of Freiburg in Germany in 2017. He explores how the way we manage ecosystems and landscapes influences the benefits and disservices that nature holds for human health and well-being.

Ms. Caterina Barrasso is a doctoral researcher. Her studies focus on curating global land-use data to improve the long-term quality of the final products built from them. She holds a Master of Science in Crop Science from Wageningen University in the Netherlands.

In this Q&A, Carsten, Caterina, Ruben, and Steffen talk to us about developing application-ready tools for policymakers and resource managers to support decision-making.

The Macroecology and Society Lab research team. Steffen Ehrmann (top left), Carsten Meyer (top right), Ruben Remelgado (bottom left), and Caterina Barrasso (bottom right).

How did this research lab get started? What was the motivation behind it, and what are you hoping to achieve with your research projects?

Carsten: I had long felt uneasy about our current inability as scientists to offer sufficiently precise and reliable statements to policymakers about how the world changes and how different policies may influence its sustainability. A major problem here is that the different components of the integrated world models we can currently use are based on very limited data and poorly supported assumptions. Accordingly, different models often tell us very different stories. While this is expected from models that are meant to explore alternative future scenarios, what’s really worrying is that the greatest source of disagreement is not the “design” differences between the scenarios, but the tremendous uncertainties in the input data with which the models are trained and in the theoretical assumptions the models make about how the global human-environment system works. So, I was motivated to contribute to the next generation of more useful and trustworthy models by improving their historical data basis and by using those data to systematically test which theoretical expectations hold where, when, and why. Of course, this is a much bigger ambition than any single research team or project could achieve. But I was very fortunate to receive generous funding from the Volkswagen Foundation (unrelated to the company, BTW) and additional support from the German Ministry of Education and Science and from iDiv, which allowed me to build up a lab with a longer-term vision and to dig deeper into different parts of this larger problem.

“In recent years, many global maps have been produced. The quality of each map is checked with the use of validation data. However, since there is no standard data collection protocol, nor is this data consistent through space and time, validation metrics are unfortunately inconsistent.” — Ms. Caterina Barrasso, Doctoral Researcher

You work with training datasets available on Radiant MLHub to build your applications. Which specific training data have you used, and for what purpose(s)?

Breaking the world into 71 ecosystem types.

Ruben: I am mapping the global extent of habitats over 27 years, and human-led environmental change is an important driver of changes in habitat extent. Over the past decades, humankind has expanded its influence over the planet, increasing agriculture and infrastructure, which compete with natural habitats. As human habitats expand, natural habitats shrink or become degraded. Although many species profit from this expansion (for example, insects that feed on crops), many more are under threat. In this context, it is essential to accurately map the extent of man-made habitats. Here, ground-truth data on land use, such as those published by Radiant Earth (e.g., datasets from PlantVillage and Dalberg Data Insights), help us measure mapping accuracy.

Steffen: In the LUCKINet project, our aim is to map various dimensions of land use (with a focus on crop, livestock, and forest types) globally over roughly the last 25 years. The resulting maps are intended to support scientists from a wide range of disciplines in their applications in the environmental and socio-economic domains. One important meta-goal is thus to enable communication and collaboration on an interoperable basis, which is crucial for tackling, for instance, the Sustainable Development Goals. We use as much data, in as much detail, as possible, as long as it contains some indication of crop types or coarser land-use classes. Especially in data-scarce regions, the data provided by Radiant Earth offer critical information for refining our models of suitability for each land use and crop type, which is an important intermediate step in our modeling pipeline.

Caterina: In my project, I am working with Radiant MLHub’s training data in regions such as West Africa, East Africa, and Central Asia, where little data is available. I am interested in understanding uncertainties in global land-use/land-cover maps derived from remote sensing imagery. In recent years, many global maps have been produced. The quality of each map is checked with the use of validation data. However, since there is no standard data collection protocol, nor is this data consistent through space and time, validation metrics are unfortunately inconsistent. Understanding how maps and validation data disagree in space and time is, thus, of great importance, not only for us scientists, who use these maps in our applications, but also for policymakers, who need to make decisions based on the multitude of global maps available. I am, therefore, using Radiant MLHub’s training datasets to build a validation database that I will then use to quantify uncertainty and improve the reliability of global land-use/land-cover maps.

There are many challenges with building AI applications using Earth observation data such as (lack of) diversity, bias in data, and the ability to scale research applications to real world solutions. What challenges have you faced when you developed your applications?

Ruben: Acquiring ground-truth data is a big challenge. Geospatial applications such as the ones we develop at the MAS lab require a vast amount of ground-truth data to understand, predict, and validate the environmental characteristics we aim to map. Because we mainly work at a global scale, a high number of records is not enough: the data also need broad spatial and temporal coverage for multiple variables. Collating samples from different sources is therefore a common approach we take to meet our projects’ needs. This way, we profit from the knowledge of local professionals while avoiding the astronomical costs of organizing our own field campaigns. However, finding and acquiring these data can be difficult.

While many studies use ground-truth data for spatial applications, few make those data available, and many that do fail to keep them available in the long run due to a lack of funding. Adding to this issue, collecting samples from past field campaigns can be a sensitive matter. Field experts have a valuable perspective on the variables they sample. Still, as global geospatial applications become the norm, the value of local experience can be neglected. Consequently, field experts can become protective of their efforts and hesitate to share them for fear of missing out on the long-term benefits, such as new funding, that come with the applications that depend on their data. These are critical issues, as much of the knowledge collected ends up lost or inaccessible.

Caterina: The biggest challenge I am facing in my project is integrating training data coming from different data providers. We want to produce a global training database to validate world land-use/land-cover maps. Unfortunately, there is no globally accepted standard on how data should be collected and organized. Therefore, the datasets we work with have different structures and metadata attached, which makes collating and standardizing them challenging.

Spatial distribution of samples acquired for the validation of land cover maps.

Steffen: Adding to this, I would mention in particular the lack of a commonly accepted framework through which the stakeholders of those standards can influence their evolution. As new actors in this field trying to achieve something that hasn’t been tried before at this scale, we recognized that despite vast efforts by the FAO to harmonize concepts of land use and crop commodities, most repositories of related data don’t (entirely) follow these concepts. Obviously, a governing body such as the FAO is needed to oversee best practices; however, these best practices also need to be able to evolve and dynamically adapt to fit the smaller-scale requirements of (research) projects or institutions, without invalidating all previous efforts. Without such a framework, the kind of collating and standardizing work Caterina mentioned will become necessary over and over again, as new actors run into challenges that only become obvious when a problem is approached from an angle that hasn’t been taken before.

“…land-use activities are disproportionately displacing some very specific types of ecosystems, including some that are tremendously important for global biodiversity and climate protection, such as lowland tropical forests, while in parallel, climate change is making many parts of the world wetter.” — Dr. Carsten Meyer, Research Group Lead

You mentioned in previous communications that each research project feeds into a larger understanding of how humans interact with their environment and of the driving forces that lead to ecological degradation. What is your preliminary research telling you? Can you share some insights?

Carsten: We’re working on many different aspects of this broad question, but I can give you one example from our research on global changes in the world’s ecosystems. Here, the current paradigm has been that the widely observed increases in the areas of artificial systems like croplands, pastures, plantations, or cities are causing natural ecosystems like forests, savannas, or wetlands to decline systematically (meaning that most natural ecosystem types are supposedly declining in most regions). Our global analyses of area changes of more than 70 different ecosystem types over the past decades now paint a much more nuanced picture. Rather than finding systematic losses, we find that there are both winners and losers among the natural ecosystems. This is mostly because land-use activities are disproportionately displacing some very specific types of ecosystems, including some that are tremendously important for global biodiversity and climate protection, such as lowland tropical forests, while in parallel, climate change is making many parts of the world wetter. These regionally wetter climates lead to large decreases in deserts, which make room for certain grassland ecosystems, and they create many new wetland ecosystems in parts of the world where we simply had not looked until now. Of course, these insights do not imply that those globally increasing grassland and wetland ecosystems are not worth protecting from human activities. But they can help inform policies about which ecosystems should be prioritized in our protection efforts.

Many individuals and organizations such as yourself are turning to open-access data repositories like Radiant MLHub to obtain high-quality training data. This allows them to develop and deploy AI applications faster and more efficiently. What specific benefits are you receiving using Radiant MLHub?

Ruben: Science profits a lot from public funding and therefore has a responsibility to share its spoils. Scientists do a lot in this direction, providing knowledge for public benefit, and much of this knowledge comes in the form of data. However, these data are often threatened by the limited funding of scientific projects. Here, open-data ecosystems such as Radiant Earth’s, which take a long-term perspective, are extremely important for preserving data. Additionally, I think open-data ecosystems are an important way to promote the backstage work of field experts. Making data discoverable and properly credited shines a light on the foundation of many high-impact scientific works, which is just as important as the final application.

Steffen: I value not only the long-term perspective, but also the potential to implement semantic and data organization standards, for instance when allowing users to upload their data. In my view, this improves not only the accessibility of the data, but also the quality of any models built therefrom.

“Initiatives like Radiant MLHub are a great way to democratize access to remote sensing to help those communities most affected by issues such as climate change circumvent hardware and technical limitations and build solutions that serve their local needs.” Ruben Remelgado, Geo-computation Specialist

You have registered your “Crop type dataset for consistent land cover classification in Central Asia” training dataset on Radiant MLHub, which makes the data available for anyone to use in a standard format. Why is it important to contribute to an open data ecosystem? What are the advantages of making data publicly available and discoverable?

Ruben: We developed this dataset under the project Central Asia Waters (CAWa), which aimed to share knowledge with countries of Central Asia. We trained local professionals in using remote sensing to monitor water scarcity and food security, helping them develop tools to monitor the persistent threat of climate change. One of my biggest takeaways from this experience is that the communities that most need data are often the ones without the technical or technological capabilities to produce and use them. With this in mind, I turned to Radiant Earth. I was taken by your mission of going beyond data sharing. Initiatives like Radiant MLHub are a great way to democratize access to remote sensing to help those communities most affected by issues such as climate change circumvent hardware and technical limitations and build solutions that serve their local needs. Moreover, through systems like MLHub, training data such as ours can be easily integrated into new machine learning applications, giving life to our efforts in a data-scarce region.
