
Machine Learning for Earth Observation — What about Ethics?

11 min read · Aug 26, 2022

Raphael Leuner and Daniel Glatter for Open for Good

Imagine an algorithm that analyzes the tracks and movements of refugees in the most remote regions of Afghanistan. Unprecedented predictive accuracy could allow governments and NGOs to provide aid and support exactly where it is needed and potentially save lives. But what happens if this information falls into the wrong hands?

Photo by jean wimmerlin on Unsplash

Why care about ethical ML4EO?

Machine Learning for Earth Observation (ML4EO) is an exciting new technology that promises a range of benefits for EO studies: First, geospatial surveys become easier and faster because a) it is no longer necessary to physically visit locations for data collection and b) data analysis is automated. Second, results get more accurate because algorithms trained on large-scale earth observation imagery can detect patterns that humans cannot. And third, ML4EO enables entirely new use cases, ranging from observing and predicting crop yields, to monitoring potential impacts of climate change and natural disasters, and even documenting human rights abuses such as forced displacement. Due to its significant benefits and wide applicability, ML4EO is increasingly being used in both the public and private sectors to provide guidance for policymaking and development.

Amid this rapid adoption of ML4EO, ethical concerns may fall by the wayside. If developers and deployers do not consider ethics, ML4EO systems may, intentionally or unintentionally, harm groups and individuals, exacerbate inequality, and disregard human rights and freedoms.

Compared to the broader debate on AI ethics, the discussion of ethical issues in ML4EO is still in its infancy. In this article, we want to shed light on this emerging field and raise awareness. We will first provide an overview of potential ethical issues in ML4EO, focusing especially on gender-specific aspects, and then present ideas on how ML4EO applications can be developed more responsibly, maximizing their intended impact and minimizing unwanted effects on communities.

Where (ML4)EO goes wrong

The principles governing AI ethics are often abstract and complex. Their relevance becomes clear once we look at concrete use cases and their potential risks. This also means, however, that the examples presented here are by no means an exhaustive catalog of the risks associated with applying machine learning (ML) to earth observation (EO) data.

Figure: A visualization of the ML4EO pipeline, with red marks indicating potential ethical risks when developing and applying ML4EO systems

Ethical dilemmas are not new to the field of EO data. Even before the advent of AI and the previously unimaginable use cases it created, the public availability of large sets of EO data carried potential for abuse. Google Earth, launched in the early 2000s to let users view satellite imagery from anywhere on Earth, made this potential apparent for the first time. The democratization of data access, previously reserved for researchers and government officials, created huge opportunities for social and economic advancement, but also new security risks. Google Earth imagery has been used in terror attacks against oil facilities in Yemen and in attacks on British and US troops in Iraq. In response, governments reached agreements with Google to alter or partially blur many military sites, as well as critical civilian infrastructure around the world.

However, governments do not necessarily only have public safety in mind when they ask for content to be removed from Google Earth. For instance, Bahrain temporarily banned Google Earth after the service contributed to large-scale protests in the country. Both the decision to publish EO images and the decision to restrict access can have unintended consequences or create unforeseen opportunities for abuse, and the interests of governments and the public are not always aligned.

Large sets of satellite data can also be annotated with additional information from the ground and used to train AI models. AI models that analyze EO data can provide valuable insights on a massive scale, producing results that directly benefit people on the ground. But the accessibility of annotated data and analysis models also creates potential for unintended consequences, as well as for intentional abuse. Some researchers take this into account and deliberately restrict access to their data and models. In their paper, Raesetje Sefala, Timnit Gebru, Luzango Mfupe, Nyalleng Moorosi, and Richard Klein published an annotated earth observation dataset documenting "spatial apartheid", the segregation of South African cities into wealthy and non-wealthy communities. They also created an AI model to classify satellite imagery of cities according to these categories and thus further grow their dataset. While the authors mention many beneficial applications for this data, they also foresee uses that could harm the very communities it is meant to benefit: insurance companies could use it to set higher rates for members of marginalized communities; law enforcement could exploit such datasets for the surveillance of residents; banks could charge higher interest on loans to communities already struggling with access to financial opportunities. The authors therefore decided to make the data available only upon a request containing a detailed description of the intended usage, and only for research purposes.

The fear of harming affected communities is not unfounded: a study by Maxwell Owusu et al. on the intended usage of geospatial data about "slums" in the Greater Accra region of Ghana has shown an "application mismatch" between different stakeholders. NGOs and research organizations would use the data for the advancement of the people living there, while "the government institutions were more interested in using the information for eradicating slums or preventing their growths". The approach suggested by Philip Brey of asking all stakeholders about their intentions for a given product cannot prevent the publication of models and data whose use would harm intended beneficiaries, but it can ensure that developers have a clearer picture of the associated risks.

Another emerging application of ML4EO is the tracking of refugees using, among other sources, satellite imagery. The Norwegian Refugee Council employed this technique to aid Afghan refugees after a devastating drought displaced more than 300,000 people in 2018: to find the locations of refugee camps and estimate the number of refugees living in them, they applied machine learning algorithms to satellite imagery. Considering that migrants and refugees are among the most vulnerable communities, making them easier to track might also make them easier to target. The same technology is used by governments and border guards, whose primary goal is usually the prevention of unauthorized migration, so ML4EO may end up undermining the ability to claim asylum, a recognized human right. The technology may even be leveraged by criminal organizations such as human traffickers.

When building ML4EO applications, developers must also carefully consider the generalizability of a model. It can be tempting to take a model developed in one region of the world and apply it to another. Scaling is one of the great benefits of AI systems, and easily accessible satellite imagery makes it especially easy for EO data. However, there are strong limitations on scaling ML4EO models from one part of the world to another: natural textures, agricultural patterns, and urban and rural infrastructure vary massively across regions. An AI model trained to identify housing footprints on data from North American suburbs will not perform well on the same task in Bogotá. This example might seem obvious, but there are many subtler cases where careful consideration is required. The decision on whether a model generalizes to other regions should always be made with local expertise, and its performance should be closely monitored with local validation data.
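To make this concrete, here is a minimal sketch of what such a local validation gate could look like, assuming a scikit-learn-style classifier. The helper names, the threshold, and the Bogotá usage example are our own hypothetical placeholders, not part of any published pipeline.

```python
# Minimal sketch: require a model trained elsewhere to pass a check on
# locally labeled validation data before it is deployed in a new region.
# The 0.85 threshold and all names below are illustrative assumptions.
from sklearn.metrics import f1_score

ACCEPTANCE_THRESHOLD = 0.85  # ideally agreed with local experts in advance

def safe_to_deploy(model, X_local, y_local, threshold=ACCEPTANCE_THRESHOLD):
    """Score a pretrained model on ground truth gathered in the target
    region and only clear it for deployment above the agreed threshold."""
    y_pred = model.predict(X_local)
    score = f1_score(y_local, y_pred, average="macro")
    print(f"Local macro-F1: {score:.3f} (threshold: {threshold})")
    return score >= threshold

# Hypothetical usage: a housing-footprint classifier trained on North
# American suburbs, checked against locally labeled imagery from Bogotá.
# if not safe_to_deploy(model, X_bogota, y_bogota):
#     ...  # fine-tune with local data instead of deploying as-is
```

The point of the gate is less the specific metric than the process: deployment in a new region becomes a deliberate decision backed by local data, rather than a default.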

In recent years, launch costs for new satellites have been dropping rapidly, and more and more companies are launching their own commercial EO satellites into orbit. As a result, the resolution of available satellite imagery is quickly increasing. On the one hand, this drastically improves the usefulness of ML4EO applications and makes many of today's beneficial use cases possible. On the other hand, it raises new privacy concerns: AI models and data annotations, combined with high-resolution, centimeter-scale EO data, can provide highly sensitive information and may allow for personalized targeting. This raises the question of whether and how informed consent, one of the principles of ethical data processing, should be implemented for EO data.

Let’s talk about gender!

A common concern with AI applications is the transfer of societal biases into data and models. A UNICEF report on ethical considerations for the use of geospatial technologies warns that "geography and the relationship between location, poverty, gender and race may result in trends and predictive models that discriminate against certain persons and populations in particular locations". The use cases presented above have already highlighted some of the ways AI systems can disproportionately affect vulnerable communities. Here, we want to focus on some specific risks associated with gender biases.

Some might argue that EO data gathered by satellites is gender-neutral, and that no special attention is therefore required to address gender biases. However, gender inequalities in wider society exacerbate the adverse effects ML4EO models can have on women. Each use case has individual risks that need to be examined: an agricultural use case has different risks from a gender perspective than one examining violence or sex trafficking faced by female refugees. But there are some overarching considerations that apply to every EO-based application.

Many ML4EO use-cases rely on volunteers to label images or map the area around them. This crowd-sourced ground truth data is often the basis for the datasets on which AI models are trained. In many areas of the world, there is a digital divide between women and men in access to digital technology. This means that women are not just less likely to benefit from AI solutions, even publicly available ones, but also less likely to contribute to crowd-sourced data collection, since contributing requires access to digital technology in the first place.

OpenStreetMap, for example, is one of the biggest crowd-sourced collection efforts for geographic ground truth data. While the exact gender ratio of contributors is not known, polls suggest that as few as 3–4% of OpenStreetMap contributors are women. Studies have shown that women and men map their surroundings differently when using the platform and focus on different items. The huge gender disparity thus creates biased datasets, for example in the mapping of locations for daily life: places relevant to roles traditionally performed by women, such as childcare centers or supermarkets, but also vital infrastructure such as women's health facilities, might be less likely to be covered on the map.
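One way to start making such gaps visible is to audit how densely different amenity types are mapped in a region. The sketch below queries OpenStreetMap's public Overpass API; the region name and amenity tags are arbitrary examples we chose for illustration, and a real audit would compare the counts against independent ground surveys rather than against each other.

```python
# Sketch: count how many OpenStreetMap nodes carry a given amenity tag
# inside a named area, via the public Overpass API. Region and tags are
# illustrative; the counts only become meaningful when compared with a
# ground survey of what actually exists in the area.
import requests

OVERPASS_URL = "https://overpass-api.de/api/interpreter"

def count_amenities(region: str, amenity: str) -> int:
    query = f"""
    [out:json][timeout:60];
    area["name"="{region}"]->.searchArea;
    node["amenity"="{amenity}"](area.searchArea);
    out count;
    """
    resp = requests.post(OVERPASS_URL, data={"data": query}, timeout=90)
    resp.raise_for_status()
    # "out count;" returns a single element whose tags hold the totals
    return int(resp.json()["elements"][0]["tags"]["total"])

for amenity in ["childcare", "kindergarten", "clinic", "pub"]:
    print(f"{amenity}: {count_amenities('Accra', amenity)}")
```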

These gender biases can be amplified when such datasets are used to develop machine learning models, unless mitigation measures are put in place. The most promising measure is to involve people with diverse perspectives and backgrounds in the development process. However, only 22% of the global AI workforce identify as female, and the figure in the EO space is even lower, at just 19%. While this does not mean that gender-based issues are always ignored during development, it makes dedicated efforts to mitigate the effects of biased datasets less likely. To develop meaningful, inclusive applications that adhere to ethical standards, two things can be done: in the short term, stakeholders should build awareness of potential gender-related biases; in the long term, they should create more diverse development teams to offset the imbalances described above.

What can we do to make ML4EO (more) ethical?

Mitigation encompasses several points. Fundamentally, it means preventing, or at least reducing, the harmful impacts of ML4EO. Beyond that, there should be efforts to ensure that access to data and resources, as well as to the benefits of ML4EO, is shared equitably.

A crucial step towards more ethical ML4EO is to address the current lack of awareness and resources dedicated to ML4EO ethics: consideration of ethical issues in the development of ML4EO projects, starting with data collection, must become standard practice.

An ethical approach to ML4EO projects includes several tangible components:

  1. Practitioners need to involve affected populations from the outset of ML4EO projects, starting with project planning and design. There is a strong need to gather information about the context, i.e., facts or "ground truths" that cannot be captured from a satellite image.
  2. To involve affected communities as fully as possible, the project team itself should also be interdisciplinary, drawing on knowledge from technical and non-technical fields.
  3. Practitioners should explicitly model potential issues from the inception stage and continue to monitor and revisit them during implementation. The key question to ask here is: “What could happen if this information fell into the wrong hands?”
  4. Practitioners should strive for transparency around data collection, data usage, and analysis techniques. An example would be to create scorecards describing the accuracy of ML4EO analyses, e.g. drawing on Google's Model Cards, or to describe the data used with the Datasheets for Datasets approach (a minimal sketch follows this list). Being transparent also includes striving for explainability (being able to understand the system's conclusions) as well as ensuring clear accountability for ML4EO systems.
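As an illustration of point 4, here is a minimal sketch of what a lightweight scorecard for an ML4EO model could record, loosely inspired by Model Cards and Datasheets for Datasets. The field names and example values are our own assumptions, not a standard schema.

```python
# Sketch: a lightweight scorecard for an ML4EO model, loosely inspired by
# Google's Model Cards and the Datasheets for Datasets approach. Field
# names and example values are illustrative assumptions, not a standard.
from dataclasses import dataclass, field, asdict
import json

@dataclass
class ML4EOScorecard:
    model_name: str
    intended_use: str
    training_regions: list
    validated_regions: list
    per_region_accuracy: dict = field(default_factory=dict)
    known_limitations: list = field(default_factory=list)
    # The "wrong hands" question, answered explicitly and kept up to date:
    foreseeable_misuse: list = field(default_factory=list)

card = ML4EOScorecard(
    model_name="housing-footprint-segmenter-v1",  # hypothetical model
    intended_use="Estimating settlement growth for humanitarian planning",
    training_regions=["North American suburbs"],
    validated_regions=["Bogotá (locally labeled imagery, 2022)"],
    per_region_accuracy={"North America": 0.91, "Bogotá": 0.74},
    known_limitations=["Unvalidated outside the regions listed above"],
    foreseeable_misuse=["Targeting or surveillance of mapped settlements"],
)
print(json.dumps(asdict(card), indent=2))
```

Publishing per-region accuracy alongside known limitations and foreseeable misuse makes it harder for downstream users to apply the model outside its validated scope without noticing.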

To enable such ethically grounded approaches to ML4EO projects, the global ML4EO community must ensure the right framework conditions are in place. It is thus necessary to develop training materials on ML4EO ethics and ideally embed them in university curricula. In addition, there must be ethical guidelines or frameworks to guide practitioners in their daily work. Furthermore, we should investigate policy gaps where regulators are needed to enforce fundamental requirements, for example in relation to privacy and gender.

Finally, we should work towards making the global ML4EO community more equitable and diverse by working to reduce the gender gap, increase diversity, and ensure the participation of the Global South.

Our Key Takeaways

● ML4EO is being adopted rapidly, but ethical concerns may fall by the wayside, especially since practitioners are not yet sufficiently aware of potential issues.

● New data and AI models will have unintended consequences, both positive and negative. Developers should make more efforts to anticipate them as much as possible and document them transparently.

● Despite the global nature of earth observation data, a local perspective is essential. Only when experience from the ground and from affected populations is included can ML4EO applications realize their full potential and abuse be prevented.

● Due to access and development inequalities, a use-case is never gender-neutral. This comes in addition to gender-related issues specific to individual projects.

● In general, we need much more work towards developing guidelines for practitioners, closing policy gaps, and building more equitable and diverse communities.

This article is based on a report by Data-Pop Alliance created for GIZ's FAIR Forward programme and the Open for Good Alliance. You can reach Data-Pop Alliance's Zinnya del Villar at zdelvillar@datapopalliance.org and Matteo Rojas at mrojas@datapopalliance.org.

Further reading

Gabrielle Berman, Sara de la Rosa, Tanya Accone (2018). Ethical Considerations When Using Geospatial Technologies for Evidence Generation. https://www.unicef-irc.org/publications/972-ethical-considerations-when-using-geospatial-technologies-for-evidence-generation.html

Philip A. E. Brey (2012). Anticipatory Ethics for Emerging Technologies. https://link.springer.com/article/10.1007/s11569-012-0141-7

Ray Eitel-Porter (2021). Beyond the promise: implementing ethical AI. https://link.springer.com/article/10.1007/s43681-020-00011-6

Maitraye Das, Brent Hecht, Darren Gergle (2019). The Gendered Geography of Contributions to OpenStreetMap: Complexities in Self-Focus Bias. https://dl.acm.org/doi/10.1145/3290605.3300793

UNESCO (2022). The Effects of AI on the Working Lives of Women. https://unesdoc.unesco.org/ark:/48223/pf0000380861

Maxwell Owusu et al. (2021). Geo-Ethics in Slum Mapping. https://ieeexplore.ieee.org/document/9553570

Raesetje Sefala, Timnit Gebru, Luzango Mfupe, Nyalleng Moorosi, Richard Klein (2021). Constructing a Visual Dataset to Study the Effects of Spatial Apartheid in South Africa. https://openreview.net/forum?id=WV0waZz9dTF

The Open Source Center (2006). The Google Controversy — Two Years Later. https://irp.fas.org/dni/osc/google.pdf


Written by Open for Good Alliance

A global alliance to improve access and availability of localized AI training data in Africa, Asia and beyond
