Ethics Sheets for AI Tasks

and an example sheet for automatic emotion recognition

Saif M. Mohammad
July 5, 2021
(Cover image: a lighthouse shining a beam of light into a calm sea at dusk. Photo by Wiedemann.)

Goals of this Article:

  • Make a case for documenting ethics considerations at the level of AI *Tasks* — The Case
  • Propose a new form of such an effort: Ethics Sheets for AI Tasks — The Proposal
  • Provide an example ethics sheet for Automatic Emotion Recognition and Sentiment Analysis — The Example

(Jump ahead for a discussion and frequently asked questions about ethics sheets for AI tasks.)

This work was also presented at the UBC Language Science Seminar. A video of the talk is available.

Target audience: AI, ML, NLP researchers and developers

Abbreviations: Artificial Intelligence (AI), Machine learning (ML),
Natural Language Processing (NLP)


Feedback: Since this is a new proposal, it can likely benefit from more ideas. Send me a note and I will be happy to incorporate feedback. Hopefully, this article will stimulate further discussion and better versions.

Contact: Dr. Saif M. Mohammad
Email: saif.mohammad@nrc-cnrc.gc.ca, Twitter: @saifmmohammad

The Case

Good design helps everyone. (See this article, for example, on how designing for accessibility helps everyone.) As AI, ML, and NLP systems become more ubiquitous, their broad societal impacts are receiving more scrutiny than ever before. However, several high-profile instances have highlighted how technology is often at odds with the very people it is meant to help, and how it often leads to more adverse outcomes for those who are already marginalized. This raises some uncomfortable questions for us AI researchers and developers:

What role do we play in this?
What are the hidden assumptions in our research?
What are the unsaid implications of our choices?
Are we perpetuating and amplifying inequities or are we striking at the barriers to opportunity?

The answers are often complex and multifaceted. While many AI systems have clear benefits…

We have seen real-world AI systems go wrong

Text in large font saying “Machine Bias”. Below it is the text: “There’s software used across the country to predict future criminals. And it’s biased against blacks. by Julia Angwin, Jeff Larson, Surya Mattu and Lauren Kirchner, ProPublica May 23, 2016”
ProPublica article on bias in recidivism prediction systems.

We have seen heavy criticisms of published research that often feeds into real-world systems

  • criticisms of physiognomy, racism, bias, discrimination, perpetuating stereotypes, causing harm, ignoring indigenous world views, and more

Arcas, Mitchell, and Todorov (2017):

Ongweso Jr (2020):

There have also been legitimate criticisms of the thoughtlessness in machine learning (e.g., is automating this task, this way, really going to help people?) and a seemingly callous disregard for the variability and complexity of human behavior (Fletcher-Watson et al. 2018, McQuillan 2018, Birhane 2021).

How are we Addressing Ethical Concerns in AI/ML/NLP Research?

  • For individual datasets, it is recommended that we create datasheets and data statements: they list key details of the datasets, such as composition and intended uses, and are meant to encourage appropriate use of the data.
  • For individual systems, it is recommended that we create model cards: they list key details of the models, such as performance in various contexts and intended use scenarios, and are meant to encourage appropriate use of the systems.
  • For individual papers, we have ethics/impact statements, ethics policies, and ethics reviews.

Datasheets and model cards are pivotal inventions that will serve our community well. However, they are not without limitations, and the specificity of their scope imposes additional constraints:

  • Authors are in a position of conflict of interest; there are strong incentives to present the work in a positive light (for paper acceptance, community buy-in, etc.).
  • There can be a tendency to produce boilerplate text without meaningful, critical engagement with the ethical issues.
  • A comprehensive treatment of the relevant ethical issues requires engagement at a level beyond individual papers and add-on documents for individual projects.
  • Lastly, ethical considerations apply at levels other than individual projects: for example, to whole areas of work, and at the level of AI tasks.

Occasionally we write critical position papers. Here are some examples looking at specific areas of research:

However, these documents are not standardized in any way. They tend to focus on the most salient ethical considerations, rather than capturing the wide array of relevant ones. They are often presented as position papers, not as reference documents that are easy to explore and that let readers jump to the issues of interest.


Ethical Considerations also apply at the Level of AI Tasks

Just to be clear: I am defining an AI task to simply mean some task we may want to automate using AI techniques. An AI system is a particular AI model built to do the task. Individual systems have their own unique sets of ethical considerations (depending on the choices that were made in terms of how to create the system). However, some ethical considerations apply not at the level of individual systems, but at the level of the task. For example, consider the task of detecting personality traits from one’s history of utterances: even before we get to the level of individual systems, we might want to think about questions such as:

  • What are the societal implications of automating personality trait detection?
  • How can such a system be used/misused?
  • What are the privacy implications of such a task?
  • Is there enough credible scientific basis for personality trait identification that we should attempt to do this?
  • Which theory of personality traits should such automation rely on? What are the implications of that choice? and so on.

Currently, AI conferences and journals do not have a dedicated place where
one can discuss such questions that apply to the tasks being automated.

In addition to ethical considerations that apply directly to the task, we know that there are ethical considerations latent in the choices we make in dataset creation, model development, and evaluation. Poor choices have manifested in controversies for a number of AI tasks. So:

If one wants to do work on an AI Task, it will be useful to have a go-to point for the ethical considerations relevant to that task!

“Wait, wait, wait!”, you say, “Maybe it would be good to have such a thing for Face Recognition, given its many public controversies, but do we really need one for all AI tasks? Are there ethical considerations associated with every AI task?”

Let’s take a look at some tasks…

How about automatic emotion/sentiment recognition from text?

How about automatic emotion recognition from faces?
(the unholy combination of face recognition and emotion recognition?)

How about Machine Translation — the long-time flag-bearer of NLP?

How about Image Generation?

Text generation?

Personality trait identification?

Information Extraction / Question Answering?

Coreference resolution?

Numerous other such examples have surfaced in just the past few years for a variety of AI tasks.

Spelling correction?

I do not know, but I would be remiss to say there are no ethical concerns there. The point is:

Different AI tasks may be more or less prone to controversy, but all AI tasks impact people in some way, and thus have ethical considerations. Sometimes even small and seemingly innocuous choices can have far-reaching implications. Sometimes a thoughtful consideration can help make a small, but notable difference, to improve someone’s life.

Once we read the relevant literature and develop some AI systems, it is not hard to begin to identify some of the ethical considerations for various NLP, ML, and AI tasks; but that takes time. Meanwhile, we have tens of thousands of new researchers joining our ranks. Even those of us that have been here a while can benefit from some careful compilation of ethical considerations.

The Proposal

Create Ethics Sheets for AI Tasks

If one wants to do work on an AI Task, then right at the beginning it is useful to have:

a carefully compiled document that substantively engages with the ethical issues relevant to that task, going beyond individual systems and datasets, and drawing on a body of relevant past work and the participation of various stakeholders.

Similarly, if one conceptualizes a new AI Task, then right at the beginning, it will be useful to develop such a source of information.

I will refer to these documents as Ethics Sheets for AI Tasks. Simply put:

an ethics sheet for an AI Task is a semi-standardized document that aggregates and organizes a wide variety of ethical considerations for that task.

It:

  • Fleshes out assumptions hidden in how the task is commonly framed, and in the choices often made regarding the data, method, and evaluation.
  • Presents ethical considerations unique or especially relevant to the task.
  • Presents how common ethical considerations manifest in the task.
  • Presents relevant dimensions and choice points, along with their tradeoffs for various stakeholders.
  • Lists common harm mitigation strategies.
  • Communicates societal implications of AI systems to researchers, developers, and the broader public in an accessible way with minimal jargon.

Ethics sheets may sometimes suggest that certain applications in certain contexts are a good or bad idea, but largely they are meant to discuss the various considerations to take into account when deciding how to build or use a particular system, whether to build or use it at all, what is most appropriate for a given context, and so on. The sheet should flesh out the considerations that apply at the task level. It should also flesh out the ethical considerations of common theories, methodologies, resources, and practices used in building AI systems for the task. A good ethics sheet should make us question assumptions that often go unsaid.

One key motivation for developing ethics sheets is to encourage more thoughtfulness:

  • Why should we automate this task?
  • What is the degree to which human behavior relevant to this task is inherently ambiguous and unpredictable?
  • What are the theoretical foundations at the heart of this task?
  • What are the social and cultural forces at play that motivate choices in task design, data, methodology, and evaluation?
    Science is not immune to these forces (there is no 'view from nowhere').
  • How is the automation of the task going to impact various groups of people?
  • How can the automated systems be abused?
  • Is this technology helping everyone or only those with power and advantage? etc.

Thinking about these questions is important if we want to break away from the current paradigm of building things that are divisive (that work well for some and poorly for others) and instead move to building systems that treat human diversity and variability as a feature (not a bug), systems that truly dismantle barriers to opportunity, and bring diverse groups of people together. Thus, questions such as those shown above can be useful in determining what is included in ethics sheets.

Target audience:

The target audience for an ethics sheet includes the various stakeholders of the AI task. The stakeholders may or may not have the time and background to understand the technical intricacies of an AI task. However, they build on, use, and make laws about what we create. Further, people are impacted by AI systems. They should be able to understand the decisions that affect them, understand a system’s broad patterns of behaviour, contest its predictions, and find recourse.

Ethics sheets can help to that end.

It is our responsibility to describe our creations in accessible terms, so that stakeholders can make informed decisions.

Thus the target audience of an Ethics Sheet includes:

  • Researchers
  • Engineers
  • Data science professionals and enthusiasts
  • Educators (especially those who teach AI, ethics, or societal implications of technology)
  • Media professionals
  • Policy makers
  • Politicians
  • People whose data is used to create AI systems
  • People on whose data AI systems are applied
  • Society at large

Owing to differences in backgrounds and needs, it is better to create versions of the ethics sheet tailored to different stakeholders, for example:

  • one sheet for society at large (without jargon and with a focus on how system behaviour can impact them and how they can contribute/push-back);
  • one sheet for researchers, developers, and the motivated non-technical reader (with perhaps a greater emphasis on system building choices and their implications).

No One Sheet to Rule them All

A single ethics sheet does not speak for the whole community

There is no one person or institution that can claim to be the authority on, or provide the authoritative ethics sheet for, a task. Ethics sheets can be created through large community efforts (through workshops or carefully maintained wikis) or through smaller individual and group efforts. Just as with a survey article, no single ethics sheet speaks for the whole community. Efforts led by small teams may miss important perspectives. Community efforts, however, face several logistical and management challenges; they also have a tendency to include only agreed-upon, non-controversial ideas that do not threaten existing power structures. While each of these approaches has its pros and cons, a multiplicity of ethics sheets is likely most promising.

Multiple ethics sheets can be created (by different teams and approaches) for the same/overlapping tasks to reflect multiple perspectives, viewpoints, and what is considered important to different groups of people at different times.

We should be wary of the world where we have single authoritative ethics sheets per task and no dissenting voices. More on implementation details in the FAQ section.

Working on Ethics Considerations is a Perpetual Task

  • The set of ethical considerations for a task is not a static list; it needs to be continuously or periodically revisited and updated.
  • Ethics sheets can be developed iteratively and organically through input from multiple individuals and teams of researchers and practitioners, and through scholarly venues such as workshops and conferences.
  • An ethics sheet is not a silver bullet, but rather another tool in our toolkit for responsible research.
  • The goal is to raise awareness of ethical considerations so that we develop new and better approaches to responsible research; it is not to provide a list of easy solutions that “solve ethics”.

Components of an Ethics Sheet

Below are some sections that I think are central. However, every task is different, and may warrant additional sections.

Preface: Present why and how the sheet came to be written; the process followed; who worked on it, along with their professional or lived experience relevant to the subject matter; challenges faced in writing the sheet; changes made, if it revises an earlier sheet; and the version number, publication date, and contact information.

Introduce, Define, Set Scope: Introduce the task and some common manifestations of the task. Define relevant terminology. Set the scope of the ethics sheet (e.g., maybe you are creating a sheet for speech input, but not textual input).

Motivations and Benefits: Provide a high-level overview of the common benefits and motivations of the task.

Ethical Considerations: This is the star of the show. Aggregate and organize the ethical considerations associated with the AI task. Present the trade-offs associated with choices. Present harm mitigation strategies. Cite relevant literature. Organization of ethical considerations should be based on the primary target audience. For example, ethics sheets primarily for researchers and developers may benefit from sub-sections on: Task Design, Data, Method, and Evaluation. Task design may benefit from sub-sections for theoretical foundations and ‘why automate this task?’. Evaluation will benefit from sub-sections that go beyond quantitative metrics.

Other: Include anything that helps with the goals of the Ethics Sheet.
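As a purely illustrative aside, the sections above can be thought of as a lightweight, semi-standardized schema. The sketch below captures them as a Python data structure; the field names and example contents are my own invention (not part of the proposal or any standard), and the abridged entries are hypothetical:

```python
# Hypothetical skeleton for an ethics sheet, mirroring the sections
# described above. Field names are illustrative, not a standard.
from dataclasses import dataclass, field
from typing import List


@dataclass
class EthicsSheet:
    task: str                  # e.g., "Automatic Emotion Recognition"
    preface: str               # why/how the sheet was written, authorship, contact
    scope: str                 # what the sheet does and does not cover
    motivations_and_benefits: str
    ethical_considerations: List[str] = field(default_factory=list)  # the star of the show
    harm_mitigation: List[str] = field(default_factory=list)
    version: str = "1.0"


# Example instantiation (contents abridged and hypothetical).
sheet = EthicsSheet(
    task="Automatic Emotion Recognition",
    preface="Drafted by researchers working on the task; feedback welcome.",
    scope="Primarily emotion recognition from text.",
    motivations_and_benefits="Public health research, commerce, art, ...",
)
sheet.ethical_considerations.append(
    "Which theory of emotion does the task design assume, and with what implications?"
)
```

The point of the sketch is only that standardized sections give ethics sheets a familiar, navigable shape; an actual sheet is a prose document, not a data record.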

Benefits of Ethics Sheets for AI Tasks

Ethics sheets for AI Tasks address a number of concerns raised earlier in this article. Specifically, their main benefits can be summarized as shown below:

  1. Encourages more thoughtfulness regarding why to automate, how to automate, and how to judge success in AI research and development.
  2. Fleshes out assumptions hidden in how the task is commonly framed, and in the choices often made regarding data, method, and evaluation.
  3. Presents the trade-offs of relevant choices so that stakeholders can make informed decisions appropriate for their context. Ethical considerations often involve a cost–benefit analysis; where we draw the lines may differ depending on our cultural and societal norms.
  4. Identifies points of agreement and disagreement. Includes multiple points of view.
  5. Moves us towards consensus and community standards.
  6. Helps us better navigate research and implementation choices.
  7. Helps in developing better datasheets and model cards.
  8. Has citations and pointers; acts as a jumping off point for further reading.
  9. Helps stakeholders challenge assumptions made by researchers and developers.
  10. Helps all stakeholders (including researchers, developers, and society) develop harm mitigation strategies.
  11. Standardized sections and a familiar look and feel make the compilation and communication of ethical considerations easier.
  12. Can play a vital role in engaging the various stakeholders of an AI task with each other.
  13. Multiple ethics sheets can be created for the same task to reflect multiple perspectives, viewpoints, and what is considered important to different groups of people at different times.
  14. Acts as a great introductory document for an AI Task (complements survey articles and task-description papers for shared tasks).

The Example

Go here for an example ethics sheet for Automatic Emotion Recognition and Sentiment Analysis:

Note that many of the ethical considerations listed in the above sheet apply broadly to natural language tasks in general. Thus, it can serve as a useful template to build ethics sheets for other tasks.

Discussion and FAQ

The idea of ethics sheets raises several important questions that are worthy of discussion. I discuss some of them below.

Q1. Should we create ethics sheets for a handful of AI Tasks (more prone to being misused, say) or do we need ethics sheets for all AI tasks?

A. To me, the answer is clear. We need to write ethics sheets for every task. This follows from the idea that we need to think about ethical considerations proactively, and not as a reaction to harms we observe after system deployment. Different AI tasks may be more or less prone to controversy, but all AI tasks impact people in some way, and thus have ethical considerations. Sometimes even small and seemingly innocuous choices can have far-reaching implications. Sometimes a thoughtful consideration can help make a small but notable difference to improve someone’s life.

Ethics sheets for AI Tasks can provide the means for us as a collective to put in writing what we think are the ethical considerations and societal implications of AI tasks. For some tasks, this document can be short and straightforward, indicating minimal risk; both the document and the process that led to it are still useful. We do not know whether a task poses minimal risk without some amount of investigation. Also, having a written document allows stakeholders to challenge our assumptions and conclusions. We cannot predict everything and anticipate every harm, but we should not let that stop us from creating a working document that will be useful to others. Ethics sheets will always be incomplete and will require revisions. Periodically revising the document builds on our knowledge.

Q2. Who should create Ethics Sheets for AI Tasks?

A. There are two things going on here:

  1. Who should take a *lead* in developing ethics sheets (who should take on more of the burden)?
  2. Whose voices should be included when developing ethics sheets?

For 1, anyone or any group can take the lead. Researchers who are working on the task (or are proposing a new task) are well positioned to write the ethics sheet, as they are familiar with the intricacies of the task and are likely thinking about the ethical implications already. However, experienced researchers may also have more blind spots; researchers new to the task, especially those from the social sciences, psychology, linguistics, etc., can bring vital new insights. For 2, the voices of all stakeholders should be included (especially those impacted by the technology).

Ethics sheets can be developed iteratively through input from multiple individuals and teams of stakeholders. They can be developed through community efforts in workshops and conferences. One can also imagine a meta-sheet that summarizes or compiles information from multiple ethics sheets for a task. Not everyone needs to create an ethics sheet, but it is important to include voices from a diverse set of people (research backgrounds, locations of work, etc.) in an ethics sheet.

Q3. Should ethics sheets be built *only* through organized community efforts and by a joint consortium of all stakeholders? Should we only have authoritative ethics sheets and not a plethora of different ethics sheets for the same task?

A. IMHO, no and no. While building ethics sheets through organized community efforts is fantastic, we should not limit that to be the only avenue. There are several reasons for this:

  • Community efforts take tremendous resources, organization, and fortitude. They can benefit considerably from early and focused ethics sheets developed at a smaller scale. Community efforts also face significant challenges in terms of how to incorporate everyone’s opinions.
  • Community efforts have a tendency to include only agreed-upon, non-controversial ideas that do not threaten existing power structures.
  • In some ways, ethics sheets are akin to survey papers. Their scope is not individual pieces of work, but a body of literature. One can argue that survey articles should be community efforts or that they be created by all stakeholders. However, we also value the expertise of individual or small groups of researchers to create survey articles. We agree that it is their perspective and does not speak for the whole community. A similar affordance could be given to creators of ethics sheets.

So, IMHO, it is better to have a multitude of ethics sheets reflecting the diversity in viewpoints and the diversity of what is valued by different groups of people. We should be wary of the world where we have single authoritative ethics sheets per task and no dissenting voices. I would even encourage people to build their own personal ethics sheets (building on existing ethics sheets where available) even if they cannot extensively engage all stakeholders. After all, thinking about ethical considerations should be a natural part of one’s work.

Q4. This seems great! How can we further incentivize researchers to create Ethics Sheets? Could this be a publication? Should conferences have specific tracks for these?

A. Good ethics sheets are *useful* to researchers. So I expect they will be widely appreciated, especially by those new to an AI Task. They are also useful to those who create the sheet. I created an ethics sheet for emotion recognition because I do research on emotions and language, and I wanted to organize my thoughts around relevant ethics issues. I am very grateful for all that I have learned in the process and to all who contributed to this effort through discussions and feedback on earlier drafts.

Our conferences are starting to accept more papers that make contributions outside of computational research (even if much is still desired). So my hope is that good ethics sheets will be accepted at conferences and journals even without a special track. That said, clear signals from conferences and journals that such contributions are valued (perhaps by creating dedicated tracks) are important. Traditionally, work on identifying and discussing ethical considerations has often been under-valued compared to improving on accuracy metrics and computational methods for mitigating bias/issues. Therefore:

Just as many conferences now have a Resource and Evaluation track or a Survey Paper track, I propose we create dedicated conference and journal tracks (with appropriate reviewing forms) for identifying and discussing ethical considerations and societal impacts of AI. The papers in this track may also provide avenues for responsible AI research and system deployment using ideas from various other fields and participatory research. Notably though, this will be a home for non-computational ethics work. Ethics sheets can be one of many paper types submitted there.

If there is sufficient buy-in to the idea, we can also organize community efforts at conferences and workshops to develop more comprehensive and democratic ethics sheets that reflect community consensus and areas of disagreement.

Shared-task proposals can be encouraged to develop or point to relevant ethics sheets.

Also, it makes sense to cite ethics sheets for accepted norms in a field and for information on relevant ethical considerations. So creators of ethics sheets can get credit.

Q5. When should we be creating Ethics Sheets for AI Tasks? Normally, we learn about ethical issues only after systems have been deployed.
A. While we cannot foresee all consequences of our creations, it would be fair to say AI researchers have not done enough to anticipate the negative consequences of systems that we have created and deployed. Additionally, with great work over the last few years highlighting the ethical implications of AI systems, we are better placed to anticipate issues for the future. Therefore:

For existing tasks: we should create ethics sheets now; revisit them periodically and update them as necessary.

For new proposed tasks: the authors should create ethics sheets along with the paper introducing the task; as the task has more buy-in from the research community, others can also create ethics sheets for it; we revisit the sheets periodically and update them as necessary.

Q6. Does it matter what we define as a ‘task’? AI tasks can be defined at a high/general level (e.g., automatic emotion recognition) or fine/specific level (e.g., detecting sentiment in book reviews).

A. We can let community interest and expertise guide what task definitions are used (similar to topics of survey papers). There is no “objective” or “correct” ethics sheet or survey article. There is no one “correct” scope or task definition for ethics sheets. It is useful to have multiple ethics sheets for the same or overlapping tasks, just as it is useful to have multiple survey articles for overlapping areas of research.

Q7. Should the sheets depend on the kind of data or modality involved?

A. Yes, one can create focused ethics sheets as appropriate. In the example AER sheet, I specify in the “Scope and Modalities” section that the sheet focuses primarily on AER from language (text); however, many of the considerations apply to other modalities as well and the sheet also addresses ethical considerations that apply to AER in general (regardless of data/modality).

Q8. In terms of ethical considerations, should we think about research systems differently from deployed systems that directly impact people?

A. I think that is a fair point. Deployed systems have a much higher bar in terms of balancing many ethical considerations. It is common for research systems to focus on a smaller number of dimensions (say accuracy on certain test sets) ignoring certain other dimensions. However, research systems are often picked up by developers and deployed. So research systems should make their dimensions of focus clear to the reader/user. They should also discuss the suitability of deploying such a system, intended uses, and ethical issues that may arise if one deploys their system.

Q9. Why Should Academic Researchers Care about this?
Isn’t this the responsibility of those who deploy systems?

A. Academic research feeds into commercial research and development. We need to communicate the ethical considerations of what we create. Also, we are often not in positions of conflict of interest. We do not have to worry about losing our jobs for raising concerns.

Q10. Is there a time dimension for these ethics sheets? Will these sheets be valid for only a certain time period?

A. Yes, the sheets are only “valid” as long as people think they are useful. If the sheets no longer reflect the values we hold, or if things change, we need to create revisions and new ethics sheets. Ethics sheets will act as a record of what was considered important by different groups of people at different times.

Q11. Do you have any further pointers on reading for AI Ethics?

A. Here are some great books:

  • Algorithms of Oppression
  • Race After Technology
  • Weapons of Math Destruction
  • Automating Inequality
  • Artificial Unintelligence

Acknowledgments

I am grateful to Annika Schoene, Isar Nejadgholi, Mohamed Abdalla, and Tara Small for steadfast encouragement on the initial idea of Ethics Sheets for AI Tasks. Many thanks for the thoughtful discussions and comments on earlier drafts of this work. Huge thank you to Mallory Feldman for her belief in the need and value of the ethics sheet for emotion recognition. Discussions with her on the psychology and complexity of emotions were invaluable in shaping the ethics sheet for automatic emotion recognition. Many thanks to Roman Klinger, Rada Mihalcea, Peter Turney, Svetlana Kiritchenko, Maria Liakata, Gerard de Melo, and Emily Mower Provost for discussions about AI ethics and ethical considerations for emotion recognition. Many thanks to Emily Bender, Esma Balkir, Patricia Thaine, Brendan O’Connor, Cyril Goutte, and Sowmya Vajjala for thoughtful comments on early drafts of the blog posts.

Paper

Ethics Sheets for AI Tasks. Saif M. Mohammad. arXiv preprint arXiv:2107.01183. July 2021.

Feedback

The author welcomes feedback and suggestions, including disagreeing views, additional considerations to include, and any suggestions for improving this article. (Email me at saif.mohammad@nrc-cnrc.gc.ca.)

Dr. Saif M. Mohammad
Senior Research Scientist, National Research Council Canada

Twitter: @saifmmohammad
Webpage: http://saifmohammad.com


Saif M. Mohammad

Saif is Senior Research Scientist at the National Research Council Canada. His interests are in NLP, especially emotions, creativity, and fairness in language.