The carrots and sticks of ethical NLP

Tyler Schnoebelen
5 min read · Apr 12, 2017


Thoughts after the Ethics in NLP workshop at EACL2017.

Professions run into ethical problems all the time. Consider engineering: the US sold $9.9b worth of arms in 2016 ($3.9b in missiles), and the most optimistic reading is that instruments of death prevent death. Consider medicine: medical research is dominated by concerns of market size and patentability, leaving basic questions like “is this fever from a bacterium or a virus?” unanswered for people treating illnesses in low-income countries. Consider law: lawyers upholding the law can break any normal definition of justice. Even in philosophy, ethicists are not known to be more moral than anyone else.

These ethical problems teach different lessons. Dilemmas in engineering ethics illustrate the importance of the doctor’s “first, do no harm”. Messes in medical morality suggest the importance of going beyond monetary incentives. Legal problems make it clear that systems must be addressed, since they often do not do what they purport to do. And everyone, including professional ethicists, demonstrates the difficulty of enlarging our concern beyond the narrowness of ourselves and our in-groups.

Carrots and sticks

At the Ethics in NLP workshop last week, discussions turned to carrots and sticks. On the one hand, we want to draw people (and ourselves) towards ethically sound practices and projects. On the other hand, it seems necessary to censure, punish, and prevent wrongdoing.

As we consider which sorts of tasty vegetables and painful lashes might be offered or administered, it’s worth reflecting on where interventions could plausibly be made. A barebones sketch of the system of incentives looks something like the following yellow-and-green systems graph.

Fig 1: A simplistic loop: where do ethics fit in? (Click here to create your own loops)

(The real point of this graph is to get you to help make it more sophisticated, but I hope it will serve for the time being.)

Coming up with plausible interventions means considering the different parts of the loop. For example:

  • Research: How do researchers treat humans involved in the production of the research (e.g., crowdsourced workers and social media folks)?
  • Results: How do researchers connect their work to outcomes?
  • Prestige: What do advisors, departments, and conferences reward? In industry, what goes into a great performance review and promotions?
  • Money: What are the conditions for funding/bonuses?

I tend towards carrots: exciting people about all the possibilities of doing good in the world, as well as demonstrating the complexities of real-world ethical dilemmas to make them more curious and sensitive about what they’re doing and why. In essence: to engage them in the world and to get them to help the world engage with them.

The question flashed to the crowd at the start of Graeme Hirst’s talk was about motivating people to do something other than turning people into dinosaurs. (Apt in so many ways for AI researchers.)

One of my concerns at the ethics workshop was that an ethical review board could undermine ethical enthusiasm rather than build on it. In the hands of bureaucracies, such boards offload responsibility from researchers, ensnare them in hassles, and teach them ethical evasion. Quirine Eijkman makes the important point that when she talks to police/intelligence/security officials, they only want internal oversight. As she suggests, this has limited effectiveness in promoting ethical conduct.

For me, the point is not a faceless “expert” oversight board but one that comes from building bridges to the communities most likely to be affected. At its best, this connects researchers to people outside their social networks so they can hear and think about problems on the ground. And it gives communities who are not typically part of technology policy a voice, making oversight partly ethnographic and partly a preemptive Truth and Reconciliation. But the details of this are far from fleshed out.

Ethical to be unethical?

Another theme that emerged was whether, in the tradition of security and privacy work, it can be ethical to build “bad” systems in order to demonstrate that they are bad. There is a difference between human subjects research and building technology. People with technical chops probably have a responsibility to help prevent, block, or cure bad technologies. That suggests you don’t just want to build a system that does something bad in order to show that it can be done; you want to actively develop countermeasures.

The L in NLP is Language, and language means people

Technologists typically look to technology for salvation, and they often do come up with interesting solutions. However, organizations like the Gates Foundation are rightly criticized for ignoring what experts on the ground know: clinicians in West Africa, teachers in underfunded American schools, activists in social justice. Instead of discarding or replacing these people, we could look at NLP and other AI technologies as a way of giving human beings time to be more creative, strategic, and effective. Once again, that means listening to people.

There’s a similar problem with gender research in much of NLP: folks who work in a complicated field like gender often don’t engage with the literature outside of computer science showing that social categories (= all categories) aren’t easy, pre-given things. (In the proceedings, check out the papers by Corina Koolen and Andreas van Cranenburgh and by Brian Larson.) Social theory is much more interesting than binary essentialism anyway, and it should inspire all sorts of new computational methods for dealing with the complicated interaction of people, identities, relationships, and contexts.

It is possible to choose NLP projects that help individuals and communities AND make important contributions to the field in other ways: dealing with the complications of the social world can push algorithms and tools forward, too. A central question, to return to the loop diagram above, is how we configure education, rewards, and the links from research projects to meaningful interactions with human beings.

Further reading

My own preference is to increase the number of workshops and talks centered on difficult real-world use cases. That is, I believe in theory, but it’s in cases like “Can you diagnose Trump with Alzheimer's through his public speech?” and “We want to target certain kinds of people for our special loans” that you get to see how you’d put ethics into practice.

But on the reading front, some good syllabi include Emily Bender’s Ethics in NLP class and Michael Strube’s Dark Side of NLP (don’t be scared of the German! The references are almost all in English and start on page 9). As far as I can tell, there’s only one item that is on both syllabi:

Hovy, Dirk, & Spruit, Shannon L. (2016). The social impact of natural language processing. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (pp. 591–598). Berlin, Germany: Association for Computational Linguistics.

The two syllabi also have these authors, though they choose different works by them — I’ve tried to choose the more recent.

Brennan, Michael (2015). Can computers be racist? Big data, inequality, and discrimination. (Online; Ford Foundation)

Rao, Delip (n.d.). Fairness in machine learning. (slides)

Narayanan, Arvind, & Shmatikov, Vitaly (2010). Myths and fallacies of “personally identifiable information”. Communications of the ACM, 53 (6), 24–26.
