The Practitioner’s Guide to Data Ethics is now live!

DataKind UK
Jun 30, 2021 · 6 min read
Screenshot of a Zoom meeting with roughly 12 smiling adults. Most have bright orange Zoom backgrounds that say We Love Data.

By Michelle Seng Ah Lee, AI Ethics Lead at Deloitte, PhD student on algorithmic fairness at the University of Cambridge, and DataKind UK Ethics Committee member

Today, we are launching the Practitioner’s Guide to Data Ethics, prepared by DataKind UK volunteers at a special Ethics DataDive event. Find it here.

Why did we run an Ethics DataDive?

Last July, DataKind UK’s Ethics Committee organised an Ethics DataDive, bringing together 18 participants for a virtual day of tool comparison. We knew there were many tools available for data scientists and developers to use to embed ethics into their models, but we wanted a resource summarising what is out there. Our aim was to complete an intensive, expert-driven, exploratory analysis of the current open-source tool landscape in algorithmic ethics in just one day (learn more about how we did this at the end of this blog)!

The overall objective was to explore the existing open-source tools that seek to assist data science practitioners with ethical challenges. Prior to the event, we divided volunteers into groups of 3–5. They would focus on one of five topics in algorithmic ethics that were selected by the DataKind UK Ethics Committee:

  1. Fairness
  2. Explanation
  3. Natural Language Processing
  4. Checklists
  5. Communication Strategies

As the Dive began, we asked the participants to find toolkits in their topic area, score them on a number of functionality and user-friendliness metrics, and write down their assessments.

Each group presented their findings and solicited questions and comments from the wider group in planned catch-ups during the day. The DataDive culminated in a final presentation from each group on the key takeaways and insights.

What were the key findings?

In each section of the report, you can find breakdowns of the tools, including which were top-rated for functionality and user-friendliness, and which best suited technical experts, non-technical beginners, and beginner data scientists. There is also a scorecard rating each tool out of five on attributes such as being user-friendly, scalable, and easy to integrate with other systems. And there are some overall pros and cons to help you compare the tools. The participants also discussed trends and gaps that they spotted in each topic area.

Here are a few highlights:

Fairness

  • Gap between real-life considerations and use cases developed in an academic vacuum
  • Gap between practitioners' limited education in what is essential in fairness evaluation and the expertise the tools assume
  • Lack of consistency in methodology: wildly different tools, approaches, and techniques
  • Lack of tools tackling end-to-end fairness
  • Lack of support for regression models compared to the academic theory

The Fairness group looked at six tools, the most of any group. A common trend was a gap between real-life considerations and the use cases academics presented for the tools. This was also reflected in a lack of support for regression models, despite the academic theory covering them. There is also a gap between what practitioners are taught is essential in fairness evaluation and the level of expertise the tools assume. The tools were inconsistent with one another, with huge variance in methodology, approaches, and techniques. They also lacked ways to tackle end-to-end fairness, generally being designed for specific points in a data pipeline.
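To make the "assumed expertise" point concrete, here is a small illustrative sketch, not taken from the guide or from any specific toolkit, of one basic group-fairness check that most fairness toolkits automate: demographic parity, the gap in positive-prediction rates between two groups. The predictions and group labels below are made up for illustration.

```python
# Illustrative sketch (hypothetical data): demographic parity,
# one of the basic group-fairness checks that toolkits automate.

def selection_rate(preds, groups, g):
    """Share of positive predictions among members of group g."""
    group_preds = [p for p, gr in zip(preds, groups) if gr == g]
    return sum(group_preds) / len(group_preds)

def demographic_parity_difference(preds, groups):
    """Absolute gap in positive-prediction rates between groups 0 and 1."""
    return abs(selection_rate(preds, groups, 0) - selection_rate(preds, groups, 1))

# Hypothetical binary predictions and a binary protected attribute.
preds = [1, 0, 1, 1, 0, 0, 1, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]

print(demographic_parity_difference(preds, groups))  # |0.75 - 0.25| = 0.5
```

Even this simple check demands choices the tools rarely explain: which metric to use, which groups to compare, and what size of gap is acceptable, which is exactly the expertise gap the group identified.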

Ethics Committee volunteer Michelle Seng Ah Lee, who led this group, took the lessons from the DataDive to publish a paper on the “Landscape and Gaps in Fairness Toolkits.” This paper is available here.

Explanation

  • Tools are often broken and unmaintained
  • Explainability is currently a stand-alone function; it needs to be built in and become a core part of existing libraries, with less separation
  • Usability level varies between open source and commercialised products

When assessing the five Explanation tools, the group found they were often broken and unmaintained. They were also treated as stand-alone functions, rather than being built in as core parts of existing libraries. Usability varied a lot between open-source and commercialised products, with commercial organisations able to put resources behind an accessible user interface.

Natural Language Processing (NLP)

  • The natural language processing toolkits were judged too nascent and insufficiently robust to be implemented in a developer’s workflow without major modifications
  • There is a lack of model-agnostic, NLP-focused packages or libraries with decent documentation around detecting and removing bias
  • Open-source tools are not robust, standardised, or well-documented

There were three NLP tools, and the overall conclusion was that they were too nascent and lacked the robustness needed to be included in a developer’s workflow without major modifications. There is also a lack of model-agnostic, NLP-focused packages or libraries with decent documentation around detecting and removing bias. Finally, the open-source tools were not standardised or well-documented.

Checklists

  • Ethics needs to be an iterative process, but a lot of the checklists are one-off
  • Limited calls to action / clarity

The group assessing ethics guidelines and checklists looked at five of them. However, ethics needs to be iterative, and a lot of the checklists were one-off exercises. They also had quite limited calls to action, and didn’t provide much clarity on how to implement next steps.

Communicating ethics

  • Existing tools are focused on the US
  • The tools we looked at required some familiarity with machine learning models and might be intimidating for complete newcomers
  • The tools we examined were useful, but would need to be complemented with other materials in order to persuade audiences that ethics is a necessary part of development and needs to be embedded throughout a product life cycle

There were only two tools in the Communicating ethics group, and they were quite US-centric. They required familiarity with machine learning models, making them intimidating for newcomers. They would also need to be complemented with other materials to really persuade audiences that ethics is a necessary part of development that needs to be embedded throughout a product’s lifecycle.

See the full guide here!

How we curated the Dive

The participants were recruited through the DataKind UK mailing list, with several targeted invitations to individuals actively engaged in algorithmic ethics. While initially planned as an in-person event, meeting physically was not possible due to COVID-19 restrictions. Given the participants were already familiar with the relevant literature and debates, curating this group, rather than randomly sampling, allowed for a more rapid and in-depth assessment of the toolkits without the need for initial preparation or training on relevant material. Once recruited, participants were split into sub-groups by prioritising the preferences they stated on their registration forms, while keeping the groups fairly even in size.

The Ethics Committee members who organized the Dive event are Michelle Seng Ah Lee, Stef Garasto, Laura Carter, Ruby Childs, Nick Sorros, and Frankie Garcia.

We’d also like to say a huge thank you to all of the participants who attended on the day: Paolo Zoccante, Animesh Chaturvedi, Diego Arenas, Jat Singh, Jennifer Stirrup, Jo Watts, and Adam Hill. Some participants did not give permission for their names to be used.

How can I contribute?

The conclusion from this project is that there is plenty of work to do to make ethics something that can be easily embedded into data science! We hope this can be a living document, updated as the open-source ethics toolkit landscape matures over time. You can access the GitHub page here. Please feel free to add change requests, and we will review them.
