Who is responsible for technology that goes wrong? We all are, so let’s do our part
When a company ships a product, whether they are a start-up, tech giant, or an open source non-profit, they think about use cases, user journeys, personas and the expected and planned product usage. In the product world, we don’t often think long or hard enough about how what we create may have a negative impact. It is not even an established mindset that companies should be responsible for the unintended consequences of their products (outside of what is already legally protected). Additionally, some people might think that being concerned about ethical risks early on in product development may stifle innovation.
However, here at Wellcome Data Labs we believe that those with the power should be at the forefront of making sure technology actually has a positive impact. Technology organisations cannot be held accountable for every unintended consequence, but there is plenty of opportunity to try to anticipate and mitigate them.
As explained in this blog post by Danil Mikhailov, Head of Wellcome Data Labs, we are experimenting with a new approach to ethical data science. The first product we are applying it to is ‘The Policy Tool’. This tool, which uses machine learning, was originally developed to help internal teams gain insights into how our funded research is cited by organisations that have a strong impact on the policy sphere. We are now open sourcing this tool and creating a user interface for other organisations to use. The tool will initially have the following features:
- Users will be able to find which policy documents cite the publications on their list, and examine where other research is being cited.
- Users will be able to search through and analyse trends in the documents themselves, which can lead to a variety of insights on what these policy organisations are talking about.
How have we been thinking about ethical data science?
It can be challenging to implement ethical product development. We know it is not possible to anticipate all unintended consequences of a product, but we can make some rigorous guesses with the right minds and the right knowledge in the right place. Awareness of the risks is a very good first step. This is why we decided to hold an ethical review workshop in January with our product team and other stakeholders from our ethical data science working group. The working group was formed last October and includes members from outside the Data Labs team, who provide a further view on the activities we are undertaking to consider ethics. The workshop attendees were therefore a wide mix of data engineers, data scientists/analysts, user experience leads, and strategists.
The task of the workshop was a simple one: thinking through what could go wrong. While thinking about unintended consequences is a discussion that can be had in any format, we decided to think about examples of use in four different categories:
Use cases: The product is used in a way we intended.
Stress cases: The product is used in our intended way, but it has unintended consequences for users. A great explanation of stress cases is here.
Abuse cases: The product is deliberately used by someone in a way that we didn’t design it to be used.
Misuse cases: The product is unintentionally used by someone in a way we didn’t design it to be used.
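The four categories above can be read as a small decision rule, which may help when sorting workshop notes. Below is a minimal sketch in Python, assuming each scenario can be described by three yes/no questions. The `Scenario` fields and the `classify` function are hypothetical names invented for illustration; they are not part of The Policy Tool.

```python
from dataclasses import dataclass
from enum import Enum


class CaseType(Enum):
    USE = "use case"
    STRESS = "stress case"
    ABUSE = "abuse case"
    MISUSE = "misuse case"


@dataclass(frozen=True)
class Scenario:
    """One workshop idea. The three flags are illustrative assumptions."""
    description: str
    used_as_designed: bool         # is the product used in the intended way?
    deliberate: bool               # if not, is the off-design use on purpose?
    unintended_consequences: bool  # does intended use still go wrong for the user?


def classify(scenario: Scenario) -> CaseType:
    """Map a scenario onto the four categories described above."""
    if scenario.used_as_designed:
        # Intended use that still has unintended consequences is a stress case.
        if scenario.unintended_consequences:
            return CaseType.STRESS
        return CaseType.USE
    # Off-design use: deliberate is abuse, unintentional is misuse.
    return CaseType.ABUSE if scenario.deliberate else CaseType.MISUSE


example = Scenario(
    description="User follows the interface as expected but gets a harmful result",
    used_as_designed=True,
    deliberate=False,
    unintended_consequences=True,
)
print(classify(example).value)
```

In practice many ideas will not fit neatly into one box, so a rule like this is only a starting point for discussion, not a replacement for it.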
Though not common, abuse and misuse cases are sometimes considered in product development. Stress cases have only made it into UX and product conversations in recent years, thanks to ‘Design for Real Life’ by Sara Wachter-Boettcher and Eric Meyer, and Wachter-Boettcher’s book ‘Technically Wrong’. Stress cases, or ‘edge cases’ in engineering, are the situations in which the user understands the interface and is using it as expected, but the effect is not what was intended. Examples of these situations are (unfortunately) fairly common in the tech and product world, such as the gym computer system that recognised a female member’s access card as male because she had ‘Dr’ in her title, and facial recognition software that varies greatly in accuracy depending on gender and skin colour.
The structure of the workshop took inspiration from liberating structures, following the ideas generation format of 1–2–4-all. The group was divided in half: one group took Use and Stress cases, and the other group took Misuse and Abuse cases. Within this structure everyone is given time alone to come up with their own ideas. It is important to give enough time for people to think outside of the box. Sharing then in pairs and in groups of four allows for some further refinement and idea generation in small groups. All ideas were shared as post-it notes in the plenary discussion at the end and placed on the use case quadrant as per below.
We collected 38 cases from this exercise, and we focused on those that weren’t typical use cases, as we had already established those. Many of these cases were unique; however, clear themes emerged, as follows:
What’s the main risk?
That we allow users to misunderstand how the tool works and make presumptions about its accuracy, comprehensiveness, and output significance.
This risk has two aspects:
1) People may misinterpret the tool as a complete means of assessing the impact that research has had on policy, without taking into account the range of other factors that should be considered.
- Funders may use the tool to simply count how many times a research publication was cited in policy documents, rather than examining the quality of individual publications. As a result, a really innovative paper may be overlooked in favour of those with higher citation counts.
- A lot of projects being funded are basic research, which is known to take time to have an influence on either policy or clinical application. The risk is that funders may over-rely on the policy tool’s analysis of the last ten years and discontinue funding for a particular area, when it may in fact be cited in another ten years’ time.
As a result:
- Funders may make judgements on what research is and is not ‘high impact’ and therefore make unfair funding decisions. This could further perpetuate inequalities in the system, e.g. favouring prominent authors or the Global North.
2) Without knowing the tool’s limitations people may misuse it and publicise incorrect results.
Specific examples of such misuse:
- A user may believe the policy tool checks against an exhaustive list of policy documents, when currently the selection is limited to a few organisations. If not aware of this limitation, they may assume that a piece of research has less reach if it is not found through our tool.
- Our tool is more accurate for English-language and science-centric journal articles. Without being aware of this, the user may believe that research from other disciplines is less influential, when in fact it is just less likely to be picked up by our algorithm.
As a result:
- Instead of empowering the research and funding community with informative data, the tool will cause confusion if misused.
What are we going to do about this?
Some initial recommendations are:
- Work to make sure users have the correct high-level understanding of how the algorithms behind the tool work and don’t work.
- Put in place control processes to prevent incorrect use.
- Explain the tool’s limitations to users in clear English.
The next step for our team is a further review of the workshop outputs in terms of their likelihood and severity, which will evolve into an ethical risk mitigation strategy.
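One common way to review risks by likelihood and severity is a simple risk matrix, where each risk gets a score on both dimensions and the product is used to rank them. The sketch below assumes that approach. The 1 to 5 scales, the `Risk` class, the `prioritise` function, and the example scores are all hypothetical placeholders for illustration, not the team's actual method or results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    description: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    severity: int    # assumed scale: 1 (minor) to 5 (severe)

    def __post_init__(self) -> None:
        # Reject scores outside the assumed 1-5 scale.
        for name in ("likelihood", "severity"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    @property
    def score(self) -> int:
        """Likelihood multiplied by severity, as in a basic risk matrix."""
        return self.likelihood * self.severity


def prioritise(risks: list[Risk]) -> list[Risk]:
    """Return risks ordered from highest to lowest score."""
    return sorted(risks, key=lambda risk: risk.score, reverse=True)


# Placeholder scores only; real values would come from the team's review.
workshop_risks = [
    Risk("Users assume the list of policy documents is exhaustive", 4, 3),
    Risk("Citation counts are treated as a measure of quality", 3, 5),
    Risk("Non-English research is judged less influential", 2, 4),
]

for risk in prioritise(workshop_risks):
    print(risk.score, risk.description)
```

A single multiplied score hides a lot of nuance, so the ranking is best treated as a prompt for deciding where to spend mitigation effort first.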
This workshop took a total of one hour to complete and we used every minute. While the conversation was full of great ideas, it is only the start. Nobody can prevent the worst from happening, but we can do our best to imagine where things can go wrong and do what we can to mitigate the risks. Most people are good and mean well, and this kind of thinking does happen already in ad hoc ways: ‘Oh my god, imagine if that happened. Let’s do something about that before we launch’. We are looking to develop a systematic approach to spotting these issues. To stay updated on our progress, be sure to follow the Wellcome Data Labs Channel.