Distilling the debates around Responsible AI and where responsibility lies

c3d3
Wonderful World of Data Science
6 min read · Mar 1, 2024
Photo of seat belt on an airplane, by Daniel Schwen (a human)

I am writing this article as a practitioner who works both on and with “AI”, but who has a deeper interest in the issues around the constantly evolving landscape of machine intelligence. Often when I read the discourse in this area, I feel frustrated at the lack of rigour and precision in how views are expressed. This article attempts to address this by identifying four areas where the discourse is particularly vague, and where, by being more explicit about our beliefs and values, we can have more useful discussions that allow us to make decisions about our solutions.

1. We lack clarity and precision on what kind of “intelligence” we are talking about or what behaviour we should be expecting.

While there exist many definitions of “artificial intelligence”, we can identify two themes:

  • The exhibition or emulation of human-like cognitive capabilities such as reasoning, learning, planning and creativity;
  • Learning and adaptation through data, i.e. the capabilities of the model or system are in part governed by the data it is presented with (just as humans learn from experience).

As in the case of “intelligence” in general, we may never reach consensus on a precise definition, and it is perhaps more useful to talk about specific capabilities. Rather than saying that an individual is intelligent, we might instead say that they are “good at reasoning logically”, “express themselves eloquently”, or “make connections between concepts quickly”. In the same vein, when we talk about artificial intelligence, being more precise about what capabilities we are referring to can help us have more meaningful discussions about what kinds of behaviour we would expect and desire for our use case.

2. We lack clarity and precision on what “safe” means for our use case.

Just as the capabilities and behaviours we desire are determined by our use case, so are the bounds within which these must fall, and we need to specify:

  • Which behaviours we definitely don’t want.
  • Which behaviours we must have.

3. We haven’t had an open and rigorous discussion about the trade-offs we have to make between performance and safety.

Following on from 1 and 2, we often have to make tough decisions around the different requirements and thresholds we set for the system’s capabilities and its safety. For example, what confidence threshold should a chatbot have to clear before it gives a response rather than “I don’t know”? If the threshold is very low, there will be more erroneous, hallucinated or biased responses; if it is too high, the performance of the system will suffer and the system will look “stupid”.
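To make the trade-off concrete, here is a minimal sketch (the names, the confidence score and the 0.7 default are illustrative assumptions on my part, not a recipe) of a chatbot gate that either answers or abstains depending on a single explicit threshold:

```python
from dataclasses import dataclass

@dataclass
class ChatbotOutput:
    answer: str        # the model's proposed response
    confidence: float  # model-reported confidence in [0, 1]

def respond(output: ChatbotOutput, threshold: float = 0.7) -> str:
    """Return the answer only if its confidence clears the threshold."""
    # A low threshold lets more erroneous or hallucinated answers through;
    # a high threshold makes the system abstain ("I don't know") more often.
    if output.confidence >= threshold:
        return output.answer
    return "I don't know"

# The same output is accepted or rejected depending on where we set the bar.
candidate = ChatbotOutput(answer="The flight departs at 09:40.", confidence=0.62)
print(respond(candidate, threshold=0.5))  # -> The flight departs at 09:40.
print(respond(candidate, threshold=0.8))  # -> I don't know
```

Choosing that number is the performance/safety decision; writing it down as an explicit parameter, rather than leaving it as an implicit default, is what makes the trade-off discussable.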

4. We are fooled by intelligent-looking behaviour and expect too much.

Often we have unrealistic expectations of a model or system when it produces impressive-looking or impressive-sounding outputs (as in the case of generative language and image models), and we expect it to also have other capabilities or characteristics typical of beings that produce such outputs, such as logical reasoning. This is like expecting someone who can speak intelligently about a given topic to be an expert on every topic, or to quickly pick up a new topic they are not well-versed in.

Who is responsible?

By being more explicit and specific in relation to the areas of discourse outlined above, we can also have more sensible debates about responsibility.

For example, I tend to feel a great deal of responsibility for any system or model that I have released into the world to be used by humans, so I tend to do the following:

  • Decompose the problems I want to solve into individual capabilities and adopt the most transparent solution possible for each one.
    By “the most transparent solution possible”, I mean the model that I understand best (whether from understanding the workings of the model itself or from having observed its behaviour through rigorous experimentation). This is often the “stupidest” model in the sense that it only addresses a single task, but because I can rely on it to perform the way I expect with respect to that task, I feel far more confident adopting it and building other things on top of it.
  • Perform rigorous experiments on both the individual capabilities and on the system as a whole, with human-guided or ground-truth-based evaluation.
    The process of designing the experiments itself forces me to be explicit about the trade-offs I need to make between safety and performance (see the evaluation sketch after this list). By applying human-guided or ground-truth-based evaluation, I ensure that the standard I evaluate the system against is the one it has to meet when it is released out into the world.
  • Choose high thresholds and build in extra checks for the higher-stakes aspects of my system.
    Sometimes the outputs of AI models come with a level of confidence; by setting the acceptance threshold high for a given use case, fewer failures will occur. Of course, this comes at the expense of the response rate, since we reject more outputs, but usually there is a sweet spot (or a best imperfect solution) that gives us decent performance, i.e. where a correct response is given most of the time. In many cases, though, AI models do not come with a level of confidence, or it is not clear that we should trust it. In these cases, I build in extra checks (see the extra-check sketch after this list). Again, this forces me to be explicit about the precise behaviours I desire and permit.
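As a sketch of the kind of ground-truth-based evaluation I mean (the toy model, the dataset and the threshold grid are all illustrative assumptions), sweeping the acceptance threshold over a labelled set turns the safety/performance trade-off into explicit numbers rather than intuitions:

```python
from typing import Callable, List, Tuple

# Each item pairs an input with a ground-truth answer supplied by a human.
LabelledExample = Tuple[str, str]
Model = Callable[[str], Tuple[str, float]]  # prompt -> (answer, confidence)

def evaluate(model: Model, dataset: List[LabelledExample],
             threshold: float) -> Tuple[float, float]:
    """Return (answer_rate, accuracy_on_answered) at a given confidence threshold."""
    answered = correct = 0
    for prompt, truth in dataset:
        answer, confidence = model(prompt)
        if confidence < threshold:
            continue  # the system abstains: the "safety" side of the trade-off
        answered += 1
        correct += int(answer.strip().lower() == truth.strip().lower())
    answer_rate = answered / len(dataset) if dataset else 0.0
    accuracy = correct / answered if answered else 1.0
    return answer_rate, accuracy

def sweep(model: Model, dataset: List[LabelledExample],
          thresholds=(0.3, 0.5, 0.7, 0.9)) -> None:
    """Print the trade-off as an explicit table, one row per threshold."""
    for t in thresholds:
        rate, acc = evaluate(model, dataset, t)
        print(f"threshold={t:.1f}  answer_rate={rate:.2f}  accuracy={acc:.2f}")

# Toy usage with a stand-in "model" that returns canned answers and confidences.
canned = {"2+2?": ("4", 0.95), "capital of France?": ("Lyon", 0.55)}
sweep(lambda p: canned[p], [("2+2?", "4"), ("capital of France?", "Paris")])
```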
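And when no trustworthy confidence score is available, an “extra check” can be as simple as a deterministic validator that a high-stakes output must pass before it is accepted. The refund scenario, the cap and the fallback below are purely illustrative, not a prescription:

```python
def within_policy(refund_amount: float, order_total: float) -> bool:
    """Deterministic guard for a high-stakes action: never refund more than
    the order total or a hard cap (both limits are illustrative)."""
    HARD_CAP = 100.0
    return 0.0 <= refund_amount <= min(order_total, HARD_CAP)

def approve_refund(model_suggestion: float, order_total: float) -> float:
    """Accept the model's suggestion only if it passes the explicit check;
    otherwise refund nothing automatically and flag the case for human review."""
    if within_policy(model_suggestion, order_total):
        return model_suggestion
    print("Out of policy: flagged for human review")
    return 0.0

print(approve_refund(25.0, order_total=60.0))   # 25.0 — passes the check
print(approve_refund(250.0, order_total=60.0))  # flagged, returns 0.0
```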

To take a human analogy, my approach is to treat systems and models that I do not fully understand or have not yet interacted with as if they were new colleagues that I had not previously worked with. Just as I would hold back from trusting a new colleague with high stakes tasks on day 1 of meeting them, the kinds of tasks (both in terms of domain knowledge and complexity) I would entrust an AI model or system with would depend very much on what I had gleaned from experimenting rigorously with tasks I need them to perform reliably on. And when it comes to the attribution of responsibility, I apply a similar rationale.

If I gave a new colleague a task without checking that they were equipped to tackle it, or if I didn’t build in provisions for the possibility that they failed to deliver, then I would say the bulk of the responsibility for failure falls on me; I would not blame their parents, the school they were educated in, or their upbringing. In the same way, when an AI model that I have chosen to adopt fails because I haven’t made the time to understand it or test it diligently, I tend to feel responsible. But if the new colleague spontaneously decided to shred all the company’s documents or vandalise the office, then I might feel less responsible (while still perhaps acknowledging that we should have done more due diligence on their background). In the same way, if a model does something that its makers/trainers claim it should not do, then I might also feel they should shoulder some of the blame.

Before signing off, I want to emphasise that while I have shared some of my own views and positions with respect to the areas of discourse identified in this article, these should not be taken as normative (I’m simply telling you where I stand). Instead, the reader should focus on the areas of discourse themselves and ask, “where do I stand?” with respect to each issue within the context of their own use case or problem.

