Software can’t avoid automating decisions if the engineers who build it can’t figure out where decisions are being made.
Human-in-the-loop has become a critical part of the conversation around AI and machine learning. Many of the catastrophically negative outcomes associated with AI — or really any software that automates — come from either removing human oversight or designing a user experience that encourages humans to neglect their oversight role. Software should be designed as a tool that humans manipulate to make decisions more efficiently. Many of the emerging AI ethics guidelines emphasize both human agency and transparency: computers should neither take decision-making power away from humans nor hide decisions from them.
Which all sounds great until you try to figure out when something is a decision. When computer vision labels an object as a cat or a truck … is that a decision? When anomaly detection blocks a transaction on a credit card or a login form … was that a decision? What about auto-formatting?
One of the most problematic elements of the conversation around ethics in software is that most of the action is happening away from the people who build software professionally. The research is in universities, policy groups and think tanks. Even in the rare case that a company has in-house ethics people, they are often separate from the software development lifecycle, dealing mainly in hypotheticals or advising after the fact the same way a legal team might. That’s not for any lack of interest on the part of software engineers; most of the technical people I talk to find these issues extremely compelling. It’s just difficult to fit deep reflection about ethics into a process governed by agility. Agility tells us the only way we will know what the impact will be is by trying it out, observing and adapting. It’s one thing to experiment on live users when your research question is “what will make them most likely to click through?” and quite another when you’re trying to figure out whether your software might injure people.
Process and Drift
About a month ago I started playing around with some process maps our user researcher had put together. Engineering doesn’t usually give these documents more than a cursory glance. They mainly help Product figure out what sets of features and functionality would be important and how to shape the product to integrate smoothly with customers and their expectations.
But I thought maybe they could also help us strategize around human-computer interaction.
In safety science there is a concept called drift. I’ve written about it before; drift is what happens when what is actually done diverges (or drifts) from the official process. Drift is not considered a good thing, only an inevitable one. Most ethnographers will tell you there’s always a little sunlight between what people say they do and what they actually do.
And yet drift rarely, if ever, runs the full length of a process. Instead people drift away from and back towards the official process multiple times, in predictable ways. When we see drift in a process we know one of three things is true:
- There’s a blocker that keeps the official process from being followed.
- The official process doesn’t accurately express the full potential variance, requiring a human operator to use their best judgment.
- Other circumstances have changed the context, making the official process unsafe compared to other options.
In other words, drift occurs in places where decisions are being made. So if we want to determine where human-computer interaction should accommodate decision making, and where the consequences of applying AI will land, we can look at how various observed processes compare to the official process.
Traditionally, process maps attempt to consolidate drift into something close to a universal ground truth. They assume that when the de facto process differs from the formal process it does so in one main way, with a series of outliers. The outliers are safe to ignore. In real life, the nature of these minor variations ultimately determines the resilience of the overall system. What the variations are is less important than where in the official process they gather.
A Simple Process
I can demonstrate what this might look like with a simple process: sorting and documenting information found in a series of photos. A person might accomplish this on their own by repeating a few steps: look at the photo, identify people and objects, inventory it with the relevant information. Mapped out that might look like this:
Now if I wanted to automate this process using software, and I wanted to avoid having a computer make a decision for a human, what areas should I automate and what areas should I leave manual? Algorithms for both facial recognition and object detection can have high error rates, which makes relying on a computer to perform those steps for the human more problematic. But then the line between decision and non-decision is determined by the impact of a mistake, which feels unhelpful.
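As a sketch, the official loop might look like the following in code. Every name here is a hypothetical stand-in: in the fully manual version, each helper is a human judgment, and each one is a candidate for automation.

```python
# A minimal sketch of the photo-sorting process. The helpers are hypothetical
# stand-ins; in the manual version each one is a human judgment.

def identify_faces(photo):
    # Stand-in: a human (or a facial recognition model) names known people.
    return photo.get("faces", [])

def identify_objects(photo):
    # Stand-in: a human (or an object detection model) lists relevant objects.
    return photo.get("objects", [])

def inventory(photos):
    records = []
    for photo in photos:
        faces = identify_faces(photo)      # candidate automation point
        objects = identify_objects(photo)  # candidate automation point
        records.append({"id": photo["id"], "faces": faces, "objects": objects})
    return records

photos = [
    {"id": 1, "faces": ["alice"], "objects": ["pen"]},
    {"id": 2, "faces": [], "objects": ["marker", "pencil"]},
]
print(inventory(photos))
```

Notice that the code encodes only the ideal path: nothing here says what to do when a face can’t be identified or an object’s relevance is unclear.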
But what if instead of looking at the impact of mistakes, we looked at drift in the process?
For example, we might have a photo where it is unclear whether the people in it are known people or not. Perhaps the face is obscured. Perhaps they are far away in the shot. Perhaps the quality of the photo is poor. In those situations the human operator must add steps to the official process in order to get the job done.
Also… how exactly is a “relevant” object defined? The process doesn’t make this clear. It could be that there’s some authoritative source of objects to be on the lookout for. Or it could be assumed that relevance is obvious. If there is a list of relevant objects, how should the human operator behave when they encounter something omitted from the list but similar to other relevant objects? For example, if our “relevant objects” list consists of items like pens, pencils, and markers, does sidewalk chalk count? Does a calligraphy brush?
Virtually any criteria will have gray areas. If our process is trying to sort photos of cats, someone has to decide if tigers count as cats. Or if cartoons of cats should count.
If we try to represent these drifts on our process map we might end up with something like this:
Already we’ve found two forms of drift. In the first case, the quality of the photo blocks us from completing the task and the process can’t move forward until we’ve received either a better quality version of the photo or instructions from management to move on.
In the second case, the existing process does not accurately represent all the potential states and the human operator has to make a value judgment given the context of the gray area and the objectives of the process itself.
How does thinking about process this way change the software we might design? When all you can see is the ideal, automating that ideal becomes the natural way to think about technology. When you start to tease out and visualize the drift, other opportunities become obvious. We could filter the photos by their quality and by our ability to identify faces or objects in them, for example, making the human operator’s task more efficient and minimizing variance in how and when people ask for help. We could also construct a tagging system so that the burden of defining significance rests with the human operator, but those decisions can be redone easily by changing the configuration around what data is aggregated.
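The tagging idea might be sketched like this. The tags and category lists are hypothetical; the point is that the operator records raw tags once, while what counts as “relevant” lives in configuration, so that decision can be revisited without redoing the human work:

```python
# Sketch: raw tags are recorded by the human operator; relevance is
# configuration. All tag names and thresholds here are illustrative.

tagged_photos = [
    {"id": 1, "tags": ["pen", "sidewalk chalk"]},
    {"id": 2, "tags": ["calligraphy brush"]},
    {"id": 3, "tags": ["pencil", "marker"]},
]

def count_relevant(photos, relevant_tags):
    counts = {}
    for photo in photos:
        for tag in photo["tags"]:
            if tag in relevant_tags:
                counts[tag] = counts.get(tag, 0) + 1
    return counts

# The "relevant objects" list is configuration, not code: redefining it
# re-aggregates the same human judgments instead of discarding them.
narrow = count_relevant(tagged_photos, {"pen", "pencil", "marker"})
broad = count_relevant(tagged_photos, {"pen", "pencil", "marker", "sidewalk chalk"})
print(narrow)
print(broad)
```

Changing the gray-area call (does sidewalk chalk count?) becomes a configuration change rather than a re-run of the whole process.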
With the drift factored in, the focus of our efforts shifts from automation to problem solving. What kept the process from working efficiently and how can we solve for that?
Type 1 and Type 2 Thinking
There’s another way to look at the drift in our process map. The act of identifying faces and objects is Daniel Kahneman’s Type 1 pattern matching. The actions taken as part of drift are closer to thoughtful Type 2 analysis. In my last blog post on this topic, I talked about how most if not all of machine learning and AI is Type 1 thinking in disguise. And that computers and humans have inversely proportional relationships with Type 1 and Type 2 thinking. Type 1 is easy for humans, resource intensive for computers. Type 2 is resource intensive for humans but easy for computers.
We also know that Type 1 thinking tends to be error prone and so human beings tend to inject Type 2 thinking to check their assumptions from time to time. Most AI programs, however, are not designed to check their Type 1 thinking with Type 2 thinking. If such a check does happen, it comes from the human operator interacting with the program.
For that reason, it would also be really interesting to trace the Type 1 and Type 2 thinking through the existing process.
Were we to automate this process as it is we would likely remove those checks. Facial recognition would identify the faces and object detection would identify the objects, then a computer would do the count. There’s a healthy margin for failure there. If this process doesn’t need a high degree of accuracy it’s probably fine, but if it does it seems clear that the human is out of the loop.
If the human operators have thousands or millions of photos to sort, the temptation to automate the Type 1 thinking increases. But as long as we understand that’s what we’re doing we can focus our efforts on designing the software to prompt the human operator for Type 2 thinking. For example, we could require the human to review algorithmic sorting below a certain confidence level, skimming the easiest photos off of the queue and allowing the computer to sort them.
Finding the Decisions
We assume that it’s easy to figure out where decisions are being made in a process because we make thousands of decisions a day without ever realizing it. When AI is brought into the picture we don’t even consider that we’re outsourcing decision making authority to software until something goes wrong and we have to figure out how such a mistake could be made.
By trying to figure out where drift is in a process, we highlight places where the process can’t work unless a human operator makes a decision. Being able to identify and call out these areas often changes our understanding of how to apply technology to improve the process.