AI machines as moral agents: The learning criterion and free will (part 9)

H R Berg Bretz
Feb 2, 2022


In Part 8 I discussed the option of avoiding defining ‘agency’ altogether, but concluded that although defining it is hard, there is no reason to simply declare it impossible. Now it is time to turn to the concept of ‘moral agency’ instead.

For a mission statement, see Part 1 — for an index, see the Overview.

Photo by C M on Unsplash

4. Artificial moral agency and the learning criterion

Moral agency is something more than minimal agency: it is about actions in moral situations. The distinction between accountability and responsibility makes it less demanding to define a morally accountable agent, since accountability does not include blame/praise or some deeper level of moral understanding. However, in 2.1 (see Part 3) I deferred the problem of how programmed artifacts can have the concrete freedom needed for moral agency, and now it is time to address that issue. I will argue that what I call indirect programming explains how artifacts can achieve concrete freedom; this matters because an artifact that cannot meet the control condition cannot be a moral agent.

4.1. Can an artificial agent be free enough to be a morally accountable agent?

In the past decades there has been significant progress in machine learning with neural networks in AI technology. This has made it possible to produce more and more autonomous systems, and has led some to extrapolate from this advancement to the conclusion that these systems can become moral agents, usually in a quite speculative manner. With Floridi and Sanders’ distinction between accountability and responsibility, we can settle for the weaker claim that these systems can be morally accountable (2004, pp. 366–369). For accountability it is sufficient that the artifact is an agent that originated the moral act. However, there is the problem of how an artifact can meet the control condition: how can it have concrete freedom if it is programmed, if all its “choices” are predetermined instructions?

To answer this question, I will distinguish what I call direct programming from indirect programming. Imagine that Simple Car is programmed using this line of code, an example of direct programming:

if pedestrian_detected == true then brake.initiate()

which should be interpreted as: if the digital flag pedestrian_detected is true, then the method brake.initiate() is activated[1]. This entails that the car does not have concrete freedom: if the flag pedestrian_detected is true it will initiate the brake method, otherwise it will not. In neither case is a “choice” made; the system is better described as a set of instructions. Of course, the program of an advanced artifact is much more complicated than this. In fact, there could be as many as 100 million lines of code in an average modern high-end car[2]. This makes the code very complicated, but if the code strictly consists of direct programming, the complexity only means that it will take much more effort to predict whether the agent will actually avoid the accident[3]. The artifact still does not have concrete freedom since it can only follow its instructions.
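To make the contrast concrete, here is a minimal sketch in Python of what I mean by direct programming. The names (pedestrian_detected, initiate_brake) are my own illustrative choices, not real automotive code; the point is only that every outcome is fixed in advance by rules the programmer wrote down.

# A toy, directly programmed controller. Names and behaviour are
# illustrative only; the "decision" is entirely a hand-written rule,
# so for any given input the output is already determined by the code.

def initiate_brake():
    print("Braking")

def control_step(pedestrian_detected: bool) -> None:
    # The whole decision procedure is this one predetermined instruction.
    if pedestrian_detected:
        initiate_brake()

control_step(True)    # always brakes
control_step(False)   # never brakes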

But there is another way to achieve the same function: indirect programming. One way to detect a pedestrian is by using a radar system. The problem with a radar system is that it can detect many things, and you do not want to mistake a pedestrian for a plastic bag flying in the wind. From a moral point of view, you might argue that it is better for the radar system to err on the side of false positives, but if there are too many false positives the system becomes inefficient, to the point where it is useless. Assuming that we need to distinguish plastic bags from pedestrians, how do we achieve this? The problem is that to accomplish this through direct programming, you would first have to predict all the ways pedestrians can be distinguished from plastic bags and then turn this into specific instructions, and that is very hard to do. Recent developments in AI technology have shown that neural networks can solve this problem to a high degree.

Here is a rough description of how a neural network works. Instead of telling the system the exact radar inputs that constitute a pedestrian and the ones that constitute a plastic bag, we let the system represent the radar input as a matrix of numbers. Then you add a sufficient number of hidden layers of weighted connections and an output which represents, for example, either ‘pedestrian’ or ‘plastic bag’[4]. After this you “train” the system by exposing it to either pedestrians or plastic bags, affirming when it is correct and denying when it is incorrect. When you affirm the system, it strengthens the connections in the hidden layers that resulted in the correct identification; when you deny the system, it does the opposite. By doing this several thousand or million times, the neural network “learns” what input corresponds to a pedestrian or a plastic bag, and if the hidden layers are set up favorably it can reach high degrees of accuracy and perform this task fast. The system is thus indirectly programmed to distinguish the two types of objects: we only program how the neural network should learn, and then expose it to what we want it to learn. It is indirect because it is a multi-stage process, where the programming is different at the different stages and the programmer does not have direct control over at least one stage of the process; the “programming” at that stage usually consists of input from the environment. This is indirect programming.
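For readers who want to see what “programming how the network should learn” can look like, here is a toy sketch using Python and scikit-learn. The “radar” data is just random numbers standing in for real sensor readings, and the labels (0 = plastic bag, 1 = pedestrian) are my own convention for the example; a real detection system would be far more elaborate. The structure is what matters: the programmer specifies the learning setup, and the training examples supply the rest.

# A toy illustration of indirect programming: the programmer only
# specifies the learning setup (layer sizes, training procedure) and
# then exposes the network to labelled examples. The "radar" data here
# is random noise, so the learned classifier is purely illustrative.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Fake radar returns: 1000 examples, 32 numbers each.
X = rng.normal(size=(1000, 32))
# Fake labels: 0 = "plastic bag", 1 = "pedestrian".
y = rng.integers(0, 2, size=1000)

# The programmer chooses the hidden layers and the training regime...
net = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=300)
# ...but what the network ends up "knowing" comes from the examples.
net.fit(X, y)

new_reading = rng.normal(size=(1, 32))
print(net.predict(new_reading))   # 0 or 1, depending on what was learned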

If we take this technology and apply it to the artifact in the direct programming example, we get a detection system that tells the brake system when to brake, but the detection system is not directly programmed; instead, the artifact has learned how to differentiate between a plastic bag and a pedestrian. Now the answer to the question whether the artifact will stop in time is “yes, if it has learned to pick out this particular token of a pedestrian situation”. If the artifact is directly programmed by a human[5], then that human is accountable for the actions of the artifact, because that artifact is, per Neely, completely determined by an outside source, the programmer. However, when the artifact is indirectly programmed, the programmer is not accountable (or at least not obviously accountable), because she has not programmed how the pedestrian should be detected; she has only used (or designed) a framework that she knows has a high successful classification rate for this particular task. Whether the artifact will initiate the brakes or not depends on what the artifact “knows”.
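Putting the two pieces together, the combined artifact might be sketched as follows (again with hypothetical names and fake data). The brake rule itself is still direct programming, but whether it fires now depends on what the learned detector has come to “know” from its training, not on an instruction the programmer wrote by hand.

# Sketch of the combined artifact: a learned detector feeding a
# directly programmed brake rule. Data, names and labels are
# illustrative, as in the previous sketch.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
PEDESTRIAN = 1  # label convention chosen for this toy example

# Train a toy detector on fake radar data.
X = rng.normal(size=(1000, 32))
y = rng.integers(0, 2, size=1000)
net = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=300).fit(X, y)

def initiate_brake():
    print("Braking")

def control_step(radar_reading):
    # The rule below is still direct programming, but the value of
    # `detected` was not written by hand: it depends on the examples
    # the network was trained on.
    detected = net.predict(radar_reading.reshape(1, -1))[0]
    if detected == PEDESTRIAN:
        initiate_brake()

control_step(rng.normal(size=32))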

Whether this suffices to actually make the essential difference is something I will elaborate on in section 5.1, but it is at least a plausible explanation of how software artifacts can achieve the freedom necessary to be morally accountable agents, that is, to meet the control condition. Next, I will compare Floridi and Sanders’ definition of artificial moral agency with my approach to a definition.

Comments:

I’m not saying that just by using neural networks a system is indirectly programmed; that depends on how you use the neural network. And remember, this is a very simplified example, meant only to show how this might be achieved. But I do think it shows a higher tier of complexity, which suggests that there could be more tiers, and then something that isn’t direct programming no longer seems inconceivable. And maybe what we call consciousness is only a very high level of complexity.

Next part here!

Footnotes:

[1] In some programming languages “==” evaluates whether a statement is true or false (while “=” assigns a value to a variable; for example, X = 1 sets the variable ‘X’ to the value ‘1’). “()” usually denotes a method (a procedure), e.g. brake.initiate() runs the procedure that initiates the brakes.

[2] https://informationisbeautiful.net/visualizations/million-lines-of-code/ (2019–05–16)

[3] I am not claiming that modern artifacts are directly programmed, only describing what follows if that is the case. I will expand this discussion in the second half of section 5.1.

[4] In practice you would probably distinguish pedestrians from other objects in general, using more categories, rather than specifically identifying plastic bags, but this simplifies my example somewhat.

[5] It is not usually the case that only one programmer is responsible for the code. Not only could there be several hundred; they often incorporate other software written by other programmers, and so on, which could implicate many more than those directly involved with the product. Corporate policies, programming philosophies and industry standards could also be said to affect the final product. However, these are mostly epistemological problems of finding out who or what supplied a specific line of code, or the specific set of lines that constitute the final product; they do not change the fact that the artifact is directly programmed.
