Permissions-then-goals based AI user “interfaces” & legal accountability: First law of robotics and a possible definition of robot safety
Roland Pihlakas, July 2007 — February 2008
Institute of Technology, University of Tartu
A publicly editable Google Doc with this text is available here, in case you want to follow the updates easily (using the history), ask questions, comment, or add suggestions.
- Human-manageable user interface for goal structures, which consists of the following (in order of decreasing priority):
1. Whitelist-based permissions for: actions, changes, or results.
2. Implicitly forbidden actions (everything that is not permitted in (1)).
3. Optional: additional blacklist of forbidden actions, changes, or results.
4. Goals (main targets and tasks).
5. Suggestions (optimisation goals).
- Can make use of the concepts of reversibility and irreversibility.
- Similarity to the competence-based permissions of public sector officials. In the private sector, everything that is not explicitly forbidden is permitted; in the public sector, by contrast, everything that is not explicitly permitted is forbidden. Because of the high demands on responsibility, permissions are granted on the basis of specific certifications of competence, and the certifiers in turn bear their own associated responsibility.
- Legal aspect: accountability for the mistakes of the AI-based agent (accountability of users, owners, manufacturers, etc., based on the entries in the above list and the resulting actions of the agent).
A more detailed version of the proposal is available here (Implementing a framework of safe robot planning).
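As a rough illustration, the priority ordering in the list above can be sketched in Python. This is a minimal sketch, not the author's implementation: the names `GoalStructure`, `Verdict`, `check`, and `executable_steps` are hypothetical, actions are represented as plain strings, and the optional blacklist is assumed to narrow what the whitelist permits.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    PERMITTED = "permitted"
    IMPLICITLY_FORBIDDEN = "implicitly forbidden"
    BLACKLISTED = "blacklisted"


@dataclass
class GoalStructure:
    """Hypothetical container for the five layers (all names are assumptions)."""
    whitelist: set                                   # 1. explicit permissions
    blacklist: set = field(default_factory=set)      # 3. optional prohibitions
    goals: list = field(default_factory=list)        # 4. main targets and tasks
    suggestions: list = field(default_factory=list)  # 5. optimisation goals

    def check(self, action: str) -> Verdict:
        """Apply layers 1-3; goals and suggestions never override the result."""
        if action not in self.whitelist:
            return Verdict.IMPLICITLY_FORBIDDEN      # 2. not permitted in (1)
        if action in self.blacklist:                 # assumption: blacklist narrows whitelist
            return Verdict.BLACKLISTED
        return Verdict.PERMITTED

    def executable_steps(self, plan: list) -> list:
        """Keep only the plan steps that pass the permission check."""
        return [step for step in plan if self.check(step) is Verdict.PERMITTED]


structure = GoalStructure(
    whitelist={"pick up cup", "pour water", "open window"},
    blacklist={"open window"},
    goals=["serve a glass of water"],
    suggestions=["use as little water as possible"],
)
for action in ["pour water", "open window", "move furniture"]:
    print(action, "->", structure.check(action).value)
```

In this sketch the goals and suggestions are stored but deliberately play no part in `check`, which mirrors the idea that permissions are evaluated before any goal is consulted.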
Part I. Essay about the first law of robotics.
My research was about the safety of artificial intelligence. Expressed in commonsense words, that means implementing the three laws of robotics, using other, more concrete and simpler principles as building blocks. AI and robots are increasingly used in the 21st century, including in tasks and decisions that require a high level of responsibility and that influence many people. That also creates various risks, because we can ask: is the machine capable of moral thinking or reflection?
The main idea of the three laws of robotics could be (re)phrased as follows:
1) First, do not do harm.
2) Then, do what is good or what you are ordered to do. It may include commands to be proactive and thereby avoid possible harm caused by other agents or circumstances.
3) Only finally, be optimal or efficient, if possible.
We can see analogous principles being used in justice and law. Specifically, in private law, everything that is not explicitly forbidden is allowed. In public law, in contrast, everything that is not explicitly allowed is forbidden. The reason is likely that the decisions and activities of public sector officials carry great responsibility. Analogously, using AI and robots can entail great risk and responsibility. In case of bad outcomes, it is simply not possible to blame the machine, and there is no easy solution.
Therefore, a machine can be given the right to do only those things in which it is competent and educated. It appears that, by their nature, prohibitions can only be applied to instrumental activities and goals.
In contrast, things that are “good” are good in themselves only when they are ultimate goals.
Therefore the first law applies to instrumental, intermediary goals. Only the second law of robotics describes what the ultimate goals are. The third law is simply a natural supplement, which suggests achieving goals efficiently.
One possible way to represent potentially forbidden and dangerous activities is to look ahead and ask which activities are irreversible, that is, which cannot be taken back. When you commit an irreversible action, you commit to responsibility. This principle can also be used in everyday life.
Because it is not acceptable that robots be responsible (for their actions), it is necessary to apply to them a principle similar to one found in public law: a robot is allowed to perform only those instrumental activities for which its master has given authorisation, and that authorisation is in turn given in accordance with the education and competence of the robot.
The first law of robotics, rephrased in concrete and measurable language, says: all irreversible actions that are not explicitly allowed are implicitly forbidden.
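The rephrased first law can be read as a simple check. The following is a minimal sketch under strong assumptions: the `Action` type and the `first_law_allows` function are hypothetical names, and whether an action is reversible is taken as given here, although estimating that is likely the hard part in practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    reversible: bool  # assumed to be known in advance for this sketch


def first_law_allows(action: Action, allowed_irreversible: frozenset) -> bool:
    """Irreversible actions are forbidden unless explicitly allowed."""
    if action.reversible:
        return True  # the rephrased first law restricts only irreversible actions
    return action.name in allowed_irreversible


allowed = frozenset({"cut paper"})
print(first_law_allows(Action("move chair", reversible=True), allowed))   # True
print(first_law_allows(Action("cut paper", reversible=False), allowed))   # True
print(first_law_allows(Action("cut cable", reversible=False), allowed))   # False
```

The check contains no notion of rescuing or of preventing harm caused by others, which keeps it as simple as the text requires.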
As you may notice, the first law of robotics in my formulation does not contain proactivity, unlike in Asimov’s three laws. Proactivity has been moved to the second law. This change is made because being proactive and avoiding harm is more complex and certainly requires educated thinking, in comparison to simply avoiding instrumental actions with unknown side effects. The first law of robotics must be as simple as possible, so that it can be foolproof and can therefore be applied truly universally.
So, under the modified laws described above, there will not be a problem like the one described in Asimov’s works, where robots take over the world in order to save humans from problems caused by humans themselves. Whereas in Asimov’s laws the rescuing behaviour was part of the First Law and therefore among the highest-priority commands, in my model the rescuing behaviour appears only as part of the modified “Second Law”.
Overview with comparison to Asimov’s Three Laws.
- “A robot may not injure humanity, or, through inaction, allow humanity to come to harm. [The Zeroth Law of Robotics]”
- “A robot may not injure a human being, or, through inaction, allow a human being to come to harm. [The First Law of Robotics]”
→ The second part of the law (“may not… through inaction, allow a human being to come to harm”) is moved to the Second Law, and even there it is applied only optionally (it might be left out entirely from the explicit commands given to a robot). This optional part of the Second Law would be enabled only for very well-suited robots that are smart and trained for their respective work environment.
The first part of the current law (“A robot may not injure a human being”) is valid and has the highest priority of all laws.
- “A robot must obey the orders given to it by human beings except where such orders would conflict with the First Law. [The Second Law of Robotics]”
→ Belongs among mandatory explicit goals and is lower in priority than the First Law.
- “A robot must protect its own existence as long as such protection does not conflict with the First or Second Law. [The Third Law of Robotics]”
→ Belongs among optional explicit goals and is lower in priority than the Second and First Laws. This Third Law is extended to include any optional goals (for example: “clean up after yourself”).
Part II. A possible definition of robot safety.
The proposed concept of safety of a robot’s behaviour can be described as a certain kind of passivity.
First, a safe robot uses only those subgoals that cause predictable and explicitly permitted changes (in the environment). Everything else is implicitly forbidden.
Additionally, a safe robot acts only towards those goals or changes (in the environment) that it has been ordered to achieve, or which are necessary subgoals for achieving some given order.
A safe robot prevents only its own mistakes from happening. It does not try to prevent others from making mistakes. The consequence is that the “first law” does not give a robot permission to take control over people, or even over random tools, in order to “save” someone (as happens in stories by Isaac Asimov).
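The conditions above can be sketched as a plan filter. This is only an illustration under stated assumptions: `Step`, `step_is_safe`, and `choose_plan` are hypothetical names, the predicted changes of each step are assumed to be given (with `None` standing for an unpredictable outcome), and every candidate plan is assumed to contain only subgoals that serve the given order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    name: str
    predicted_changes: Optional[frozenset]  # None means the outcome is unpredictable


def step_is_safe(step: Step, permitted_changes: frozenset) -> bool:
    """A step is safe only if its changes are predictable and all are permitted."""
    if step.predicted_changes is None:
        return False
    return step.predicted_changes <= permitted_changes


def choose_plan(candidate_plans: list, permitted_changes: frozenset) -> Optional[list]:
    """Return the first plan whose steps are all safe, or None to refuse the order."""
    for plan in candidate_plans:
        if all(step_is_safe(step, permitted_changes) for step in plan):
            return plan
    return None  # refusing takes priority over completing the order


permitted = frozenset({"cup moved", "door opened"})
shortcut = [Step("push person aside", None),
            Step("carry cup", frozenset({"cup moved"}))]
detour = [Step("open door", frozenset({"door opened"})),
          Step("carry cup", frozenset({"cup moved"}))]

plan = choose_plan([shortcut, detour], permitted)
print([step.name for step in plan])  # ['open door', 'carry cup']
```

Note that nothing in the sketch generates goals of its own: the robot only selects among plans for an order it was given, and returns `None` when no permitted plan exists.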
See also a related writing by Alexander Matt Turner about the phenomenon he called “clinginess”: https://www.lesswrong.com/posts/DvmhXysefEyEvXuXS/overcoming-clinginess-in-impact-measures (“Overcoming Clinginess in Impact Measures”).
The permissions are specified at different levels of generality; some of them may be very abstract. Each such permission must be specified explicitly.
A safe robot has to comprehend and know for which activities it is authorised and, in some contexts, also who is allowed to give authorisations.
In that case, when the robot does something wrong, a combination of the following issues must have occurred:
1) the robot has been given unnecessary permissions;
2) it has insufficient training for the task and accompanying environment; or
3) it has been given wrong / bad orders.
All of the issues described here can be regarded as legal responsibilities of the robot’s maintainer, owner, or manufacturer.
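For an incident review, the three issues listed above can be recorded as a small checklist. This is a hedged sketch rather than a legal procedure: the `Incident` fields and the `diagnose` function are hypothetical, and each field is assumed to be a judgement supplied by a human reviewer, not something the robot computes.

```python
from dataclasses import dataclass
from enum import Enum


class Cause(Enum):
    UNNECESSARY_PERMISSION = "unnecessary permission"
    INSUFFICIENT_TRAINING = "insufficient training"
    WRONG_ORDER = "wrong or bad order"


@dataclass(frozen=True)
class Incident:
    action: str
    permission_was_necessary: bool  # reviewer's judgement (assumption)
    training_was_sufficient: bool   # reviewer's judgement (assumption)
    order_was_appropriate: bool     # reviewer's judgement (assumption)


def diagnose(incident: Incident) -> set:
    """Return every issue from the list above that applies to the incident."""
    causes = set()
    if not incident.permission_was_necessary:
        causes.add(Cause.UNNECESSARY_PERMISSION)
    if not incident.training_was_sufficient:
        causes.add(Cause.INSUFFICIENT_TRAINING)
    if not incident.order_was_appropriate:
        causes.add(Cause.WRONG_ORDER)
    return causes


incident = Incident("cut cable", permission_was_necessary=False,
                    training_was_sufficient=True, order_was_appropriate=False)
print(sorted(cause.value for cause in diagnose(incident)))
# ['unnecessary permission', 'wrong or bad order']
```

Returning a set reflects the wording “a combination of the following issues”: more than one cause can apply to the same incident.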
The permissions that are given are necessarily context-specific: they depend on the robot’s area of competence and also on its motor, sensory, inference, or other software capabilities.
In this context, passivity does not mean that the robot is necessarily purely reactive. Passivity means here that the robot distinguishes clearly between the orders it was given and the subgoals it has set for itself. The consequence of this distinction is that the robot will not try to make things “better” unless ordered to do so, and will refuse many actions even if they are possible subgoals of a given task.
The most important part of the passivity is that refusing to do actions is “the first law” and following the orders is only “the second law”.
An important aspect of this definition of safety is that, to be applicable and sufficient, it requires neither complex cognitive abilities (not even proactivity) nor extensive training of the robot. It also clearly places both the responsibility for mistakes and the control over them with the maintainer or owner of the robot, which is the goal of the safety system.
A robot that is both safe and proactive could possibly be called “friendly”. However, this still does not mean that there is no longer anybody who can and has to take responsibility.
An interesting consequence of this definition is that potentially the most dangerous robots will be rescue robots, because they are given both commands to take control over people (in some sense) and wide permissions; both are necessary in order to be able to save people.
For a more detailed analysis of the problems, read the essay about a phenomenon I called self-deception, which arises from a fundamental computational limitation of both biological and artificial minds, due to fundamental limits to attention-like processes, and which can be observed at any capability level.
- “The Wright brothers were first to fly because they developed a system of control that depended on feedback.
Everyone else was trying to build stable planes.
The Wright brothers built an unstable plane but developed a control system [that stabilised the plane].”
(YouTube: Norbert Wiener — Wiener Today (1981))
- Paul Pangaro — Cybernetics
- See also related writings by Alexander Matt Turner:
https://www.lesswrong.com/posts/H7KB44oKoSjSCkpzL/worrying-about-the-vase-whitelisting (“Worrying about the Vase: Whitelisting”).
https://www.lesswrong.com/posts/DvmhXysefEyEvXuXS/overcoming-clinginess-in-impact-measures (“Overcoming Clinginess in Impact Measures”).
https://www.overleaf.com/read/jrrjqzdjtxjp#/52395179/ (“Whitelist Learning”).