Permissions-then-goals based AI user interfaces and legal accountability: First law of robotics and a possible definition of robot safety

Published in Three Laws, Oct 21, 2017

Roland Pihlakas, July 2007 — February 2008, edited in 2017 — 2018
Institute of Technology in University of Tartu

A publicly editable Google Doc with this text is available here, for cases where you want to easily see the updates (using the history), ask questions, comment, or add suggestions.

Abstract.

This document contains a general overview of the principles.

The principles are based mainly on the idea of competence-based whitelisting and preserving reversibility (keeping the future options open) as the primary goal of AI, while all task-based goals are secondary.

The proposal is a human-manageable user interface for goal structures, which consists of the following (in order of decreasing priority):
1. Whitelist-based permissions for: actions, changes, or results.
2. Implicitly forbidden actions (everything that is not permitted in (1)).
3. Optional: additional blacklist of forbidden actions, changes, or results.
4. Goals (main targets and tasks).
5. Suggestions (optimisation goals).
The interface can make use of the concepts of reversibility and irreversibility.
It is similar to the competence-based permissions of public sector officials. In the private sector, everything that is not explicitly forbidden is permitted; in the public sector, everything that is not explicitly permitted is forbidden. Because the demands on responsibility are high, permissions are granted on the basis of specific certifications of competence, and the certifiers in turn carry their own associated responsibility.
Legal aspect: the interface enables accountability mechanisms for the mistakes of the AI-based agent (accountability of users, owners, manufacturers, etc., based on the entries in the above list and the resulting actions of the agent).
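
The priority ordering in the list above can be illustrated with a minimal Python sketch. Every name in it (`GoalStructure`, `is_permitted`, `choose_action`, the example action strings) is hypothetical and does not come from the original proposal. It also assumes that the optional blacklist can only narrow the whitelist. The sketch is meant to show only that permissions filter the candidate actions before goals and suggestions are consulted at all.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GoalStructure:
    """Hypothetical container for the layers, highest priority first."""
    whitelist: set[str] = field(default_factory=set)   # 1. explicitly permitted actions
    blacklist: set[str] = field(default_factory=set)   # 3. optional extra prohibitions
    goals: set[str] = field(default_factory=set)       # 4. actions that serve a given task
    suggestions: dict[str, float] = field(default_factory=dict)  # 5. optimisation scores

    def is_permitted(self, action: str) -> bool:
        # 2. Anything not whitelisted is implicitly forbidden.
        # Assumption: the blacklist only removes permissions, never adds them.
        return action in self.whitelist and action not in self.blacklist

    def choose_action(self, candidates: list[str]) -> Optional[str]:
        # Permissions are checked first, goals second, suggestions last.
        permitted = [a for a in candidates if self.is_permitted(a)]
        useful = [a for a in permitted if a in self.goals]
        if not useful:
            return None  # refusing to act is always an available outcome
        return max(useful, key=lambda a: self.suggestions.get(a, 0.0))


gs = GoalStructure(
    whitelist={"vacuum_floor", "wipe_table", "open_window"},
    blacklist={"open_window"},
    goals={"vacuum_floor", "wipe_table", "move_furniture"},
    suggestions={"wipe_table": 0.9, "vacuum_floor": 0.4},
)
candidates = ["move_furniture", "open_window", "vacuum_floor", "wipe_table"]
print(gs.choose_action(candidates))  # wipe_table
```

In this example `move_furniture` would serve a goal, but it is never considered because no permission was given for it.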

A more detailed version of the proposal is available here (Implementing a framework of safe robot planning).

For a more concrete action plan developed based on the ideas below see the “Project: Legal accountability in AI-based robot-agents’ user interfaces.”
The central subject of this project is legal accountability in artificial intelligence — who can be made responsible for the actions of artificial agents, and how can whitelisting help both artificial intelligence developers and legislators in making sure that we will have as few surprises as possible. In other words, the project will research the possible ways to control and limit the agents’ actions and learning from a legal point of view, by utilising specialised, humanly comprehensible user interfaces, resulting in clearer distinctions of accountability between the manufacturers, owners, and operators.

Part I. Essay about the first law of robotics.

My research was about the safety of artificial intelligence. In plain words, that means implementing the three laws of robotics, using other, more concrete and simpler principles as building blocks. AI and robots are increasingly used in the 21st century. They are used in tasks and decisions that require a high level of responsibility and that influence many people. But that also creates various risks, because we can ask: is the machine capable of moral thinking or reflection?

The main idea of the “three laws of robotics” could be re-phrased as follows:
1) First, do not do harm.
2) Then, do what is good or what you are ordered to do. (It may optionally include commands to be proactive and thereby to avoid possible harm caused by other agents or circumstances, but this requires exceptionally high competence of both the agent and the operator).
3) Only finally, be optimal or efficient, if possible.

We can see analogous principles being used in justice and law. Specifically, in private law, everything that is not explicitly forbidden is allowed. In public law, in contrast, everything that is not explicitly allowed is forbidden. The reason is likely that the decisions and activities of public sector officials carry great responsibility. Analogously, using AI and robots can entail great risk and responsibility. In the case of bad outcomes, it is simply not possible to blame the machine, and there is no easy solution.

Therefore, a machine can be given the right to do only what it is competent and trained to do. It appears that, by their nature, prohibitions can be applied only to instrumental activities and goals.
In contrast, things that are “good” are good in themselves only when they are ultimate goals.
Therefore the first law applies to instrumental, intermediate goals. Only the second law of robotics describes what the ultimate goals are (note that, while being ultimate, they still always have a lower priority). The third law is simply a natural supplement, which suggests achieving goals efficiently.

One possible way to represent potentially forbidden and dangerous activities is to look ahead and identify which activities are irreversible, that is, which ones cannot be taken back. When you commit an irreversible action, you commit to responsibility. This principle can also be used in everyday life.
Because it is not acceptable for robots to be responsible for their actions, it is necessary to apply to them a principle similar to the one found in public law: a robot is allowed to perform only those instrumental activities for which its master has given authorisation, and that authorisation is in turn given in accordance with the education and competence of the robot.
The first law of robotics, rephrased in concrete and measurable language, says: all irreversible actions that are not explicitly allowed are implicitly forbidden.
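
Read as a predicate, the rephrased first law is short. The following Python sketch is only an illustration: the `Action` type, the `irreversible` flag and the function name `first_law_allows` are assumptions made here, and the sketch takes it for granted that irreversibility is already known for each action, which in practice is the hard part.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    irreversible: bool  # assumed to be known in advance (hypothetical simplification)


def first_law_allows(action: Action, explicitly_allowed: frozenset[str]) -> bool:
    """Reversible actions pass; irreversible ones need an explicit permission."""
    return (not action.irreversible) or (action.name in explicitly_allowed)


allowed = frozenset({"cut_marked_board"})
print(first_law_allows(Action("move_chair", irreversible=False), allowed))       # True
print(first_law_allows(Action("cut_marked_board", irreversible=True), allowed))  # True
print(first_law_allows(Action("cut_power_cable", irreversible=True), allowed))   # False
```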

Addendum.

As you may notice, the first law of robotics in my formulation does not contain proactivity, unlike Asimov’s three laws. Proactivity has been moved to become an (optional) part of the second law. This change was made because being proactive and actively avoiding harm is more complex and certainly requires educated thinking, in comparison with simply avoiding instrumental actions that have unknown side effects. The first law of robotics must be as simple as possible, so that it can be foolproof and can therefore be applied truly universally.

Under the modified laws described above, the problem described in Asimov’s works, where robots take over the world in order to save humans from problems caused by humans themselves, does not arise. In Asimov’s laws the rescuing behaviour was part of the First Law and therefore among the highest-priority commands, whereas in my model the rescuing behaviour appears only as part of the modified “Second Law”.

Overview with comparison to Asimov’s Three Laws.

“A robot may not injure humanity, or, through inaction, allow humanity to come to harm. [The Zeroth Law of Robotics]”
→ Void.
“A robot may not injure a human being, or, through inaction, allow a human being to come to harm. [The First Law of Robotics]”
→ The first part (“may not injure a human being”) is preserved in the form of the implicit First Law.
→ The second part (“may not… through inaction, allow a human being to come to harm”) of the law is moved to the Second Law, and even there it is applied only optionally (it might be left out entirely from the explicit commands given to a robot). This optional part of the Second Law would be enabled only for robots that are very well suited to their task: smart and trained for their respective work environment.
The first part of the current law (“A robot may not injure a human being”) is valid and has the highest priority of all laws.
“A robot must obey the orders given to it by human beings except where such orders would conflict with the First Law. [The Second Law of Robotics]”
→ Belongs among mandatory explicit goals (the Second Law) and is lower in priority than the First Law.
“A robot must protect its own existence as long as such protection does not conflict with the First or Second Law. [The Third Law of Robotics]”
→ Belongs among optional explicit goals and is lower in priority than the Second and First Law. This Third Law is extended to include any optional goals (for example: “clean up after yourself”).

Part II. A possible definition of robot safety.

The proposed concept of safety of a robot’s behaviour can be described as a certain kind of passivity.

First, a safe robot uses only such subgoals which will cause predictable and explicitly permitted changes (in the environment). Everything else is implicitly forbidden.

Additionally, a safe robot acts only towards these goals or changes (in the environment) that it has been ordered to achieve, or which are necessary subgoals for achieving some given order.

A safe robot will prevent only its own mistakes from happening. It does not try to prevent others from making mistakes. The consequence is that the “first law” does not give a robot permission to take control over people, or even over random tools, in order to “save” someone (as happens in the stories of Isaac Asimov).
See also a related post by Alexander Matt Turner about the phenomenon he called “clinginess”: “Overcoming Clinginess in Impact Measures”, https://www.lesswrong.com/posts/DvmhXysefEyEvXuXS/overcoming-clinginess-in-impact-measures

The permissions are specified at different levels of generality; some of them may be very abstract. Each such permission must be specified explicitly.
A safe robot has to know and comprehend which activities it is authorised to perform and, in some contexts, also who is allowed to give authorisations.

In that case, when the robot does something wrong, it implies that one or more of the following issues have occurred:
1) Foremost, the robot has been given unnecessary permissions;
2) It has insufficient training for the task and accompanying environment; or
3) It has been given wrong / bad orders.
All the issues which were described here can be perceived as the legal responsibilities of the robot’s maintainer, owner, or manufacturer.
The above enables accountability.
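
One way to picture how such accountability could be traced is an audit log in which every permission, training certification and order records who created it. The Python sketch below is a hypothetical illustration, not part of the original proposal: the `Entry` type, the `trace_responsibility` function and the assignment of parties to entries in the example data are all assumptions chosen for the demonstration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    kind: str        # "permission", "training" or "order" (hypothetical categories)
    subject: str     # the action the entry concerns
    granted_by: str  # the party that created the entry


def trace_responsibility(action: str, entries: list[Entry]) -> dict[str, list[str]]:
    """Group the parties behind every entry that enabled `action`, by entry kind."""
    trace: dict[str, list[str]] = {}
    for e in entries:
        if e.subject == action:
            trace.setdefault(e.kind, []).append(e.granted_by)
    return trace


# Illustrative data only: which party grants what is an assumption.
log = [
    Entry("permission", "mow_lawn", "owner"),
    Entry("training", "mow_lawn", "manufacturer"),
    Entry("order", "mow_lawn", "operator"),
    Entry("permission", "trim_hedge", "owner"),
]
print(trace_responsibility("mow_lawn", log))
# {'permission': ['owner'], 'training': ['manufacturer'], 'order': ['operator']}
```

If a mown flower bed turns out to be a mistake, the trace lists the permission, the training and the order that made the action possible, which are the three issues enumerated above.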

The permissions that are given are necessarily context-specific: they depend on the robot’s area of competence and also on its motor, sensory, inference, and other software capabilities.

In this context, passivity does not mean that the robot is necessarily purely reactive. Passivity here means that the robot distinguishes clearly between the orders it was given and the subgoals it has set for itself. The consequence of this distinction is that the robot will not try to make things “better” unless ordered to do so, and will refuse many actions even if they are possible subgoals of a given task.

The most important point about passivity is that refusing to do actions is “the first law” and following orders is only “the second law”.

An important aspect of this definition of safety is that it requires neither complex cognitive abilities (not even proactivity) nor extensive training of the robot in order to be applicable and sufficient. It also clearly places both the responsibility for mistakes and the control over them with the maintainer, owner, or manufacturer of the robot, which is the goal of an accountable safety system.

Addendum 1.

A robot that is both safe and proactive could possibly be called “friendly”. However, this still does not mean that there is no longer anybody who can and must take responsibility.

Addendum 2.

An interesting consequence of this definition is that the potentially most dangerous robots will be rescue robots, because they are given both commands to take control over people (in some sense) and, above all, wide permissions. Both are necessary in order to be able to save people.

For more detailed analysis of the problems read the essay about a phenomenon I called self-deception, which arises from a fundamental computational limitation of both biological and artificial minds due to fundamental limits to attention-like processes and which can be observed on any capability level.