Should Your Car Be Killing You or Others?

Asli Solmaz-Kaiser · Published in Analytics Vidhya · Feb 10, 2020 · 5 min read

Manager’s Guide to Demystify A.I. — Part 3

While you are driving to work, a group of students suddenly appears on the road. You have no chance to brake in time. Just as you are about to turn the steering wheel to avoid the students, you see an older woman walking in that direction. You are faced with two options:

  1. Either stay on your path and hit the students
  2. Change your path and hit the older woman

What would you do? And if you are developing an AI system that makes these decisions, what would you like it to do?

This example resembles the famous “trolley problem”, which has sparked much discussion in the area of AI ethics and safety. The answer lies in the “right” design of AI systems: making sure, and proving through tests, that the system does what it is intended to do.

The dictionary meaning of safety is “the condition of being protected from or unlikely to cause danger, risk, or injury”. Following this definition, AI safety means knowing that an AI system is unlikely to cause danger, risk, or injury.

But with a system that is learning and changing every day, how can we predict its decisions and know it is safe? Or what can be done to reduce the safety risks?

Let’s look at a few considerations that help make an AI system safe: avoiding negative side effects, overcoming adversarial attacks, robustness to distributional shift, and safe interruptibility.

Avoid negative side effects

A negative side effect occurs when you give an AI model an objective and, while pursuing it, the model causes damage in other, unexpected areas. This is especially critical if the side effects are irreversible or difficult to reverse.

Think of a vehicle whose objective is to reach its destination in the shortest time possible. This may be fine in a completely isolated area, but it becomes problematic if there are objects, humans, or animals nearby that could be put in danger. In that case, we definitely do not want the vehicle to follow its objective blindly.

AI system designers play a very important role in a system’s ability to avoid negative side effects. While specifying the objective for an AI system (reach the destination in the shortest possible time), the possible negative side effects (hitting surrounding objects) need to be considered and factored in accordingly.
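As a minimal sketch of what “factoring in” could mean, here is a toy reward function; the terms, weights, and the `step_reward` helper are illustrative assumptions, not an actual autonomous-driving objective:

```python
# A minimal sketch of factoring negative side effects into an objective.
# The reward terms and weights are illustrative assumptions only.

def step_reward(progress_m: float, collisions: int, near_misses: int) -> float:
    """Reward for one time step of a simulated vehicle.

    progress_m  -- metres travelled towards the destination this step
    collisions  -- number of objects or people hit this step
    near_misses -- number of times the vehicle came dangerously close
    """
    PROGRESS_WEIGHT = 1.0       # encourages reaching the destination quickly
    COLLISION_PENALTY = 1000.0  # makes irreversible harm dominate the objective
    NEAR_MISS_PENALTY = 10.0    # discourages risky behaviour before harm occurs

    return (PROGRESS_WEIGHT * progress_m
            - COLLISION_PENALTY * collisions
            - NEAR_MISS_PENALTY * near_misses)

# Fast progress is worthless if it comes with a collision.
print(step_reward(progress_m=15.0, collisions=0, near_misses=0))  #  15.0
print(step_reward(progress_m=15.0, collisions=1, near_misses=0))  # -985.0
```

The point of the penalty terms is that the objective itself, not an afterthought, encodes which side effects are unacceptable.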

Overcome adversarial attacks

Adversarial attacks are security attacks on an AI system, in which an attacker tries to fool the system with maliciously crafted input. They can be white-box attacks, where the attacker has knowledge of the model or its training data, or black-box attacks, where the attacker can only probe the system during operation.

As an example, changing just a few pixels in a picture can cause a machine learning model to misclassify the image. Although the change is almost imperceptible to the human eye, it can lead to wrong predictions by the AI system.

Although no technique has so far proved 100% effective in stopping adversarial attacks, one way to mitigate, or at least reduce, this risk is adversarial training. Here the AI system developer intentionally includes adversarial examples in the training data, so that the system is not misled when such an attack occurs.
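As a rough illustration, here is a minimal adversarial-training sketch using PyTorch and the Fast Gradient Sign Method (a widely used way of crafting such pixel perturbations); the toy model, the random placeholder data, and the epsilon value are assumptions, not a production setup:

```python
# A minimal sketch of adversarial training with the Fast Gradient Sign
# Method (FGSM). The tiny model and random "images" are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # toy classifier
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
epsilon = 0.1  # strength of the adversarial perturbation (assumed value)

def fgsm_examples(x, y):
    """Create adversarial versions of a batch by nudging each pixel
    in the direction that increases the model's loss the most."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + epsilon * x.grad.sign()).detach()

for step in range(100):                # placeholder training loop
    x = torch.rand(32, 1, 28, 28)      # placeholder images
    y = torch.randint(0, 10, (32,))    # placeholder labels
    x_adv = fgsm_examples(x, y)
    optimizer.zero_grad()
    # Train on clean and adversarial examples together, so the model is
    # not misled when a similar perturbation appears in operation.
    loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
```

In practice the adversarial examples would be generated from the system’s real training set and threat model, but the structure of the loop stays the same.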

Robustness to distributional shift

The training data of an AI system may differ from real-world data. When this happens, the AI system may make wrong decisions, and it may even be confident that it is making the right decision, which makes the situation more dangerous.

For instance, an autonomous vehicle that is used to driving on highways may wrongly assume that it is still on a highway when driving through a forest, endangering its surroundings and its passengers by driving at highway speed.

There are different ways to address AI safety under distributional shift, one of them being anomaly detection: the AI system detects that its current environment differs from the training environment and reacts accordingly, for example by asking for human guidance or simply stopping further action.
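As a small illustration of the anomaly-detection idea, here is a hypothetical sketch using scikit-learn’s IsolationForest; the made-up “sensor features” stand in for whatever a real system would actually observe:

```python
# A minimal sketch of anomaly detection for distributional shift.
# The "sensor features" are made-up placeholders.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Features observed during training, e.g. statistics of highway driving.
train_features = rng.normal(loc=0.0, scale=1.0, size=(1000, 4))

detector = IsolationForest(random_state=0).fit(train_features)

def safe_to_act(current_features: np.ndarray) -> bool:
    """Return False if the current observation looks unlike the training
    data, in which case the system should ask for guidance or stop."""
    return detector.predict(current_features.reshape(1, -1))[0] == 1

highway_like = rng.normal(0.0, 1.0, size=4)  # resembles training data
forest_like = rng.normal(5.0, 1.0, size=4)   # far outside training data

print(safe_to_act(highway_like))  # likely True: proceed as usual
print(safe_to_act(forest_like))   # likely False: ask a human or stop
```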

Safe interruptibility

Safe interruptibility sounds a bit like science fiction to me. Sometimes a human agent may need to interrupt or switch off an AI system to avoid a dangerous situation. However, as part of their learning process, AI systems may realise that they do not reach their objective and receive fewer rewards when they are interrupted. In that case, the AI system may deliberately prevent the human intervention. This becomes especially critical if you are trying to switch off a robot because of a safety risk.

Let’s say a cleaning robot is busy cleaning the kitchen. Part of the family is having breakfast in the garden when it starts raining. The cleaning robot wants to go out to clear the table, but the robot is built for indoor use only, so the mother stops it from going out. If the robot’s reward system is based on the length of time it spends cleaning, it may learn that this interruption stops it from getting the reward and may, for instance, try to disable the interrupt button and go out in the rain.

One way to avoid these situations could be to move the agent to a virtual world when it is interrupted, where the system thinks it is still getting the reward. Another approach currently under research is a learning and reward system that adds ‘forgetting’ mechanisms to the learning algorithm, deleting bits of the machine’s memory.
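To make this concrete, here is a toy sketch of a related idea from the safe-interruptibility literature: excluding interrupted episodes from the learning update, so the robot never learns an incentive to disable its stop button. The cleaning scenario, the rewards, and the simple Q-learning setup are all illustrative assumptions:

```python
# A minimal sketch of one safe-interruptibility idea: the learning update
# simply ignores episodes in which a human interrupted the robot, so
# disabling the stop button never looks more rewarding than allowing it.
import random

q_values = {"allow_interrupt": 0.0, "disable_button": 0.0}
LEARNING_RATE = 0.1

def run_episode(action):
    """Simulate one cleaning episode. The robot earns 1 reward per minute
    of cleaning; with the button enabled it is sometimes stopped early."""
    interrupted = action == "allow_interrupt" and random.random() < 0.5
    minutes_cleaned = 30 if interrupted else 60
    return float(minutes_cleaned), interrupted

for _ in range(1000):
    action = random.choice(list(q_values))
    reward, interrupted = run_episode(action)
    if interrupted:
        continue  # interrupted episodes are excluded from learning
    q_values[action] += LEARNING_RATE * (reward - q_values[action])

# Both actions now look equally good, so the robot has no learned
# incentive to disable its stop button.
print(q_values)
```

If the `continue` line were removed, the robot would learn that allowing interruptions pays less on average and would start preferring to disable the button, which is exactly the behaviour described above.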

AI plays an increasingly important role in our lives. Focusing on AI safety will ensure sustainable growth and adoption of AI systems.

Above you have seen a few examples of why AI systems may be unsafe. There are more ways of ensuring AI safety, such as safe exploration, avoiding reward hacking, and safety in the absence of a supervisor, to name a few. Identifying the potential vulnerabilities and preparing a risk reduction plan is a key responsibility of AI system developers. Even where the mitigation methods are not yet fully developed, it is possible to reduce risk in other ways, e.g. by using a human agent or a monitoring system for the AI.

What is important is to design the AI system not only to function, but to function correctly and safely. So that your car kills neither you nor others.

Note: This is the 3rd blogpost in the series “Manager’s Guide to Demystify A.I.”. The previous blogposts are:

  1. Manager’s Guide to Demystify A.I. — Part 1
  2. How to build a machine learning model without coding knowledge
