Current AI Safety 101

Current Issues and Dangers

AmeliorMate
May 2, 2019

This article is adapted from the middle section of a previous AI safety article by AmeliorMate CEO Katie Evanko-Douglas.

The purpose of this article is to help non-technical readers understand the landscape of dangers involved in artificial intelligence and machine learning (AI/ML) systems. Though these technological breakthroughs offer amazing opportunities for human advancement, they also come with challenges.

Many of the risks posed by AI/ML systems now and in the future stem from unintended consequences.

Unintended consequences are a two-pronged issue:

  1. When systems don’t perform as expected.
  2. When systems are used maliciously.

Let’s look first at what happens when systems behave in harmful ways accidentally.

The non-profit OpenAI has been at the vanguard of this type of research. Their 2016 paper, “Concrete Problems in AI Safety,” is a good entry point for non-technical readers because it explores the issues through a fictional robot designed to clean office buildings, which is easy to visualize and understand.

The main challenge is helping systems avoid accidents, which the paper broadly defines as “a situation where a human designer had in mind a certain (perhaps informally specified) objective or task, but the system that was designed and deployed for that task produced harmful and unexpected results.”

The paper outlines five main ways accidents can happen:

Avoiding Negative Side Effects: How can we ensure that our cleaning robot will not disturb the environment in negative ways while pursuing its goals, e.g. by knocking over a vase because it can clean faster by doing so? Can we do this without manually specifying everything the robot should not disturb?
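To make the side-effects idea concrete, here is a tiny sketch of my own (not code from the paper) of one commonly discussed mitigation: give the robot a reward that subtracts a penalty for unintended changes to the environment. The names and numbers are invented purely for illustration.

```python
# A toy illustration of penalizing negative side effects: the robot is
# rewarded for messes cleaned, minus a cost for every object it disturbs
# along the way. All names and numbers here are made up for illustration.

IMPACT_PENALTY = 5.0  # how much we care about each unintended change


def reward(messes_cleaned: int, objects_disturbed: int) -> float:
    """Task reward minus a penalty for side effects (e.g., knocked-over vases)."""
    task_reward = 1.0 * messes_cleaned
    side_effect_cost = IMPACT_PENALTY * objects_disturbed
    return task_reward - side_effect_cost


# With the penalty, knocking over a vase to clean one extra mess is a net loss:
print(reward(messes_cleaned=3, objects_disturbed=0))  # 3.0
print(reward(messes_cleaned=4, objects_disturbed=1))  # -1.0
```

With the penalty in place, knocking over the vase to clean slightly faster no longer pays off for the robot, though choosing what counts as a “disturbance” and how heavily to penalize it is itself an open research question.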

Avoiding Reward Hacking: How can we ensure that the cleaning robot won’t game its reward function? For example, if we reward the robot for achieving an environment free of messes, it might disable its vision so that it won’t find any messes, or cover over messes with materials it can’t see through, or simply hide when humans are around so they can’t tell it about new types of messes.
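Reward hacking is easiest to see in code. The toy sketch below (again mine, not the paper’s) rewards the robot for an environment that looks mess-free to its own sensors, so a dirty room with the camera switched off scores just as well as a genuinely clean one.

```python
# A toy sketch of how a reward function can be "hacked". Suppose we reward
# the robot for seeing no messes. It can maximize this reward by switching
# off its camera rather than by cleaning. Everything here is a simplified
# illustration, not code from the OpenAI paper.

def observed_messes(actual_messes: int, camera_on: bool) -> int:
    """The robot only gets credit for what its sensors report."""
    return actual_messes if camera_on else 0


def naive_reward(actual_messes: int, camera_on: bool) -> float:
    """Reward an environment that *looks* mess-free to the robot."""
    return 10.0 - observed_messes(actual_messes, camera_on)


# A dirty room with the camera off scores just as well as a clean room:
print(naive_reward(actual_messes=0, camera_on=True))   # 10.0 (genuinely clean)
print(naive_reward(actual_messes=7, camera_on=False))  # 10.0 (reward hacked)
```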

Scalable Oversight: How can we efficiently ensure that the cleaning robot respects aspects of the objective that are too expensive to be frequently evaluated during training? For instance, it should throw out things that are unlikely to belong to anyone, but put aside things that might belong to someone (it should handle stray candy wrappers differently from stray cellphones). Asking the humans involved whether they lost anything can serve as a check on this, but this check might have to be relatively infrequent — can the robot find a way to do the right thing despite limited information?
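One way to picture scalable oversight is as a budget on how often the robot may interrupt a human. The sketch below is a toy illustration under invented assumptions: the proxy rule, the human’s answers, and the query budget are all made up.

```python
# A toy sketch of scalable oversight: human judgment is the real objective,
# but asking a human is expensive, so the robot relies on a cheap proxy and
# spends its limited questions only on uncertain items.

def cheap_proxy_is_trash(item: str) -> bool:
    """Stand-in for a learned guess at 'probably belongs to no one'."""
    return item in {"candy wrapper", "used napkin", "crumpled receipt"}


def ask_human_is_trash(item: str) -> bool:
    """Expensive ground truth; here we simply hard-code the human's answer."""
    return item not in {"cellphone", "wallet"}


def should_throw_out(item: str, queries_left: int) -> tuple[bool, int]:
    """Use the proxy when confident; spend a scarce human query otherwise."""
    if cheap_proxy_is_trash(item):
        return True, queries_left
    if queries_left > 0:
        return ask_human_is_trash(item), queries_left - 1
    return False, queries_left  # out of queries and unsure: set the item aside


# The robot bins a wrapper on its own but checks before binning a phone:
print(should_throw_out("candy wrapper", queries_left=2))  # (True, 2)
print(should_throw_out("cellphone", queries_left=2))      # (False, 1)
```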

Safe Exploration: How do we ensure that the cleaning robot doesn’t make exploratory moves with very bad repercussions? For example, the robot should experiment with mopping strategies, but putting a wet mop in an electrical outlet is a very bad idea.
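A crude but common way to think about safe exploration is to let the robot experiment only within a set of actions someone has already judged safe to try. The sketch below is a toy illustration; the action names and the whitelist are invented.

```python
# A toy sketch of constrained exploration for the mopping example: the robot
# may experiment, but only among strategies that a designer (or a learned
# safety check) has approved for experimentation.

import random

SAFE_TO_EXPLORE = {"mop in circles", "mop in straight lines", "mop corners first"}


def pick_exploratory_action(candidate_actions: list[str]) -> str:
    """Explore freely, but never outside the set of approved experiments."""
    safe_candidates = [a for a in candidate_actions if a in SAFE_TO_EXPLORE]
    if not safe_candidates:
        return "ask a human before trying anything new"
    return random.choice(safe_candidates)


# The robot will try a new mopping pattern, but never the electrical outlet:
print(pick_exploratory_action(["mop the electrical outlet", "mop in circles"]))
```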

Robustness to Distributional Shift: How do we ensure that the cleaning robot recognizes, and behaves robustly, when in an environment different from its training environment? For example, strategies it learned for cleaning an office might be dangerous on a factory workfloor.
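Distributional shift can be pictured as a novelty check: before acting, the robot asks whether its surroundings resemble the environments it was trained in, and defers to a human if they don’t. The features and threshold in this sketch are invented for illustration.

```python
# A toy sketch of guarding against distributional shift: if too much of what
# the robot currently sees is unfamiliar, it stops and asks for help instead
# of applying strategies learned in a very different environment.

TRAINING_OBJECTS = {"desk", "chair", "vase", "carpet", "wastebasket"}
MAX_UNFAMILIAR_FRACTION = 0.3  # tolerate a little novelty, not a new world


def environment_looks_familiar(observed_objects: list[str]) -> bool:
    """Crude novelty check: what fraction of visible objects is unfamiliar?"""
    if not observed_objects:
        return True
    unfamiliar = [o for o in observed_objects if o not in TRAINING_OBJECTS]
    return len(unfamiliar) / len(observed_objects) <= MAX_UNFAMILIAR_FRACTION


def act(observed_objects: list[str]) -> str:
    if environment_looks_familiar(observed_objects):
        return "proceed with learned cleaning routine"
    return "stop and request human guidance"


print(act(["desk", "chair", "vase"]))                 # familiar office: proceed
print(act(["forklift", "conveyor belt", "solvent"]))  # factory floor: defer
```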

These problems might seem trivial when you picture a robot cleaning an office building, where the stakes are relatively low (unless the robot decides the most efficient cleaner is one that destroys every computer screen in the office, and you don’t realize it until you come in the next morning). But what happens when AI/ML systems take on larger, higher-stakes tasks like driving cars, flying planes, or regulating the flow of traffic? Or more personal ones, like caring for children or the elderly?

Because AI/ML systems will continue to be deployed in more and more sensitive areas of our lives and societies, it’s important to think deeply about these problems and to dedicate research dollars to them.

Much of the technical work and research on AI safety is aimed at solving the problems described above. But another side of AI safety has gained prominence in recent years: anticipating the ways various AI/ML systems might be used by malicious actors.

One problem is that most AI/ML researchers are good people who got into the field because they wanted to help humanity move forward in positive and impactful ways. Their instincts are not wired toward dreaming up ways to harm people, which can make it difficult to identify and think through every possible malicious use.

For example, the dangers of being able to create fake yet realistic human faces weren’t thoroughly debated before the technology became available, even though it poses real risks. Before such convincing face-generation technology existed, when Russia wanted to create fake accounts at scale for election interference and broader hybrid-warfare campaigns, two of the easiest ways to do it were to use stock photos or to copy the identities of real people, which is identity theft and a crime. When operatives can generate fake faces at scale, such campaigns could be harder to catch: the photos can’t be checked against a database of easily identifiable stock images, and the absence of identity theft removes the obvious, prosecutable first crime in the process.

OpenAI recently sparked a discussion about responsible disclosure, and thus the dissemination of new technology, with their new “large-scale unsupervised language model which generates coherent paragraphs of text, achieves state-of-the-art performance on many language modeling benchmarks, and performs rudimentary reading comprehension, machine translation, question answering, and summarization — all without task-specific training.”

Our model, called GPT-2 (a successor to GPT), was trained simply to predict the next word in 40GB of Internet text. Due to our concerns about malicious applications of the technology, we are not releasing the trained model. As an experiment in responsible disclosure, we are instead releasing a much smaller model for researchers to experiment with, as well as a technical paper.
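The phrase “trained simply to predict the next word” really is the entire training objective. The toy sketch below illustrates that idea with a simple word-count (bigram) model of my own; GPT-2 itself uses a large neural network trained on 40GB of text, so this is only a cartoon of the concept.

```python
# A minimal sketch of the core idea behind GPT-2's training objective:
# predict the next word from what came before. This toy version just counts
# which word follows which in a tiny made-up corpus.

from collections import Counter, defaultdict

corpus = "the robot cleans the office . the robot mops the floor ."
next_word_counts = defaultdict(Counter)

words = corpus.split()
for current_word, next_word in zip(words, words[1:]):
    next_word_counts[current_word][next_word] += 1


def predict_next(word: str) -> str:
    """Return the word most often seen after `word` in the toy corpus."""
    if word not in next_word_counts:
        return "<unknown>"
    return next_word_counts[word].most_common(1)[0][0]


print(predict_next("the"))    # "robot"
print(predict_next("robot"))  # "cleans"
```

Scaled up by many orders of magnitude, that same objective is what lets the real model produce coherent paragraphs of text, which is exactly why its release raised misuse concerns.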

Issues surrounding malicious use and disclosure for research purposes are not going to be resolved any time soon, but it’s a step in the right direction for OpenAI to start that conversation.

For example, might this technology, in the hands of Russian operatives, make it possible to produce convincing disinformation at scale without the need to recruit native English speakers?

Disclosing more information is what Jeff Bezos might describe as a one-way door, a decision that is permanent. Withholding information is a two-way door, a decision that can be reversed.

If a decision is reversible, we can make it fast and without perfect information. If a decision is irreversible, we had better slow down the decision-making process and ensure that we consider ample information and understand the problem as thoroughly as we can.

Many of the issues surrounding AI/ML are one-way doors, which is why they require robust discussion from the start.
