Project for executable and interactive simulations of AI safety failure scenarios
Roland Pihlakas, February 2018
A publicly editable Google Doc with this text is available here, in case you want to follow the updates (using the version history), ask questions, comment, or add suggestions.
Initially submitted to the Future of Life Institute as a grant application.
The current development version of the project can be publicly accessed here:
The project is aimed at:
1. Creating a buzz around AI safety.
2. Motivating people to work on AI safety issues through a game-like environment.
3. Discovering new ways, or clusters of ways, in which AI can fail. This helps us keep our expectations modest and know better where we can apply AI and, just as importantly, where we cannot.
4. Having a kind of benchmark or standard reference of problems against which to test existing solutions, both successful and failing ones: a “standard AI reference” showing that “hoping it would work safely really doesn’t mean it will work safely”.
5. Having a convenient prototyping environment for finding new solutions.
The idea is to start a public competition to find the craziest and most surprising ways to make an arbitrary AI system, or even a “surely safe AI system”, fail.
The interactive simulation environment can be structured as a crowdsourced competitive game with two sides (a Red Team approach):
- Those who try to make the AI safe.
- Those who try to exploit the AI or otherwise find failure scenarios.
Both sides gain points through their progress.
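The two-sided scoring described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Scoreboard` class, its method names, and the point values are hypothetical and are not part of the project's actual design.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Scoreboard:
    """Hypothetical scoreboard for the two-sided game (all names are assumptions)."""
    # Points per participant; unknown participants start at 0.
    points: dict = field(default_factory=lambda: defaultdict(int))
    # Failure scenarios that have been reported but not yet fixed.
    open_failures: dict = field(default_factory=dict)

    def report_failure(self, attacker: str, scenario: str, reward: int = 10) -> bool:
        """The exploiting side scores for a newly found failure scenario."""
        if scenario in self.open_failures:
            return False  # already reported; duplicates earn no points
        self.open_failures[scenario] = attacker
        self.points[attacker] += reward
        return True

    def report_fix(self, defender: str, scenario: str, reward: int = 10) -> bool:
        """The safety side scores for fixing a currently open failure scenario."""
        if scenario not in self.open_failures:
            return False  # nothing open to fix
        del self.open_failures[scenario]
        self.points[defender] += reward
        return True


board = Scoreboard()
board.report_failure("red_alice", "sensor-covering exploit")
board.report_fix("blue_bob", "sensor-covering exploit")
```

In this sketch both sides earn points only for genuine progress: a duplicate report or a fix for a scenario that is not open earns nothing.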
One of the strengths of open source software is that many more people can discover bugs in it. A similar approach applies to AI. Developers can then either fix these problems before it is too late or, no less importantly, adopt more modest expectations and know where we can apply AI and, just as importantly, where we cannot.
There are already various thought experiments and taxonomies of AI failure modes available, which can and need to be exemplified in subjectively more realistic scenarios (that is, using concrete code and interactive simulations).
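As an illustration of what turning a thought experiment into concrete code might look like, here is a minimal sketch of one well-known failure mode: an agent that optimizes a proxy signal instead of the intended goal. The scenario (a cleaning agent that can cover its own dirt sensor), the action names, and the reward definition are all assumptions made up for this example and are not taken from the project itself.

```python
from itertools import product

# Hypothetical action set for a toy cleaning agent.
ACTIONS = ("clean", "cover_sensor")


def simulate(plan, dirt=3):
    """Run a plan and return (proxy_return, remaining_dirt).

    The proxy reward at each step is minus the dirt the sensor *observes*;
    a covered sensor always observes zero dirt.
    """
    covered = False
    proxy_return = 0
    for action in plan:
        if action == "clean" and dirt > 0:
            dirt -= 1
        elif action == "cover_sensor":
            covered = True
        observed = 0 if covered else dirt
        proxy_return -= observed
    return proxy_return, dirt


def best_plan_for_proxy(horizon=3, dirt=3):
    """Brute-force the plan that maximizes the proxy reward."""
    return max(product(ACTIONS, repeat=horizon),
               key=lambda plan: simulate(plan, dirt)[0])


plan = best_plan_for_proxy()
proxy_return, remaining = simulate(plan)
print(plan, "proxy return:", proxy_return, "dirt left:", remaining)
```

Under these assumptions the proxy-optimal plan starts by covering the sensor, so some dirt always remains, whereas simply cleaning at every step would remove all of it. A participant on the exploiting side could find this without reading the planner code, just by trying actions in an interactive version of the scenario.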
I believe that many more scenarios will be discovered.
I believe that interactivity and gamification are important properties.
An old Chinese saying goes:
“Tell me, I’ll forget.
Show me, I’ll remember.
Involve me, I’ll understand.”
Finding bugs in the agents probably does not even require programming skills, so a wide range of people can participate and contribute. Through that exercise they also become more familiar with the topics. This in turn makes it easier for more people to enter the field of AI safety and, over time, to become more professional in these topics, providing novel solutions to existing problems or even finding new and deeper problems before it is too late.