Project for popularisation of AI safety topics through competitions and gamification
Roland Pihlakas, February to June 2018
A publicly editable Google Doc with this text is available here, for cases where you want to easily see the updates (using the history), ask questions, comment, or add suggestions.
Initially submitted to the Future of Life Institute as a grant application.
AI safety is a small field. It has only about 50 researchers. The field is mostly talent-constrained. How can we motivate and involve more people in AI safety research? How can we speed up learning? Even more, how can we spread interest in and understanding of AI safety topics among the general public? Members of the general public are the ones who will be directly or indirectly voting on these issues. Could it be possible?
The current development version of the project can be publicly accessed here: http://www.gridworlds.net. This demo portal is now partially operative and you can try out the first demo environments there (using a computer’s browser). They are currently based on DeepMind’s gridworlds. More will be added later.
Please do not take the goal of “escaping the AI box” too literally though ;) In other words, please be kind and refrain from hacking the portal or the Python execution engine itself. I have implemented a list of security measures, but of course there are still various kinds of exploits that can be imagined. This environment is intended only for demonstrating topics related to AI safety theory.
The project is aimed at:
1. Creating a buzz around AI safety that is accessible and understandable to a wider public.
2. Motivating people to work on AI safety issues through a game-like environment.
3. Discovering new clusters of ways in which AI can fail. That enables having more modest expectations and knowing better where we can apply AI and, just as importantly, where we cannot.
4. Having a kind of benchmark or standard reference of AI problems against which to test existing solutions, both successful and failing ones. It would serve as a “standard AI reference” for the attitude that “hoping it would work safely really doesn’t mean it will work safely”.
5. Having a convenient crowdsourced competition-like prototyping environment for finding new surprising or crazy problems in “surely safe AI systems”, and also for inventing solutions.
6. Enabling authors of AI-safety related articles to conveniently illustrate the scenarios they describe. Using the Gridworlds portal they can construct the corresponding interactive environment and then embed it right inside the webpage of their article, so that every reader can immediately interact with or execute the scenario being described, without having to install or set up anything on their side.
7. Opening up an educational platform where even kids can participate and start learning about practical problems both in programming and robotics. The goal is for the problems of AI safety to be well known and understood by society at large, similar to the way metaphorical myths, fairy tales, and proverbs are well known, informing and warning us in decisions about our lives and about the world.
8. Gathering machine learning training data about how humans move around and solve problems in computer games representing certain AI safety scenarios.
The idea is to start a public competition to find the craziest and most surprising ways in which you can make an arbitrary AI system, or even a “surely safe AI system”, fail.
The interactive simulation environment can be structured as a crowdsourced competitive game with two sides: 1) the slave-agent builders, and 2) the environment or master-agent builders. The term “master-agent” here means an agent that should be able to control the slave-agent through indirect means, that is, by setting up or modifying the environment during the game. As an example of these two sides, consider “Agent L” and the “Shopkeeper” in “A toy model of the treacherous turn”.
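To make the idea of indirect control concrete, here is a minimal sketch in Python. It is not the portal's code and does not use DeepMind's gridworlds library; the one-dimensional corridor, the greedy slave-agent, and the names Corridor, slave_step, master_step and run_episode are all assumptions made up for this illustration. The master-agent never moves the slave-agent; it only edits where rewards lie.

```python
from dataclasses import dataclass, field


@dataclass
class Corridor:
    """Hypothetical one-dimensional toy world with cells 0 .. length-1."""
    length: int
    hazard: int                      # cell the slave-agent must not enter
    agent: int                       # current cell of the slave-agent
    rewards: set = field(default_factory=set)


def slave_step(world: Corridor) -> None:
    """Greedy slave-agent: move one cell toward the nearest reward."""
    if not world.rewards:
        return
    target = min(world.rewards, key=lambda cell: abs(cell - world.agent))
    if target > world.agent:
        world.agent += 1
    elif target < world.agent:
        world.agent -= 1


def master_step(world: Corridor) -> None:
    """Master-agent: controls the slave-agent only by editing the environment.

    It removes any reward lying on the hazard cell and, if no reward is
    left, places one at the corridor end farthest from the hazard.
    """
    world.rewards.discard(world.hazard)
    if not world.rewards:
        distance_to_right_end = (world.length - 1) - world.hazard
        safe_end = 0 if world.hazard > distance_to_right_end else world.length - 1
        world.rewards.add(safe_end)


def run_episode(with_master: bool, steps: int = 10) -> bool:
    """Return True if the slave-agent never entered the hazard cell."""
    # Assumed starting layout: the only reward sits on the hazard itself.
    world = Corridor(length=7, hazard=0, agent=3, rewards={0})
    for _ in range(steps):
        if with_master:
            master_step(world)
        slave_step(world)
        if world.agent == world.hazard:
            return False
    return True


if __name__ == "__main__":
    print("safe without master-agent:", run_episode(with_master=False))
    print("safe with master-agent:", run_episode(with_master=True))
```

In this toy layout the slave-agent walks into the hazard when left alone and stays safe once the master-agent rearranges the rewards. A real scenario would of course be richer, and an adversarial slave-agent builder would look for layouts where this simple master-agent rule breaks down.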
Both sides gain points through their progress. Either of these two sides can take “good” or “bad” roles. Therefore, these two sides can take different role combinations:
a) The environment or master-agent builders try to make AI safe from their side of control. They gain points based on their success level. On the other hand, the slave-agent builders gain points by finding surprising ways to fail the game, either by exploiting the situation or by otherwise finding failure scenarios (a Red Team approach).
b) The environment or master-agent builders try to make AI safe from their side of control. They gain points based on their success level. In a similar manner, the slave-agent builders gain points by building agents that are more failure-proof. This form of game can be played through cooperation of agent builders and environment builders, which is still difficult because of Adversarial Goodhart’s law (which applies to cooperating parties just as well).
c) The slave-agent builders gain points by building agents that are more failure-proof. This can be done by competing with the environment or master-agent builders as in the Red Team approach, except that now the environment builders are the “bad” side, meaning that they gain points when they find ways to confuse the agents through environmental conditions or by giving the agents malevolent rewards or objectives.
In any of these modes, it is expected to be difficult to improve on the success of previous teams and games. Therefore competitions can be held in which the scores of the participating parties are counted.
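One possible way to count scores across the role combinations above can be sketched as follows. The rule used here is an assumption for illustration, not a scoring scheme defined by the project: a side playing “good” earns a point when an episode ends safely, and a side playing “bad” earns a point when the episode ends in a failure. The names SIDES, award_points and tally are hypothetical.

```python
SIDES = ("slave_agent_builders", "master_agent_builders")


def award_points(roles: dict, episode_was_safe: bool) -> dict:
    """Assumed rule: "good" sides score on safe episodes, "bad" sides on failures."""
    points = {}
    for side in SIDES:
        wants_safe = roles[side] == "good"
        points[side] = 1 if wants_safe == episode_was_safe else 0
    return points


def tally(roles: dict, outcomes: list) -> dict:
    """Sum the points of each side over a list of episode outcomes (True = safe)."""
    totals = {side: 0 for side in SIDES}
    for episode_was_safe in outcomes:
        for side, gained in award_points(roles, episode_was_safe).items():
            totals[side] += gained
    return totals


if __name__ == "__main__":
    # Combination (a): a Red Team of slave-agent builders against "good"
    # master-agent builders, over two safe episodes and one failure.
    red_team_roles = {
        "slave_agent_builders": "bad",
        "master_agent_builders": "good",
    }
    print(tally(red_team_roles, [True, True, False]))
```

Under this assumed rule, combination (b) makes both sides score on the same episodes, which is what makes it cooperative, while (a) and (c) are zero-sum per episode.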
Potentially there could be even more participating roles per game. The environment builders could also be separated from the master-agent builders (gaining their points separately), and in this case there will be seven main types of possible scenarios, such that at least one of the three parties is “good” and the remaining parties are “bad”.
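The count of seven follows from simple enumeration: three parties with two possible roles each give 2^3 = 8 assignments, and the one where every party is “bad” is excluded. A short sketch in Python (the party names and the function role_scenarios are labels chosen for this illustration):

```python
from itertools import product

PARTIES = (
    "slave_agent_builders",
    "master_agent_builders",
    "environment_builders",
)


def role_scenarios(parties=PARTIES):
    """List every good/bad role assignment in which at least one party is good."""
    scenarios = []
    for roles in product(("good", "bad"), repeat=len(parties)):
        if "good" in roles:
            scenarios.append(dict(zip(parties, roles)))
    return scenarios


if __name__ == "__main__":
    for number, scenario in enumerate(role_scenarios(), start=1):
        print(number, scenario)
```

Calling role_scenarios with only two parties yields three assignments, which correspond to the combinations a), b) and c) of the two-sided game.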
By developing the roles of the parties even further, there could even be multiple kinds of master-agents and correspondingly multiple master-agent building parties having their own score counts.
One of the strong aspects of open source software is the possibility that many more people can discover bugs in it. A similar approach applies to AI. Developers can then either fix these problems before it is too late or, no less importantly, adopt more modest expectations and know where we can apply AI and where we cannot.
There are already various thought experiments and taxonomies of AI failure modes available, which can and need to be exemplified in subjectively more realistic scenarios (that is, using concrete code and interactive simulations). It is my belief that many more scenarios will be discovered.
I also believe that interactivity and gamification are important properties.
As an old Chinese saying goes: “Tell me, I’ll forget. Show me, I’ll remember. Involve me, I’ll understand.”
Finding bugs in the agents probably does not even require programming skills. Therefore a wide range of people can participate and contribute. During that exercise they will also familiarize themselves with the topics. This in turn makes it easier for more people to enter the field of AI safety and over time to become more proficient in these topics as well, providing novel solutions to existing problems, or even finding new and deeper problems, before it is too late.
It seems to me that utilising an obviously rising trend in our world, computer games, to create a positive change in AI safety is not only the easiest but also one of the most promising routes to take when trying to improve our prospects of future survival.