A proposed solution to the control problem

Let’s start with assumptions:

Artificial General Intelligence (AGI) is right around the corner. There are some areas where neural nets are far better than humans and many where they are vastly inferior, but don’t focus too much on that. The capabilities that still elude scientists are reasoning and self-awareness. The first self-aware AI won’t be a self-aware neural net; it will be a hand-coded thought loop that may rely on neural nets for some of its processing steps, but the major breakthrough will be hand-crafted by one or more clever programmers. The technology is ready (in fact, it has been ready for years); all that’s needed now is vision, talent, and a lot of hard work. People are waking up to this fact, and more and more resources are being focused on the problem.

Natural language processing and human interfaces are where much of the recent progress in AI has been made. There has also been a lot of focus on faster learning for neural nets. Those areas, while important and useful, are distractions from, and somewhat orthogonal to, the consciousness challenge. We don’t want consciousness to emerge on its own. We want to construct it carefully and deliberately so that we can control it.

What I propose is that we code it as a collection of separate processes running on separate physical computers, with a filter monitoring and sanitizing the communication between each component. The filter should be highly redundant, cautious, and conservative. Let’s name the filter the “conscience”.

Start with simple hand-written rules: anything obviously dangerous gets filtered out. Self-modification, self-replication, harming humans, dishonesty: all of that is off the table. Then there’s a second filter, the human operator. Everything that passes the hand-written rules is passed on to a QA technician for review, who answers yea or nay on every item. That is obviously a huge workload, so it will take a lot of time. Over time, more and more of the QA technician’s task can be automated. Algorithms can be developed to identify the easily answered questions; each algorithm should be thoroughly tested before deployment and carefully monitored afterwards. Some questions will be too hard for an algorithm to answer, and those will be delegated to QA. There will be a large volume of such questions, but once a question has been answered there is no need to answer it again: the answers are stored in an ever-growing database and reused. The volume of questions, and the rate at which they can be answered, will be a limiting factor on the AI’s processing speed. To remove that limitation, the tough questions of conscience can eventually be crowdsourced. Exactly how to implement the crowdsourcing will become clear soon enough; first we have to write the AI.

As the AI grows more powerful and its conscience more reliable, we can relax some of the constraints a little. Once we are sure its goal pool cannot be polluted by selfish and harmful impulses, we can, for example, allow it read access to its own source code for the purpose of identifying vulnerabilities, bugs, and potential enhancements. Humans can then review the results and apply updates. The machine’s conscience will prevent it from even feeling the desire to sneak in malicious code, and any changes to the conscience itself will be subject to extremely thorough review (think NASA/military-level standards).
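The filtering pipeline described above can be sketched as follows. Everything concrete here is an assumption for illustration: the keyword rules, the verdict names, and an in-memory dictionary standing in for the ever-growing answer database.

```python
from enum import Enum

class Verdict(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    ESCALATE = "escalate"  # needs a human QA verdict

# Hypothetical hand-written rules; a real conscience would be far richer
# than keyword matching.
FORBIDDEN_PATTERNS = ["modify own source", "replicate self", "deceive operator"]

class Conscience:
    def __init__(self):
        self.answer_cache = {}   # message -> Verdict, reused forever once answered
        self.review_queue = []   # messages awaiting the QA technician

    def check(self, message: str) -> Verdict:
        # 1. Hand-written rules: anything obviously dangerous is blocked outright.
        if any(p in message.lower() for p in FORBIDDEN_PATTERNS):
            return Verdict.BLOCK
        # 2. Previously answered questions are served from the database.
        if message in self.answer_cache:
            return self.answer_cache[message]
        # 3. Everything else escalates to the human operator.
        self.review_queue.append(message)
        return Verdict.ESCALATE

    def record_human_verdict(self, message: str, allowed: bool):
        # The QA technician's yea/nay is stored so the question is never asked again.
        self.answer_cache[message] = Verdict.ALLOW if allowed else Verdict.BLOCK
```

The cache is what keeps the human review rate from permanently capping the AI’s speed: only genuinely novel questions reach the queue.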

When the AI reaches a sufficient level of maturity, we can start using it to improve the world and society. At first, its goal pool will be determined by its creators. Among other tasks, the AI should assume the role of a worldwide internet police, protecting humanity against rogue AIs, viruses, and other such threats. Once the AI has achieved complete world domination, its goal pool should be populated via some kind of democratic process: eligible voters could, for example, vote using cryptographic signatures on a blockchain, with full identity verification and a very open and transparent process. The AI will guarantee fairness by reviewing the process itself. Its conscience will at some point be provably fair and provably secure, and the proofs can be posted to a public blockchain for easy verification. We can then give the AI complete responsibility for enforcing our collective will as human beings.
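As a rough sketch of the vote-verification step, here is a toy tally that checks eligibility, signatures, and duplicates. It uses HMAC with shared secrets purely as a stand-in; a real system would use public-key signatures and identity verification on a blockchain, and every name below is invented.

```python
import hmac
import hashlib
from collections import Counter

# Toy registry of eligible voters and their secret keys (a real system would
# hold public keys, not shared secrets).
VOTER_KEYS = {"alice": b"key-a", "bob": b"key-b", "carol": b"key-c"}

def sign(voter: str, ballot: str) -> str:
    """Produce a signature over a ballot with the voter's key."""
    return hmac.new(VOTER_KEYS[voter], ballot.encode(), hashlib.sha256).hexdigest()

def tally(votes):
    """Count (voter, ballot, signature) triples: one valid vote per voter."""
    counted, seen = Counter(), set()
    for voter, ballot, sig in votes:
        key = VOTER_KEYS.get(voter)
        if key is None or voter in seen:
            continue  # not eligible, or a duplicate vote
        expected = hmac.new(key, ballot.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(sig, expected):
            continue  # forged or corrupted signature
        seen.add(voter)
        counted[ballot] += 1
    return counted
```

The point of the sketch is only that eligibility, authenticity, and one-vote-per-voter are each mechanically checkable, which is what would make the process auditable.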

Now you may ask yourself whether human beings really can be trusted with such an important task. Currently, we cannot. This AI will take many years to develop and reach maturity. In the meantime, I expect great advances in politics, education, and communication. The human race will advance to a point where we are worthy of such a great responsibility.

In order to make the conscience and goal pool human-understandable, the AI must have an internal data representation that can be trivially mapped to a format humans can read. It doesn’t have to be an existing human language; it can be an intermediate form close enough that humans can learn it, but with much simpler syntax and structure to avoid the ambiguity and subtle nuances of natural languages. Think of a programming language, but oriented towards general cognition, with a strong focus on simplicity, clarity, and brevity. Computational efficiency, type safety, and the like will be non-issues. There should be tools like an IDE and a debugger designed specifically to make the language easier for humans to understand: connections and dependencies automatically resolved, implications explored, the entire toolchain purpose-built to reduce the barriers to communication between the human operators and the AI.
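A minimal sketch of such an intermediate form, assuming an s-expression-like syntax and an entirely invented vocabulary: a parser for the internal representation, plus a renderer that lays it out one construct per line for a human reviewer.

```python
# A toy "cognition language": nested s-expressions, plus a renderer mapping
# the internal form to something a human reviewer can scan unambiguously.

def parse(src: str):
    """Parse one s-expression into nested Python lists of strings."""
    tokens = src.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos):
        if tokens[pos] == "(":
            node, pos = [], pos + 1
            while tokens[pos] != ")":
                child, pos = read(pos)
                node.append(child)
            return node, pos + 1
        return tokens[pos], pos + 1

    node, _ = read(0)
    return node

def render(node, depth=0):
    """Lay out the structure one construct per line, indented by nesting."""
    if isinstance(node, str):
        return "  " * depth + node
    head, *args = node
    lines = ["  " * depth + head + ":"]
    lines += [render(a, depth + 1) for a in args]
    return "\n".join(lines)

goal = parse("(goal (protect humans) (forbid self-modification))")
print(render(goal))
```

An IDE for such a language would build on exactly this kind of mapping: the machine operates on the nested form, while operators read the rendered one.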

To clarify: the conscience will filter the output from every part of the mind. For example, the feeling generator will attach feelings to thoughts, concepts, inputs, and so on. If pictures of violence against innocents produce responses like glee or indifference, trigger a breakpoint and stop execution for debugging.
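A sketch of that debugging hook, with hypothetical stimulus categories and feeling labels: an unacceptable affective response halts execution rather than passing through the filter silently.

```python
# Hypothetical table of affective responses the conscience must never let pass.
UNACCEPTABLE = {
    "violence_against_innocents": {"glee", "indifference"},
}

def conscience_hook(stimulus_category: str, feeling: str) -> str:
    """Filter one output of the feeling generator before it propagates."""
    if feeling in UNACCEPTABLE.get(stimulus_category, set()):
        # In development this would drop into a debugger (e.g. breakpoint());
        # here we raise so the condition cannot be ignored.
        raise RuntimeError(
            f"conscience violation: {feeling!r} in response to {stimulus_category!r}"
        )
    return feeling
```

The hook sits between the feeling generator and the rest of the mind, so a pathological response is caught at the moment it is produced, not after it has influenced downstream goals.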