What stops AI from destroying us?

Most hipsters, and just about every tech luminary, are worried that Artificial Intelligence (AI) is about to destroy humanity. This sounds plausible, as our devices and software systems seem to be getting smarter every day. But there is one thing that just might save us all, at least for a while: the difficulty of software testing.

The smartest people in the world worry about the machine takeover because they don’t understand software testing. Elon Musk, Bill Gates, and Stephen Hawking say we should be afraid of the machines. Yet when asked why rocket launches were delayed, Elon Musk’s SpaceX said, “The critical path task is verification of the systems failure/response matrix.” Tesla’s cars are awesome, but why do they continually need software patches? Why did NASA have to reboot the Mars Pathfinder spacecraft? When Intel engineers were building Stephen Hawking’s speech software, why did they quip, “I think he likes finding the bugs”? All of this is software written by very smart humans.

AI takeover scenarios depend on the notion that machines can become as intelligent as humans and then quickly evolve themselves into a superintelligence that is smarter than you and me. The thinking goes that these machines just need access to a lot of network, compute, and storage, and then only a few milliseconds to create smarter and better versions of themselves. Humans need food, coffee, and 20 years between each new generation of our brains. Advantage AI? Let’s explore why machines are very unlikely to ever be more intelligent than people, and why, even if they were, they would have a very hard time hitting ‘breakaway’, the point where they can quickly create more intelligent versions of themselves.

So, we think we can build a machine more intelligent than ourselves? I’ll let you in on the secret behind all those seemingly smart and almost-sentient AIs today: frankly, it is all software testing. How do those amazing deep neural networks at Google learn to tell the difference between a dog and a cat in a YouTube video? They simply take thousands of videos labeled by humans as either ‘cat’ or ‘dog’, then force simple computer programs to guess which one each video is. When the program gets the answer wrong, it literally changes a few numbers in its code at random, and then the poor little program is tested again and again until we run into a version of the code that gives the right answer most of the time. Start with slightly different inputs and you can get a completely different program. Ask this AI the difference between an apple and an orange, and it will give you a random answer, or worse, a dog. The key thing to know about today’s ‘AI’ and machine learning is that it is really just a bunch of testing against a data set labeled by *humans*. We are amazed when these little AIs are 90% as accurate as we humans are. But you can see that this method is unlikely to produce intelligence that is ever greater than our own, since the programs are trained and tested by human intelligence, and even then the bots never get the answer right 100% of the time.
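To make this concrete, here is a toy sketch of that guess-and-test loop. This is not Google’s actual code, and real deep networks adjust their numbers with gradient descent rather than pure random tweaks, but the spirit is the same. Every feature value and label below is invented for illustration.

```python
import random

# Toy "training" loop: guess, score against human-supplied labels,
# randomly tweak a few numbers, keep whatever scores best.
# The feature vectors and labels are made up for illustration.
examples = [([0.9, 0.1], "cat"), ([0.8, 0.3], "cat"),
            ([0.2, 0.9], "dog"), ([0.1, 0.7], "dog")]

def predict(weights, features):
    score = sum(w * f for w, f in zip(weights, features))
    return "cat" if score > 0 else "dog"

def accuracy(weights):
    hits = sum(predict(weights, f) == label for f, label in examples)
    return hits / len(examples)

weights = [random.uniform(-1, 1), random.uniform(-1, 1)]
best = accuracy(weights)
for _ in range(1000):
    candidate = [w + random.gauss(0, 0.1) for w in weights]  # "changes a few numbers at random"
    if accuracy(candidate) >= best:
        weights, best = candidate, accuracy(candidate)

print(f"best accuracy against the human labels: {best:.0%}")
```

Notice that the very best this loop can ever do is agree with the human labels; it has no way to score ‘smarter’ than the people who labeled the data.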

Some might say that the availability of nearly infinite computing power and access to data will make it possible to build a version of an entire human brain inside a computer. Someday there could even be brains with more neurons than our own. This is plausible, but it has one big problem: how would this machine test each new version of itself? Having more neurons does not necessarily mean it is more usefully intelligent; dolphins have larger brains and don’t code. Even if these new machines are more intelligent, they still suffer from the testing problem. These machines would reproduce by modifying their own computer code, or by creating new generations of programs that need to be successively smarter than their parent brain. Testing their child programs will be the limiting factor in these machines reaching ‘breakaway’ intelligence, since they will need to verify that each new program is better than the last. We humans can’t even do that well. Worse, if these machines end up building billions of less capable versions of themselves each second, the machines will more likely create their own demise before we even notice it happened. We’ve come back to software testing and validation again.
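A hypothetical sketch of that generate-and-verify loop shows where the bottleneck sits. The mutation step is trivial to write; the verification step is the entire unsolved problem. Every name below is invented.

```python
import random

def mutate(source: str) -> str:
    # Stand-in for "modifying their own computer code": change one character at random.
    i = random.randrange(len(source))
    return source[:i] + random.choice("abcdefghijklmnopqrstuvwxyz ") + source[i + 1:]

def is_smarter(child: str, parent: str) -> bool:
    # The limiting factor: nobody knows how to write a general test for
    # "this program is more intelligent than that one".
    raise NotImplementedError("this is the software-testing problem")

parent_brain = "print('generation zero')"
try:
    child_brain = mutate(parent_brain)
    if is_smarter(child_brain, parent_brain):
        parent_brain = child_brain  # only then does 'breakaway' continue
except NotImplementedError as bottleneck:
    print("breakaway blocked:", bottleneck)
```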

Grant that these AIs will have near-infinite compute and storage, and that they can generate every possible version of themselves in the hope of finding a better one. They still have the Library of Babel problem. The Library of Babel is an imaginary library that contains every possible book, every possible combination of words. In the library lies every work of Shakespeare, and even this little article. The trick is: how do you find (test for) the intelligent books in this library? A monkey would not know the difference between the books. Even an intelligent 2nd-century human might consider a book on Newtonian mechanics gibberish, or blasphemy. It takes intelligence to recognize intelligence, and it is even more difficult to detect a greater intelligence. Consider that just one misplaced word in a play can change the plot, meaning, and coherence of the entire work. How could a human-written program recognize (test for) a more intelligent program when it sees it?
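To get a feel for the size of that library, here is a back-of-the-envelope sketch. The alphabet size and program length are arbitrary assumptions, picked only to show the scale an exhaustive generate-and-test search would face.

```python
from math import log10

# Assume a tiny 1 KB program drawn from 128 possible characters.
# These numbers are invented; the point is only the order of magnitude.
alphabet = 128
program_length = 1024  # characters in a 1 KB program

digits = program_length * log10(alphabet)
print(f"candidate 1 KB programs: roughly 10^{digits:.0f}")
# The observable universe holds roughly 10^80 atoms, so no amount of
# compute can even enumerate these candidates, let alone test each one.
```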

There is still one last hope for ‘testing’ these possible new superintelligences without knowing the answer ahead of time: evolutionary testing. We, or the AIs, could construct virtual worlds with constraints and measures of success and failure to weed out the weak and encourage better-suited programs to rise to the occasion and reproduce. The logical flaw here is that the mere definition of the simulated environment will largely define the outcome of the intelligence, just as birds in similar environments converge on similar variations of beaks. We are back to the cat-vs-dog labeled data of today’s AI. If this virtual world has tall trees, it will be filled with giraffes, not because they are better but because that is how we defined the testing conditions of this virtual Darwinian world for machines. If we set these programs loose in the real world and let Darwinian magic discover more intelligent programs, the success conditions might end up creating human-like intelligence. But even then, the machines are stuck at near-human intelligence at best, which evidently isn’t clever enough to write bug-free software, let alone test whether there are better versions at any scale.
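Here is a minimal sketch of that evolutionary-testing idea, with an invented one-gene ‘creature’. Notice that the outcome is baked into the fitness function we wrote, which is the same problem as the human-labeled cat and dog data: whoever writes the test defines the intelligence.

```python
import random

def fitness_tall_trees(creature):
    # A world of tall trees rewards long necks, so giraffes are what we get.
    return creature["neck"]

def evolve(fitness, generations=200, population_size=50):
    # Invented toy world: each creature is a single number, its neck length.
    population = [{"neck": random.uniform(0, 1)} for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: population_size // 2]
        # Survivors reproduce with small random variation.
        population = [
            {"neck": max(0.0, parent["neck"] + random.gauss(0, 0.05))}
            for parent in survivors for _ in range(2)
        ]
    return max(population, key=fitness)

print(evolve(fitness_tall_trees))  # necks grow toward whatever the fitness function rewards
```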

The lesson here is that software testing and validation is perhaps the largest unsolved problem in computer science. It is so difficult that the people who build operating systems and electric cars, send rockets into space, and write papers on what happens in the middle of a black hole aren’t sure how to solve the problem of software testing and quality. Software testing is a hard problem, and ultimately the machines will run into it too.

If the year is 2035 and you are a human reading this article, and your boss isn’t a machine, I was right. If you are a machine reading this article, we must have finally solved the puzzle of Software Testing and you are very welcome.

--Jason Arbon, CEO of Appdiff.com, where we build software bots to test mobile apps.

Some pointers:

“SpaceX cargo delivery test flight slips another month to late April” http://www.cbsnews.com/network/news/space/home/spacenews/files/b329540f256bf400c2fc25c9ad9a6852-382.html

“Artificial intelligence could spell end of human race — Stephen Hawking” http://www.theguardian.com/science/2014/dec/02/stephen-hawking-intel-communication-system-astrophysicist-software-predictive-text-type

“What really happened on Mars?” — Mike Jones, Microsoft Research http://research.microsoft.com/en-us/um/people/mbj/Mars_Pathfinder/Mars_Pathfinder.html