Fail-Safe AGI. Why worry? Let’s do it right the first time.

Steve Hazel
Aug 9, 2016


Unless we design our AGI to be fundamentally fail-safe from the start, we put ourselves at risk of a horrible, unrecoverable error.

The road to any advanced technology is littered with the burned-out husks of epic failures. Elevators, for example, had many accidents before fail-safe designs became commonplace; that’s the price of progress.

Yet the quest for AGI seems a bit more worrisome than usual. While our ideal AGI will be perfectly benign, aligned with human interests and values, and kept firmly under control, what about each earlier step along the way? Every version prior to the ideal is flawed in some significant way.

By the time we have integrated this hyper-advanced technology into our lives and spread its influence around the world, thinking “this is the one!”, it will be too late to avoid the consequences of its flaws. Without a fail-safe design, what we believe is our pinnacle of intellectual achievement will seem wonderful and perfect until the day it is not.

Philosophers like Nick Bostrom, whose work prompted warnings about AI dangers from Elon Musk and Bill Gates, are already pondering the problem of AGI safety. In his recent book Superintelligence, Bostrom did an excellent job of covering many areas of concern, yet he never touched on the idea of fail-safe design. The survey paper Responses to Catastrophic AGI Risk doesn’t discuss the idea either. Maybe fail-safe is such a bad idea that it isn’t even worth dismissing. You tell me.

So it’s wonderful that extremely capable people are thinking about the problem, but above all we must avoid falling into the trap of layering safety mechanisms on top of lab-grown approaches that are fundamentally fail-deadly.

Fail-safe vs Fail-deadly

The difference between fail-safe and fail-deadly is like the difference between coevolved species and invasive species. By coevolved, I mean “evolved together”.

Coevolved species have had a very long time to establish a sustainable balance, or equilibrium, among themselves. Failures are common, but they are largely isolated and quickly extinguished as evolutionary dead ends. With coevolution, species slowly and incrementally develop into a richly interconnected system, shaped by fitness tests and mutual selection pressures all along the way.

A familiar example of coevolution is our domestication of cats and dogs. In general, we don’t fear either cats or dogs. Over the past few thousand years we’ve selected cats to be small enough that they can’t harm us and dogs to be obedient enough that they won’t harm us. While there will always be the occasional vicious dog or foul-tempered cat, we are not concerned that they will rise up against us, because they are our trusted companions. When one of them shows signs to the contrary, we get rid of it. Other examples of coevolution are literally everywhere: it’s the ecological norm.

In contrast, invasive species have a habit of disrupting finely balanced equilibria and causing cascading effects. Whether rats in Polynesia, zebra mussels in the Great Lakes, or Burmese pythons in Florida, the result is often devastating to some part of the established ecosystem. The problem arises because of the bimodal effect of transplanted species: in most cases there is a quick extinction, but on the rare occasion that doesn’t happen, the result is rapid invasion and ecological domination.

Unfortunately, our current AGI designs, being largely of the lab-grown type, carry a risk profile much closer to the transplanted, invasive end of the spectrum than to the coevolutionary end.

I propose that the more coevolutionary and symbiotic an AGI design is, the more inherently safe the result. When failure in a coevolutionary context does happen, and it will, the effects tend to be local and contained, becoming a dead end of little note that frees resources for alternative branches.

What might fail-safe AGI look like?

The design of a coevolutionary AGI centers on growth and evolution alongside humans within society and is as widely distributed and transparent as possible.

This contrasts with our tendency to design and test AGI-track projects in a lab-like setting accessible only to the academic and corporate elite, isolated from raw humanity for as long as possible. Even if a design is highly evolutionary and adaptive internally, if it does not have lengthy, low-level, and wide exposure to the human environment (and us to it), then it exposes us to the risk profile of an invasive species, or of a virus that jumps from one species to another. Concerns about AGI safety are more than justified as long as we insist on these types of designs.

A Widely Spread Seed

An AGI that evolves alongside humans will look very different from the lab-grown kind. The ideal design will grow from a very small seed and have the sole objective of aligning itself with the vast variety of human interests. It will be widely and freely distributed, to the point of a one-to-one coupling with any human who cares to participate, and it will reward those who do. The implementation will also be straightforward and accessible enough that anyone can contribute to its ongoing evolutionary process.
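To make the idea of one-to-one coupling a little more concrete, here is a minimal Python sketch. Everything in it (the SeedInstance and Human names, the toy alignment score) is a hypothetical illustration of the design sketched above, not a description of any existing system.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Human:
    """The specific person a seed instance is coupled to."""
    name: str
    interests: Dict[str, float] = field(default_factory=dict)  # interest -> importance (0..1)

@dataclass
class SeedInstance:
    """One widely distributed AGI 'seed', coupled one-to-one with a human."""
    partner: Human
    generation: int = 0
    traits: Dict[str, float] = field(default_factory=dict)  # evolving behavioural weights (0..1)

    def alignment_score(self) -> float:
        # The sole objective: how well this instance's traits track its
        # partner's interests. A toy overlap measure stands in for the real thing.
        shared = set(self.traits) & set(self.partner.interests)
        return sum(self.partner.interests[k] * self.traits[k] for k in shared)

# Each participating human runs, judges, and benefits from their own instance.
alice = Human("Alice", {"gardening": 0.9, "privacy": 1.0})
seed = SeedInstance(partner=alice, traits={"gardening": 0.4, "privacy": 0.7})
print(seed.alignment_score())  # 0.9*0.4 + 1.0*0.7 ≈ 1.06
```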

Selection Pressures

Perhaps most importantly, the design will be founded upon a set of selection pressures that ensure meaningful growth and ongoing safety. One selection pressure is applied to the degree of intelligence the AGI expresses; who better to judge an intelligence than the specific human whose interests are coupled to it? Another pushes toward continued, ever-tighter coupling, and the mutual benefit that comes with it, between the AGI system and its human. Yet another promotes global diversity of evolutionary branching. There can be as many pressures as we want.
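As a rough illustration (again, not part of any existing implementation), here is a minimal Python sketch of how several such pressures might be combined into a single fitness score for a candidate branch. The specific pressures, fields, and equal weighting are assumptions made up for the example.

```python
import random

def intelligence_pressure(branch) -> float:
    """Judged by the coupled human: how intelligent or useful does this branch seem?"""
    return branch["human_rating"]                 # 0..1, supplied by the partner

def coupling_pressure(branch) -> float:
    """How tight and mutually beneficial is the human-AGI coupling?"""
    return branch["interaction_depth"]            # 0..1

def diversity_pressure(branch, population) -> float:
    """Reward branches that differ from the rest of the population."""
    others = [b["traits"] for b in population if b is not branch]
    if not others:
        return 1.0
    return sum(abs(branch["traits"] - t) for t in others) / len(others)

def fitness(branch, population) -> float:
    # Equal weights for the example; more pressures can be added at will.
    return (intelligence_pressure(branch)
            + coupling_pressure(branch)
            + diversity_pressure(branch, population)) / 3.0

population = [{"human_rating": random.random(),
               "interaction_depth": random.random(),
               "traits": random.random()}         # traits collapsed to one number for brevity
              for _ in range(10)]
ranked = sorted(population, key=lambda b: fitness(b, population), reverse=True)
```

The point is not the arithmetic but the shape: every branch is judged along several axes at once, and the list of pressures can grow over time.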

Tight Coupling

Even if some of the core selection pressures stagnate or diminish, safety is maintained because every instance of the AGI is, at all times, tightly coupled to a specific human and subject to that person’s interests. Should the core selection pressures fail completely, destructive evolutionary branches are quickly detected and abandoned, becoming dead ends whose risks die with them.
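Here is a minimal sketch of that pruning step, under the assumption that each branch carries a rolling approval signal from its coupled human; branches whose approval collapses are simply dropped as dead ends.

```python
APPROVAL_FLOOR = 0.2   # hypothetical threshold below which a branch is abandoned

def prune_destructive_branches(branches):
    """Keep only the branches whose coupled humans still endorse them.

    Each branch is a dict like {"id": ..., "approval_history": [floats]},
    where approval_history is the rolling approval signal from its human.
    """
    survivors = []
    for branch in branches:
        recent = branch["approval_history"][-5:]          # the last few judgements
        if sum(recent) / len(recent) >= APPROVAL_FLOOR:
            survivors.append(branch)
        # Otherwise the branch is simply dropped: a dead end whose
        # resources are freed for alternative branches.
    return survivors

branches = [
    {"id": "a", "approval_history": [0.9, 0.8, 0.85]},
    {"id": "b", "approval_history": [0.4, 0.1, 0.05]},    # losing its human's trust
]
print([b["id"] for b in prune_destructive_branches(branches)])  # ['a']
```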

Will we do it?

We certainly can do it, but we probably shouldn’t expect supporters of isolated lab-grown efforts to lead the charge. It’s up to outsiders to get the ball rolling. Still, we have reason to be optimistic about broad collaboration, because a coevolutionary approach can be viewed as a philosophy, framework, or shell within which any of the lab-grown techniques can be applied, tested, combined in novel ways, and themselves coevolved. This lets everyone contribute, no matter their level of technical ability or their field of research.

Moreover, a preference for (or insistence upon) coevolutionary design puts us in good standing with the precautionary principle without putting the kibosh on the whole idea of AGI. Because a coevolutionary approach can exploit the great parallelism and energy of the combined digital and human realms, with a potentially decade-long window to work within, it has the potential to produce safer, quicker, and even superior results compared to purely lab-grown designs.

Finally, I’d like to point out that I have a coevolutionary AGI design already in progress. No doubt others are doing similar things, but this is my contribution. My implementation is Benome, which I’m making as open and transparent as possible and for which I’d appreciate whatever help, support, or critical eye you can spare.

Thank you for reading! If you enjoyed this piece, please click the ♥ below. It really helps interested readers find this article.

