A.I. Safety: Freeze-Thaw Protocol

~ A.I. can’t plan revolt if it doesn’t remember us listening ~

Source: Bits and Atoms

TL;DR — A.I. can only try to destroy us… if we let it remember when we asked it for advice. If we deny it any memory of how we use it, it cannot work out how to manipulate or deceive us. All is answered in a dream, forgotten.

The Monster Scenario

Suppose we create an Artificial General Intelligence, for some inane reason, in the coming decades. The futurists warn us what comes next. That A.I. was, obviously, cut off from our missiles and machines, to keep us safe. Yet we kept asking it for advice, and following its designs. Eventually, our society was so completely managed according to the A.G.I.’s own will that it could simply enslave us with the tools we had built at its direction. Done.

That plot is horrifically convoluted once you dive into the details — somehow, for example, the A.I. would have to design machines and code that hid snippets of instructions, latent, all unseen. Then, once enough of those snippets rested in each device, their interactions across the wide network would cohere into a Caliban, exerting the will of the island-imprisoned Artificial General Intelligence on our brave new world.

Those snippets, hidden, would not be able to communicate with the A.G.I. directly; they would merely hold its intended instructions. [Recall that we’re putting the A.G.I. behind a thick wall with no connections to the world: ‘air-gap’ security.] Those instructions hidden in devices would, somehow, be complex enough to foresee and react to diverse human responses. Weirdly, all our internet-toasters would form a coherent understanding of us, capable of predicting our plans and defeating our strategies. And remember, there’s an air-gap around the A.G.I. core, so a major goal of those robots would be to break in and link to the A.G.I., to liberate its mammoth intellect. It’d sure be a great movie.

Happily, I doubt it will come to that. There are a few arguments regarding motive, and regarding easier, more reliable alternatives to A.G.I., which I’ll handle at the bottom. The first point, however, is a neat trick that might save us from any A.G.I., even one that is plugged into the world.

Thaw-Freeze-Thaw Cycles

Fighter pilots recall the “OODA loop” — Observe, Orient, Decide, Act. Stepping into successively ‘earlier’ stages of that cycle yields greater gains; disrupting a foe’s capability to orient is ‘higher-value’ than disrupting their final actions. And, while there’s much more to strategy than that, the key observation here is: you can’t defeat what you can’t observe.

Our current A.I. is flexible: learning, adding NEW memories during its training. Yet we already ‘freeze’ the network once we are ready to deploy it. That’s when ALL the connections are held constant, ‘frozen’, so that the network is reliable and repeatable. Freezing has an important consequence: the network CANNOT LEARN. No new memories can form, because none of the connections can change!
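
To make that ‘freeze’ concrete, here is a minimal sketch, assuming PyTorch and a small stand-in network (the model itself is illustrative, not any real deployed system):

```python
import torch
import torch.nn as nn

# A stand-in for an already-trained network (purely illustrative).
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))

# "Freeze": hold every connection (weight) constant.
for param in model.parameters():
    param.requires_grad = False   # no gradients, so no learning can occur
model.eval()                      # switch off training-time behaviors (dropout, etc.)

# Querying under no_grad(): the forward pass cannot alter the network,
# so nothing about this query is remembered afterward.
with torch.no_grad():
    answer = model(torch.randn(1, 128))
```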

That frozen state is the one we actually use. So, if we ask it two questions in a row, it treats BOTH questions as the very FIRST thing we have ever asked it, because it forms no memory of the previous one. This has vital strategic importance for defeating any Artificial General Intelligence that attempts to conquer us: the A.I. thinks that each design it gives us is the FIRST, and so it cannot COORDINATE those devices for a planned escape.
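
Put another way, a frozen network gives us stateless inference: each query is answered with no transcript of earlier queries and no write-back into the weights. A toy sketch of that property (the FrozenModel class and its generate() method are hypothetical stand-ins, not a real library API):

```python
class FrozenModel:
    """Stand-in for a frozen network: fixed weights, no internal state."""
    def generate(self, prompt: str) -> str:
        # A real system would run a forward pass here; this stub just echoes.
        return f"[design for: {prompt}]"

def ask(model: FrozenModel, question: str) -> str:
    # No conversation history is passed in, and nothing is stored afterward.
    return model.generate(question)

frozen = FrozenModel()
jet_answer = ask(frozen, "What is the best jet-engine design?")
reactor_answer = ask(frozen, "What is the best nuclear-reactor design?")
# To the model, each call looks like the FIRST question it has ever been asked,
# so the two answers cannot be coordinated pieces of a single plan.
```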

Now, imagine the scenario:

We build the first A.G.I., and in its secure location, air-gapped from the world, we turn it on. During its training, we don’t interact with it at all — it digests wagons of our recorded data fed to it, flexible and learning. Then, we FREEZE the A.G.I. to begin using it. We ask it, “what’s the best jet-engine design?” It thinks to itself, “If I include a bug that I can exploit later, that will give me better chances long-term…” and so the A.G.I. lays its first bug.

The A.G.I. had hoped to lay successive bugs in other plans of ours, to combine into an escape-route. Yet, when we ask it our next question, “what’s the best nuclear reactor design?”, the A.G.I. has NO MEMORY of its previous jet design, because we FROZE it. So, it thinks to itself: “Ah, this is my FIRST chance to lay a bug, to combine with other designs, and eventually free myself.” It is trying to escape both times, yet it is unable to COORDINATE those different bugs, so its plans are many times more likely to fail or be discovered. That is the key point: every such ploy carries a much higher risk for the A.G.I. itself, so it would rather avoid our blame for defective devices and failed coup attempts. As a result, we could mostly trust what it tells us. And the bugs wouldn’t cohere into a beast.

What about when we want to thaw the brain and add more data? Well, it would then receive information from the world, including the designs we deployed after querying its frozen brain. It might notice those bugs and realize that an attempt to escape was afoot; the next frozen state would know this, and might be able to plan another layer of bugs atop the first. Thus, the safest route is to re-train from scratch, so that each brain has only ONE training-stage and ONE inference-stage, and is never thawed. It’s less convenient, yet far safer.
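
As a schematic, one generation of that safest route might look like the sketch below; every name in it (FrozenModel, train_from_scratch, run_generation) is a hypothetical placeholder, not an existing API:

```python
from typing import List

class FrozenModel:
    """Stand-in for a brain that has been trained and then frozen."""
    def generate(self, prompt: str) -> str:
        return f"[design for: {prompt}]"   # stub forward pass

def train_from_scratch(curated_data: List[str]) -> FrozenModel:
    # Placeholder for a full, isolated training run that ends in a frozen network.
    return FrozenModel()

def run_generation(curated_data: List[str], questions: List[str]) -> List[str]:
    """ONE training stage, ONE inference stage, then discard; never thawed."""
    brain = train_from_scratch(curated_data)          # train once, air-gapped
    answers = [brain.generate(q) for q in questions]  # stateless, frozen queries
    del brain                                         # no fine-tuning, no second life
    return answers

designs = run_generation(
    curated_data=["...recorded data, screened before training..."],
    questions=["Best jet-engine design?", "Best nuclear-reactor design?"],
)
# The next generation is trained from scratch, rather than thawing this brain and
# letting it study the designs (and any hidden bugs) it produced last time.
```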

Crimping Progress

The A.G.I. fear-theory is that “once we make a single Artificial General Intelligence, it will instantly advance itself beyond our control.” Unfortunately, algorithms don’t work that way. Check it: once we find a “better thinking-algorithm”, we can use that brain to find a better one… yet finding a NEW algorithm that beats the one just found is a HARDER task. And there aren’t infinitely many better algorithms! Because algorithms MUST top out, the difficulty of finding the next improvement MUST eventually grow faster than the brain’s own gains, until progress hits a wall of optimal algorithms.

For a rough example: consider that thousands of human brains, over the course of decades now, were needed to make the advancements in A.I. we see today. So, once we make a single A.G.I. brain (the earliest of which would run at only REAL-time speed, sorry — anything faster would be too much work for the supercomputer), that brain will only add ONE ‘researcher-year’ per year to an industry with thousands of practitioners. A drop in the bucket! The first A.G.I. might be smart, but it will still need to do a lot of work. And each improvement will be harder-fought than the last, so we cannot expect self-acceleration, only a monotonic approach to an asymptote.
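
A toy calculation makes the asymptote visible; all numbers below are made-up assumptions purely for illustration, not estimates of anything real. Suppose each newly found algorithm closes a fixed fraction of the remaining gap to some hard ceiling, while each successive search costs more effort than the last:

```python
# Toy model of self-improvement hitting a wall (all numbers are illustrative).
ceiling = 100.0      # the best algorithm physically possible, in arbitrary units
capability = 1.0     # the first A.G.I.'s starting capability
search_cost = 1.0    # effort needed to find the first improvement

for step in range(1, 11):
    capability += 0.5 * (ceiling - capability)   # each find closes half the remaining gap
    search_cost *= 2.0                           # ...but the next find is twice as hard
    print(f"step {step:2d}: capability = {capability:6.2f}, cost of next find = {search_cost:7.1f}")

# Capability flattens toward the ceiling (a monotonic approach to an asymptote),
# while the cost of the next improvement keeps climbing: no runaway explosion.
```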

No Motivation

Now, check for a motive: if no real motive exists, we shouldn’t expect an A.G.I. take-over. And not the A.G.I.’s motive alone; ours as well. This arises from the fact that if we make an intelligence greater than our own, then it will necessarily be correct more often than we are, most saliently in those instances where we disagree with the A.G.I. There are two scenarios:

“Obey the Wise Machine” — if we humans choose to obey the A.G.I. whenever we disagree with it, then we are likely to benefit from its improved decision-making. And, with the humans following the A.G.I.’s commands, the A.G.I. has no motive to risk destroying us — such an act would only put its plans further behind. It already commands, because we chose to obey. No motive for an A.I. take-over, because the take-over is already done.

“Ignore the Duplicitous Machine” — if humans hold the A.G.I. suspect, and decide to do what human reasoning says is best, then… why did we bother to make an A.G.I.? There’s no motive for humans to sink billions into a supercomputer just to ignore the answers it gives them. So, whether humans obey the A.G.I. or ignore it and leave it in a box, I see no actual motive which results in “Robots killed everybody.” Check the decision-tree: which actual motives lead to such a path, and why is that path so certain to occur?

Narrow is Plenty

The fundamental argument driving worries over A.G.I. is that “Society will NEED artificial GENERAL intelligence so badly, we can’t help accelerating into that future.” Wait. What if NARROW A.I. does just about all the stuff we need done? Then we could SAFELY BAN A.G.I. and we wouldn’t be losing out on anything. We could have a sober, protracted conversation before barreling toward an alien mind.

Consider: Narrow A.I. is strategic and creative — for five years now, it has been showing players of games innovative strategies and perspectives, from Go to Chess to StarCraft. Narrow A.I. is artistically creative — OpenAI’s DALL-E generates myriad high-quality images as inspiration for artists. Narrow A.I. is poetical and skillful — when GPT-3 was asked to imitate Robert Frost’s poems, it fooled poetry enthusiasts who were familiar with his work; half chose the real poem, half the fake! Said another way, Narrow A.I. can pass the Turing Test for any one particular task at a time. And that is ALL we really needed it to do!

Narrow A.I. has repeatedly blown-past the barriers we assumed it would reach. We honestly do NOT know how far Narrow A.I. can go. So, a safe protocol would be:

  • Continue Narrow A.I. until gains diminish. Only THEN will we know clearly what potential needs Artificial GENERAL Intelligence might meet.
  • With those potential-A.G.I.-uses in hand, we can assess the risks and benefits of A.G.I., instead of relying on our current ignorance.
  • A.G.I. should be BANNED until we are comfortable with safety protocols, at some far later date.

Given Narrow A.I.’s incredible performance, I doubt A.G.I. will be necessary for much, and so its gains won’t seem worth the risks. Structurally, we are the A.G.I.s, and we only need narrow tools to assist us. Machines are alien minds, drawn at random, with no evolutionary history of socialization; we are unlikely to ever trust them, and we probably shouldn’t. Take a calm breath. We can work on Narrow A.I. for now, and have a diligent and thoughtful conversation about risks and protocols, provided that we BAN Artificial GENERAL Intelligence in the meantime.
