Move code around without downtime

How Elixir makes failovers incredibly easy

joshnuss
2 min read · Jun 28, 2018

In a far-off, remote galaxy...

The computers on your brand new spaceship are struggling and have begun overheating. Unfortunately, the nearest planet is hostile to your species — so you cannot stop.

You need to move your critical systems to an external computer or you will crash and burn.

Well, it’s your lucky day: the engineers who built your spaceship used the BEAM (#yay). That means you can easily offload all those heavy computations to your home planet, reducing the load on your computer system and living to see another day.

You can failover with zero downtime!

Cloning state

It turns out, moving running code around with the BEAM is super easy because code & state are completely separated.

All you need is two connected nodes running the same code. Then, when you’re ready to fail over, send the state from the primary node to the secondary node.

Yes, it’s really that simple.

Here’s an example:

Spaceship cloning example
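The embedded example is not reproduced here, so the following is a minimal sketch of the idea rather than the original code. It assumes a GenServer called Spaceship whose state is a small map; the module name, the fuel and position fields, and the :failover and :adopt messages are all made up for illustration.

```elixir
defmodule Spaceship do
  use GenServer

  # Client API. The default state and its fields are illustrative only.
  def start_link(state \\ %{fuel: 100, position: {0, 0}}) do
    GenServer.start_link(__MODULE__, state)
  end

  def status(ship), do: GenServer.call(ship, :status)
  def move(ship, dx, dy), do: GenServer.call(ship, {:move, dx, dy})

  # Ask the primary to hand its state to an already running secondary.
  def failover(primary, secondary) do
    GenServer.call(primary, {:failover, secondary})
  end

  # Server callbacks
  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call(:status, _from, state), do: {:reply, state, state}

  def handle_call({:move, dx, dy}, _from, %{position: {x, y}, fuel: fuel} = state) do
    {:reply, :ok, %{state | position: {x + dx, y + dy}, fuel: fuel - 1}}
  end

  # Primary side: copy the state over, then stop normally.
  def handle_call({:failover, secondary}, _from, state) do
    :ok = GenServer.call(secondary, {:adopt, state})
    {:stop, :normal, :ok, state}
  end

  # Secondary side: replace whatever state we had with the primary's.
  def handle_call({:adopt, new_state}, _from, _old_state) do
    {:reply, :ok, new_state}
  end
end
```

Because pids are location-transparent, the secondary passed to Spaceship.failover/2 can just as well be a process running on another connected node; the calls stay the same.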

Routing

The next problem: how does a client figure out which process is the primary?

There are several ways to do this:

  • Rely on Erlang’s global name registry: register the process with :global.register_name(name, pid), and when the failover happens, point the name at the new process with :global.re_register_name(name, new_pid).
  • Rely on another registry, such as Elixir’s Registry or Erlang’s gproc
  • Notify clients about failover using events (with gen_event or similar)
  • Have the client detect that it’s calling a stale server and redirect its calls to the new one, i.e. the server replies with {:ok, result}, {:error, reason}, or {:redirect, other_server_pid}
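As a rough sketch of the first option, here is how the global registry could be wrapped. The Router module and the :spaceship_primary name are hypothetical; the :global functions are the standard Erlang ones.

```elixir
defmodule Router do
  # Hypothetical global name under which the current primary is registered.
  @name :spaceship_primary

  # Register the first primary. Returns :yes on success, :no if the name is taken.
  def register(pid), do: :global.register_name(@name, pid)

  # On failover, atomically point the name at the new primary.
  def promote(new_pid), do: :global.re_register_name(@name, new_pid)

  # Look up whoever is primary right now (a pid, or :undefined).
  def primary, do: :global.whereis_name(@name)
end
```

Clients never hold on to a pid. They address the process as {:global, :spaceship_primary}, for example GenServer.call({:global, :spaceship_primary}, :status), so after Router.promote/1 their next call lands on the new primary.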

Transfer time

When a process holds a lot of state, for example a large list, transferring it across the network can take some time.

Instead of doing it in one step, you can sync gradually: send part of the list at a time using a Task. Once the sync is done, you’re ready to fail over instantly.
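One way this could look, as a sketch only: a Task walks the data in chunks and pushes each chunk to the secondary. Here the secondary is assumed to be an Agent that collects chunks, and the GradualSync module and the default chunk size of 1,000 are invented for the example.

```elixir
defmodule GradualSync do
  # Stream `items` to `secondary` in chunks from a background Task.
  # `secondary` is assumed to be an Agent started with an empty list.
  def start(items, secondary, chunk_size \\ 1_000) do
    Task.async(fn ->
      items
      |> Stream.chunk_every(chunk_size)
      |> Enum.each(fn chunk ->
        # Agent.update/2 is synchronous, so the next chunk is only sent
        # once this one has been stored on the other side.
        :ok = Agent.update(secondary, fn chunks -> [chunk | chunks] end)
      end)
    end)
  end

  # Reassemble the chunks (stored newest first) into the original order.
  def received(secondary) do
    Agent.get(secondary, fn chunks -> chunks |> Enum.reverse() |> Enum.concat() end)
  end
end
```

Task.await/1 on the returned task tells you when the copy has finished. This sketch ignores changes made on the primary while the sync is running; a real setup would also need to forward those.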

Summary

The BEAM’s distributed virtual machine makes it super easy to move state across machines, nodes, servers, data centers and even planets (as I’ve shown in the example 😜).

Unlike many other VMs, where the execution state is not serializable, the BEAM completely isolates code from execution state, allowing the state to be serialized and copied across the network.
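To make the serialization point concrete, here is a tiny sketch using Erlang’s built-in external term format. The Snapshot module and the sample state are made up; sending a message between nodes does the encoding for you, so this is only to show that a process’s state is plain data.

```elixir
defmodule Snapshot do
  # Encode any term (maps, tuples, lists, ...) into a binary.
  def dump(state), do: :erlang.term_to_binary(state)

  # Decode the binary back into the original term.
  def load(binary), do: :erlang.binary_to_term(binary)
end

# Illustrative state only.
state = %{fuel: 99, position: {2, 3}, crew: ["Ada", "Joe"]}
binary = Snapshot.dump(state)
restored = Snapshot.load(binary)
IO.inspect(restored == state)
```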
