How spies and Elixir handle the unexpected
In “From Ruby with Love,” we visualized processes in Elixir as individual spies completing missions. We ended with an open question: what happens when a process fails? Thankfully, we can again find our analog in the spy world.
First, let’s look at two more aspects of Erlang and Elixir: Linking and Monitoring.
Let’s return to our spies. Say they’re working on something super top secret, a formula that they each have partial ingredients for.
It’s too dangerous to give them both the whole formula, so we expect them to each make their part and deliver it back to us. However, if only one of them completes their part of the formula, it’s no good; we need both of them for it to be valuable.
So we “link” these processes together.
This creates a special relationship between them. If one dies unexpectedly, it sends an exit signal to the other, which dies as well. Their fates are linked.
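To make that concrete, here is a minimal sketch of two linked processes. The LinkedSpies module, the :fail and :stop messages, and the :formula_spoiled reason are made up for illustration; spawn/1, Process.link/1, and Process.alive?/1 are the standard building blocks. The short sleep is only there to give the exit signal time to arrive.

```elixir
defmodule LinkedSpies do
  # Hypothetical helper: starts two spies, links them, makes one fail,
  # and reports whether each is still alive afterwards.
  def run do
    # Spy B simply waits for instructions.
    spy_b =
      spawn(fn ->
        receive do
          :stop -> :ok
        end
      end)

    # Spy A links itself to Spy B, then waits until told to fail.
    spy_a =
      spawn(fn ->
        Process.link(spy_b)

        receive do
          :fail -> exit(:formula_spoiled)
        end
      end)

    # Spy A dies with an abnormal reason; the link carries that to Spy B.
    send(spy_a, :fail)
    Process.sleep(100)

    {Process.alive?(spy_a), Process.alive?(spy_b)}
  end
end

IO.inspect(LinkedSpies.run())
# prints {false, false}
```

Spy B never received a message of its own. It went down purely because its partner did.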
We can see why it would be important in our code: if a process that others rely on fails, we’d rather wipe the slate clean and kill the rest instead of getting a botched result. As Elixir developers, we’re pretty much never going to be linking processes at this level — it’s handled for us in the abstractions provided by the language — but it’s important to understand that this is what’s happening for us behind the scenes.
The next relationship between processes to understand is monitoring. Like linking, it establishes a special relationship between processes. However, whereas the linked processes’ lives depended upon each other, monitoring is less mutual.
Let’s imagine that instead of two worker spies, one of those processes is more of an Agent Handler. (Yes, according to my SpyScape profile, this is a real spy job.)
Our Agent Handler monitors her spy for unusual behavior. If he fails and dies during his mission, she won’t die — she’ll just know that it happened and can then decide how to recover the mission.
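In code, the handler finds out through a :DOWN message that lands in her mailbox. The sketch below assumes a hypothetical AgentHandler module and a made-up :cover_blown reason; spawn_monitor/1 and the shape of the :DOWN tuple come from Elixir itself.

```elixir
defmodule AgentHandler do
  # Hypothetical helper: sends a spy on a doomed mission, monitors him,
  # and returns what the handler learns about his fate.
  def watch_mission do
    # spawn_monitor/1 starts the process and monitors it in one step.
    {spy, ref} = spawn_monitor(fn -> exit(:cover_blown) end)

    receive do
      # The handler is still alive; she just receives a report.
      {:DOWN, ^ref, :process, ^spy, reason} ->
        {:mission_failed, reason}
    after
      1_000 -> :no_report
    end
  end
end

IO.inspect(AgentHandler.watch_mission())
# prints {:mission_failed, :cover_blown}
```

Unlike a link, the monitor only flows one way: if the handler herself died, the spy would carry on unaware.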
So our process dies unexpectedly. How do we guard against that? There really are so many ways something can go wrong.
Let’s say our spy’s formulas expire and release toxic gas.
Or maybe enemies invade his secret hideout and kill him.
Or maybe an extraterrestrial bug breaks into the lab and wreaks havoc.
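In process terms, mishaps like these show up as different exit reasons. Here is a rough sketch of how a monitoring process might see three kinds of death. The Mishaps module and the reasons are invented for the analogy; exit/1, raise/1, and Process.exit/2 are real.

```elixir
defmodule Mishaps do
  # Hypothetical helper: runs `mission` in a monitored process, lets
  # `sabotage` act on it from the outside, and returns the exit reason.
  def exit_reason(mission, sabotage \\ fn _pid -> :ok end) do
    {pid, ref} = spawn_monitor(mission)
    sabotage.(pid)

    receive do
      {:DOWN, ^ref, :process, ^pid, reason} -> reason
    after
      1_000 -> :still_alive
    end
  end

  # A mission that never finishes on its own.
  def wait_forever do
    receive do
      :never_sent -> :ok
    end
  end
end

# The spy goes down by himself with a custom reason.
IO.inspect(Mishaps.exit_reason(fn -> exit(:toxic_gas) end))
# prints :toxic_gas

# Someone else takes him out; the reason is reported as :killed.
IO.inspect(
  Mishaps.exit_reason(&Mishaps.wait_forever/0, fn pid -> Process.exit(pid, :kill) end)
)
# prints :killed

# An unexpected exception; the reason is an {exception, stacktrace} tuple.
{error, _stacktrace} = Mishaps.exit_reason(fn -> raise "alien bug in the lab" end)
IO.inspect(error.message)
# prints "alien bug in the lab"
```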
How can we protect against failures like these?
This question sits at the core of Erlang/Elixir theory and leads us to a computer scientist named Jim Gray. He studied bugs in production and found that close to 100% of them were what he called “transient” bugs: those that are difficult or impossible to replicate in development. So while we can’t predict them, we can design our programs to recover from them.
Gray found that restarting the process was remarkably effective at getting rid of these transient bugs. Joe Armstrong, one of Erlang’s creators, saw this too, and decided that instead of trying to bolster the program against every possibility (which would be impossible), the best course of action would be to simply allow the process to fail and start it over again.
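Here is a hand-rolled sketch of that idea: a monitoring process that starts a fresh process every time the mission crashes. The Restarter module, its arguments, and the flaky mission are all hypothetical, and real applications would lean on OTP’s built-in abstractions rather than a loop like this.

```elixir
defmodule Restarter do
  # Hypothetical helper: runs `mission` (a function that receives the
  # attempt number) in a monitored process, starting over with a fresh
  # process after each crash, up to `max_attempts` times.
  def run(mission, max_attempts, attempt \\ 1)

  def run(_mission, max_attempts, attempt) when attempt > max_attempts do
    {:error, :too_many_failures}
  end

  def run(mission, max_attempts, attempt) do
    parent = self()

    {pid, ref} =
      spawn_monitor(fn ->
        send(parent, {:result, self(), mission.(attempt)})
      end)

    receive do
      {:result, ^pid, value} ->
        # Success: stop monitoring and discard the pending :DOWN message.
        Process.demonitor(ref, [:flush])
        {:ok, value, attempt}

      {:DOWN, ^ref, :process, ^pid, _reason} ->
        # Let it crash, then try again with a clean slate.
        run(mission, max_attempts, attempt + 1)
    end
  end
end

# A flaky mission that hits a transient glitch on its first two attempts.
flaky = fn
  attempt when attempt < 3 -> exit(:transient_glitch)
  _attempt -> :formula_delivered
end

IO.inspect(Restarter.run(flaky, 5))
# prints {:ok, :formula_delivered, 3}
```

Nothing in the mission itself tries to handle the glitch. The recovery logic lives entirely in the process that is watching.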
Next Up: Abstracting Processes
We’ve seen that spies and processes handle failure much the same way: let it die, and the monitoring process can decide how to proceed. But as Elixir developers, we’re not going to be setting up links and monitors for process failure at such a granular level. Next we’ll talk about how to train these spies to do these basic things so we can focus on the higher-level application logic.