Mission Failure

How spies and Elixir handle the unexpected

Meryl Dakin
Flatiron Labs
4 min read · Jul 17, 2019


The Wii had a GoldenEye game, I guess? Here’s a 007 mission failure!

In From Ruby with Love, we visualized processes in Elixir as individual spies completing missions. We ended with an open question: what happens when a process fails? Thankfully, we can again find our analog in the spy world.

First, let’s look at two more aspects of Erlang and Elixir: Linking and Monitoring.

Linking

Let’s return to our spies. Say they’re working on something super top secret, a formula that they each have partial ingredients for.

Spies with partial ingredients to a secret formula.

It’s too dangerous to give them both the whole formula, so we expect them to each make their part and deliver it back to us. However, if only one of them completes their part of the formula, it’s no good; we need both of them for it to be valuable.

So we “link” these processes together:

Linking the spies’ fates to each other.

This creates a special relationship between them. If one dies, it sends an exit signal to the other, telling it to die as well. Their fates are linked.

When one of the linked spies dies, he sends a message to his linked partner to die as well. (Ignore the changes to the legend, the script supervisor was off that day.)

We can see why it would be important in our code: if a process that others rely on fails, we’d rather wipe the slate clean and kill the rest instead of getting a botched result. As Elixir developers, we’re pretty much never going to be linking processes at this level — it’s handled for us in the abstractions provided by the language — but it’s important to understand that this is what’s happening for us behind the scenes.
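To make the linked fates concrete, here is a minimal sketch using raw processes. The module name `LinkedSpies`, the `:toxic_gas` reason, and the helper functions are made up for illustration; only `spawn/1`, `Process.link/1`, `Process.monitor/1`, and `Process.exit/2` come from Elixir itself. The monitors are there only so the function can report how each spy died.

```elixir
defmodule LinkedSpies do
  # Hypothetical demo module: the names below are illustrative only.

  # Returns the exit reason of each spy after spy A fails.
  def run do
    parent = self()

    # Spy A just waits for orders; nothing is linked to it yet.
    spy_a = spawn(fn -> wait_for_orders() end)

    # Spy B links its fate to spy A, reports back, then waits as well.
    spy_b =
      spawn(fn ->
        Process.link(spy_a)
        send(parent, :linked)
        wait_for_orders()
      end)

    # Make sure the link exists before anything goes wrong.
    receive do
      :linked -> :ok
    end

    # Monitors are used here only so this function can observe both deaths.
    ref_a = Process.monitor(spy_a)
    ref_b = Process.monitor(spy_b)

    # Spy A fails. Spy B is never touched directly.
    Process.exit(spy_a, :toxic_gas)

    %{spy_a: await_down(ref_a), spy_b: await_down(ref_b)}
  end

  # Blocks forever, because this message is never sent.
  defp wait_for_orders do
    receive do
      :never_sent -> :ok
    end
  end

  # Waits for the monitor notification and returns the exit reason.
  defp await_down(ref) do
    receive do
      {:DOWN, ^ref, :process, _pid, reason} -> reason
    after
      1_000 -> :still_alive
    end
  end
end
```

Calling `LinkedSpies.run()` should return `%{spy_a: :toxic_gas, spy_b: :toxic_gas}`: spy B goes down with the same reason even though nobody sent it anything directly.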

Monitoring

The next relationship between processes to understand is monitoring. Like linking, it establishes a special relationship between processes. However, whereas linked processes’ lives depend on each other, monitoring runs in one direction only: the monitoring process is told when the monitored one dies, and nothing more.

Let’s imagine that instead of two worker spies, one of those processes is more of an Agent Handler. (Yes, according to my SpyScape profile, this is a real spy job.)

Agent Handler monitoring her spy.

Our Agent Handler monitors her spy for unusual behavior. If he fails and dies during his mission, she won’t die — she’ll just know that it happened and can then decide how to recover the mission.

Spy dies, Agent Handler is informed. (Script supervisor has been fired.)
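As a rough sketch of the handler and her spy, assuming a hypothetical `AgentHandler` module and a made-up `:captured` exit reason: `spawn_monitor/1` starts the spy and monitors it in one step, and the handler later receives a `:DOWN` message instead of dying herself.

```elixir
defmodule AgentHandler do
  # Hypothetical demo module: the names below are illustrative only.

  # Starts a spy that fails right away and reports what the handler learns.
  def run do
    # The spy exits with an abnormal reason as soon as it starts.
    {spy, ref} = spawn_monitor(fn -> exit(:captured) end)

    # The handler is not killed. She is simply told what happened and can
    # decide what to do next.
    receive do
      {:DOWN, ^ref, :process, ^spy, reason} -> {:spy_down, reason}
    after
      1_000 -> :no_news
    end
  end
end
```

`AgentHandler.run()` should return `{:spy_down, :captured}`. The fact that it returns at all shows that the handler outlived her spy.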

Failure

So our process dies unexpectedly. How do we guard against that? There really are so many ways something can go wrong.

Let’s say our spy’s formulas expire and release toxic gas:

Spy killed by toxic gas.

Or maybe enemies invade his secret hideout and kill him:

Spy killed by enemies.

Or maybe an extraterrestrial bug breaks into the lab and wreaks havoc:

Bug from another planet kills spy.

How can we guard against failures like these?

This brings us to a core idea in Erlang/Elixir and to a computer scientist named Jim Gray. He studied bugs in production and found that close to 100% of them were what he called “transient” bugs: those that are difficult or impossible to replicate in development. So while we can’t predict them, we can defensively program against them.

What Jim Gray found is that restarting the process was remarkably effective in getting rid of these transient bugs. Joe Armstrong saw this, too, and decided that instead of trying to bolster the program against any possibility (which would be impossible), the best course of action would be to simply allow the process to fail and start it over again.
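The "let it fail and start over" idea can be sketched by hand with a monitor and a retry loop. Everything named here (`RestartingHandler`, the `mission` function that receives the attempt number, the `max_attempts` limit) is an assumption made for this illustration. In a real application this job is usually left to the abstractions the language provides rather than a hand-rolled loop.

```elixir
defmodule RestartingHandler do
  # Hypothetical demo module: a hand-rolled restart loop, for illustration.

  # Runs `mission` in a fresh monitored process and restarts it whenever it
  # dies, for at most `max_attempts` tries. `mission` receives the attempt
  # number so that a caller can simulate a transient failure.
  def run(mission, max_attempts \\ 3), do: attempt(mission, 1, max_attempts)

  defp attempt(_mission, n, max) when n > max, do: {:error, :mission_failed}

  defp attempt(mission, n, max) do
    handler = self()
    tag = make_ref()

    # Each attempt gets a brand-new spy with a clean slate.
    {_spy, ref} =
      spawn_monitor(fn ->
        send(handler, {:result, tag, mission.(n)})
      end)

    receive do
      # The spy finished and reported back before exiting.
      {:result, ^tag, value} ->
        Process.demonitor(ref, [:flush])
        {:ok, value, n}

      # The spy died before reporting anything: start over.
      {:DOWN, ^ref, :process, _pid, _reason} ->
        attempt(mission, n + 1, max)
    end
  end
end
```

For example, a mission that fails on its first two attempts and then succeeds:

```elixir
mission = fn attempt ->
  if attempt < 3, do: exit(:transient_bug), else: :formula_delivered
end

RestartingHandler.run(mission)
# returns {:ok, :formula_delivered, 3}
```

Nothing in the mission itself tries to recover from the failure. The handler simply notices the death and sends in a fresh spy.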

Next Up: Abstracting Processes

We’ve seen that spies and processes handle failure much the same way: let it die, and the monitoring process can decide how to proceed. But as Elixir developers, we’re not going to be setting up links and monitors for process failure at such a granular level. Next we’ll talk about how to train these spies to do these basic things so we can focus on the higher-level application logic.

The above is an excerpt from a recent conference talk I gave at EMPEX NYC called Process Potential. You can see the full talk here.



Dev at Knock.app. @meryldakin on github, LinkedIn, and twitter.