The Paradox of Back-end and Front-end Reliability

How Let-It-Fail Makes Never-Fail Possible

Simon Janes
The Accepted Forest
Aug 7, 2017

Far from your home, the back-end systems experience another normal day.

It’s a safety issue when you think about it. Users want to feel safe. If users ever feel vulnerable or angry — they will leave you. End of story. What makes a user feel vulnerable or angry?

Lagging and Overwhelming User Interfaces

Click. Click. Click-click-click-click.

[Wait for it…]

Click?

HEREYOUAREEVERYTHINGYOUASKEDFORFORTHELASTSEVENCLICKS

[Argh!]

This is not an interface; this is a bulk-foods dispenser with the wrong chute diameter, the kind that gives you six pounds of sushi rice when you open it for 500 milliseconds and you just wanted one pound!

Vague Error Messages with “Try Again Later” Placations

“Something went wrong” is not an error message. I understand that some users will not want more details, but those who do should be able to get them. DNS wasn’t working? Teach them how to diagnose their own local environment and their confidence will skyrocket. Teaching them only “turn it off and on again” just creates more digitally co-dependent users.

Disappearing and Reappearing Data

[Finish the conclusion to your two-year thesis on the decay of optical media managed by howler monkeys… save… refresh page]

FILE NOT FOUND… OH HAI THAR IT IS LOADIN’

Data that disappears and reappears is not a magic trick users appreciate.

Failure Modes are for the Administrators of the Ivory Towers

Compare the utility and engineering that go into a “consumer product” versus an “industrial product.” A consumer product is sold to ordinary people, who just want their problems to be solved. Bad products create more problems for people. The scary thing about “virtual goods” like websites is that it is very easy to “ship” bad experiences to hundreds, thousands or millions of people.

The only people that should have to worry about failures are the people being paid to worry about them — and your customers aren’t those people. Erlang/OTP is a platform that fully embraces the “let it fail” idea on the back end with process-oriented programming.

Process-oriented Programming or Actor Model Programming

Sir Charles Antony Richard Hoare described this model in a programming-techniques paper called “Communicating Sequential Processes” (1978). The paper proposes a “fundamental program structuring method” built from processes that cooperate only via message-passing (no global or shared memory). It would take some time (new ideas in prior decades always took longer to propagate without the Internet) before others would start to design systems around this idea. Today we have languages like Erlang, Go and Pony that embrace message-passing processes as a fundamental model of program structure: Go’s channels come from the CSP tradition, while Erlang’s processes and Pony’s actors sit closer to the actor model. If your programming environment cannot cope with millions of processes, get a new programming environment.
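To make “cooperate via message-passing only” concrete, here is a minimal sketch in Python. It is an illustration, not Hoare’s notation: the names `doubler`, `collector` and `run_pipeline` are made up for this example, the “processes” are ordinary threads that merely agree not to share variables, and `queue.Queue` is buffered, so this behaves more like mailboxes than like Hoare’s synchronous hand-off.

```python
import queue
import threading

STOP = object()  # sentinel message that tells a process to shut down


def doubler(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """A 'process' with no shared state: it only reads its inbox and writes its outbox."""
    while True:
        msg = inbox.get()
        if msg is STOP:
            outbox.put(STOP)  # pass the shutdown signal downstream
            return
        outbox.put(msg * 2)


def collector(inbox: queue.Queue, results: queue.Queue) -> None:
    """Sums every message it receives and reports the total as one final message."""
    total = 0  # local to this process; nothing else can touch it
    while True:
        msg = inbox.get()
        if msg is STOP:
            results.put(total)
            return
        total += msg


def run_pipeline(numbers):
    """Wire two processes together with queues and feed them some numbers."""
    a_to_b, b_to_c, results = queue.Queue(), queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=doubler, args=(a_to_b, b_to_c)),
        threading.Thread(target=collector, args=(b_to_c, results)),
    ]
    for w in workers:
        w.start()
    for n in numbers:
        a_to_b.put(n)
    a_to_b.put(STOP)
    for w in workers:
        w.join()
    return results.get()
```

Calling `run_pipeline([1, 2, 3])` sends each number through both processes. The only way either one learns anything is by receiving a message, which is the whole discipline in miniature.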

OTP was built on top of Erlang and provides “supervision tree” structures that wrangle these fundamental processes. If a process goes awry, “let it fail,” make a new one, and log the error for later analysis. Since everything is virtual, it isn’t like a machine on a factory floor throwing hot slag at operators. The ability of OTP to supervise and automate the recovery of these systems makes it easier for administrators and developers to react to problems without “the Website is Down!!!” alerts. If your pager never goes off, that’s a good day. Systems built with structures like OTP have phenomenal strings of good days. Some report that systems built with Erlang have nine 9’s of reliability, i.e. 99.9999999% reliable. Imagine: that allows roughly 32 milliseconds of downtime per year, or less than one second in 20 years. That’s how insane this is.
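The “let it fail and make a new one” loop can be sketched in a few lines of Python. This is a loose analogue only, not OTP’s API: `supervise`, `make_flaky_factory` and `max_restarts` are hypothetical names for this example, and real OTP supervisors offer restart strategies and limits that this sketch does not attempt to model.

```python
import itertools
import logging

log = logging.getLogger("supervisor")


def supervise(make_worker, max_restarts=5):
    """Run a worker; if it crashes, log the error and start a brand-new one.

    Returns (result, crashes). Re-raises the last error once the worker
    has been restarted max_restarts times and has crashed again.
    """
    crashes = []
    while True:
        worker = make_worker()  # fresh worker, so no bad state is carried over
        try:
            return worker(), crashes
        except Exception as exc:
            log.warning("worker crashed, restarting: %s", exc)
            crashes.append(str(exc))  # keep the error for later analysis
            if len(crashes) > max_restarts:
                raise


def make_flaky_factory(crash_first_n):
    """Hypothetical worker factory whose first crash_first_n workers fail."""
    ids = itertools.count(1)

    def make_worker():
        worker_id = next(ids)

        def run():
            if worker_id <= crash_first_n:
                raise RuntimeError(f"worker {worker_id} hit a bad state")
            return f"worker {worker_id} finished"

        return run

    return make_worker
```

With `supervise(make_flaky_factory(2))`, the first two workers crash and are thrown away, and the third completes the job. The caller sees a result plus a list of logged crashes instead of an outage.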

You could also embrace CSP inside the web browser or client, but it appears that we already know another way to make client code bullet-proof, so that the client avoids run-time errors.

Functional Reactive Programming and Monads

Which is scarier? FRP or the big “M” word? Probably both. Functional Reactive Programming is a liberating idea that removes the worst parts of client user interfaces: inconsistent view updates and controls. Monads are the big, hairy mathematical theory that essentially restricts what a program does in order to simplify how we think about it: if your functions have side effects and “aren’t pure,” then your programs will be much, much harder to test.

Functional Reactive Programming systems like Elm systematically remove invalid states and turn your user’s gestures into messages to be “reacted” to by different aspects of the client. In short, if you have been programming “event handlers” to subscribe to mouse-clicks and key-presses without any intermediate application-specific representation, you are missing out on some ability to debug your program. Just as with CSP, FRP requires some practice and discipline to avoid reaching for shared state, aka “global” variables. The “Monad” comes into play when you need to interface with external systems that may fail, like your back-end system or the network between you and the back-end system. If you are ready to accept either a “result” or an “error” in your client (and you should always be checking for these results or errors in your client), then your client will never present mysterious behavior to the user, in other words, a vague “something went wrong” catch-all error.
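Here is a rough Python sketch of that shape: gestures and back-end replies become messages, a pure `update` function turns a message plus the current model into a new model, and a failing call is converted into an explicit error message instead of an exception. It is written in Python rather than Elm, and every name in it (`SaveClicked`, `SaveFailed`, `save_to_message`, and so on) is invented for illustration.

```python
from dataclasses import dataclass
from typing import Union


# Messages: every user gesture or back-end reply becomes one of these values.
@dataclass(frozen=True)
class SaveClicked:
    pass


@dataclass(frozen=True)
class SaveSucceeded:
    revision: int


@dataclass(frozen=True)
class SaveFailed:
    reason: str


Msg = Union[SaveClicked, SaveSucceeded, SaveFailed]


@dataclass(frozen=True)
class Model:
    status: str = "idle"  # one of "idle", "saving", "saved", "failed"
    detail: str = ""


def update(msg: Msg, model: Model) -> Model:
    """Pure function: the same message and model always give the same new model."""
    if isinstance(msg, SaveClicked):
        if model.status == "saving":
            return model  # ignore extra clicks while a save is in flight
        return Model("saving", "Saving...")
    if isinstance(msg, SaveSucceeded):
        return Model("saved", f"Saved revision {msg.revision}")
    if isinstance(msg, SaveFailed):
        return Model("failed", f"Could not save: {msg.reason}")
    raise TypeError(f"unhandled message: {msg!r}")


def view(model: Model) -> str:
    """Render the model; the user always sees a specific state, never a mystery."""
    return f"[{model.status}] {model.detail}"


def save_to_message(call_backend) -> Msg:
    """Turn a call that may fail into a result-or-error message."""
    try:
        return SaveSucceeded(call_backend())
    except OSError as exc:  # network-style failures (ConnectionError is an OSError)
        return SaveFailed(str(exc))
```

Because `update` has no side effects, testing it is just a matter of building a message and comparing models. The repeated-click case from the top of this article reduces to one early `return model`.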

Letting it Fail Faster in Order to Never-Fail

You should be able to plan a contingency for your users so they can have the best experience: never fail the user if possible. On the back end, use automation and supervision to recover from failure quickly, because bad states on your servers should be thrown away as fast as possible to recover the memory for the work that gets things done.
