Aerys II Targaryen, King of the Andals and the First Men, is not a leading figure in DevOps circles, but he should be. Sure, he was a psychopath who enjoyed burning people, but he also grasped a few fundamental truths of automation. No half measures, burn everything and embrace chaos. If you wanna get resilience, you’re gonna need to let your inner pyromaniac out.
So what are we burning?
Everything. Network links? Burn ’em. Servers? Toast ’em. Pictures of your ex-girlfriend while you’re on your third bottle of wine? Cook ’em. Anyway, off topic.
To achieve resilience, we need confidence that our automation actually works. We all remember the days before failover testing. The armies of business continuity consultants, wielding impact analysis documents and invoices. The clown car was full of nefarious leeches. We don’t need ’em. What we need is confidence in our automation.
As soon as we automate, the rusting begins. What starts out shiny becomes dim. Once we’ve automated, how do we know our automation works? It’s a common situation. We haven’t run it in months. The lady who wrote it all has left. None of us know how it works; we just pray it will when the time comes.
I’ve been there, a few times. So… I’ve accumulated a list of practices that will keep your automation crispy and your enemies burnt to a cind- I mean… bake resilience into your applications. Let’s get started.
Automate everything you can. No half measures.
There’s a glorious story of an engineer who automated his entire life. When he swiped his pass, a 17-second countdown would begin. When that countdown was finished, the coffee machine would start pouring him a mid-sized half-caf latte. That is what the extreme looks like.
So when I say “automate your infrastructure, deployments, configuration or patching”, know that I’m well within the bounds of normality. This isn’t idealistic rambling. It’s the reality. As soon as you infect your automation with manual steps, the rot will set in. The lazy termites will appear. Before long, your once flawless automation will be awash with cancerous compromise.
In software, everything you do is potentially a pattern. Automated(ish) is tempting for engineers. It says “just leave the hard stuff”. Meanwhile the termites are chewing away at the foundations of your glorious automatons.
Well, maybe a few half measures.
I know what you’re about to say. It makes sense to do the easy stuff and do the hard stuff later. I agree. So when is later? Is it after you’ve had an outage and your automation no longer works? Or is it at 3am, when you’re trying to deploy your application?
Delay the hard stuff if it makes sense for the business, of course, but don’t delay indefinitely. It will bite you. As soon as your back is turned, those termite bastards are going to turn three manual steps into thirty. They’ll do this with a clear conscience too. You set the pattern, they followed it.
Burn them all.
The Mad King was an equal opportunity arsonist. A northerner from Winterfell? Fetch the wildfire. The population of King’s Landing? Let’s get the BBQ going. He didn’t care who got toasted, he just wanted to make a little crackling. Be like the Mad King. I think that contains everything you need to know.
Wait no, don’t set people on fire.
Douse your servers in digital petrol and flick a match. Remove network routes, blackhole traffic, inject faults into your APIs. Be brave. Sooner or later, this stuff is going to happen and you need to be ready. You could trust some guy who tells you to buy Oracle’s latest and greatest monstrosity, or you can trust the evidence.
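If "inject faults into your APIs" sounds abstract, here is one rough way it could look in Python. This is a minimal sketch, not a real chaos tool: `inject_faults`, `InjectedFault`, `get_user` and `get_user_with_retry` are all made-up names for illustration, and the handler is a stand-in for whatever your service actually does.

```python
import random
from functools import wraps


class InjectedFault(Exception):
    """Raised in place of a real response when we burn a call on purpose."""


def inject_faults(failure_rate, rng=None):
    """Wrap a function so that roughly `failure_rate` of its calls fail."""
    rng = rng or random.Random()

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # random() returns a float in [0.0, 1.0), so a rate of 1.0
            # always fails and a rate of 0.0 never does.
            if rng.random() < failure_rate:
                raise InjectedFault(f"injected fault in {func.__name__}")
            return func(*args, **kwargs)

        return wrapper

    return decorator


@inject_faults(failure_rate=0.3)
def get_user(user_id):
    # Hypothetical API handler; swap in a real call in your own code.
    return {"id": user_id, "name": "Aerys"}


def get_user_with_retry(user_id, attempts=5):
    """The caller under test: does it survive a flaky dependency?"""
    for _ in range(attempts):
        try:
            return get_user(user_id)
        except InjectedFault:
            continue
    return None  # Every attempt got torched.
```

Dial `failure_rate` up in a non-production environment and watch what your callers do. If they fall over at 0.3, you have learned something cheaply.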
The Simian Army is your best friend here but if you don’t want to go all out, just sign into the AWS console and start terminating servers. Start in your non-production environments. Record your expected behaviour. Record the actual behaviour. Fix. Rinse and repeat. You don’t know unless you try. Never trust the label on the back of the packaging.
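The "record expected, record actual, fix, repeat" loop can be sketched in a few lines. To be clear, nothing below touches AWS or the Simian Army: `Fleet` is a toy in-memory stand-in for a group of servers, and `run_experiment` is a hypothetical helper. The point is the shape of the experiment, not the plumbing.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Fleet:
    """Toy stand-in for a group of servers that may or may not self-heal."""

    desired: int
    self_healing: bool = True
    servers: list = field(default_factory=list)
    _next_id: int = 0

    def __post_init__(self):
        # Start with a full fleet.
        self.reconcile()

    def reconcile(self):
        """Launch replacements until we are back at the desired count."""
        while len(self.servers) < self.desired:
            self.servers.append(f"server-{self._next_id}")
            self._next_id += 1

    def terminate_random(self, rng):
        """Pick a victim, burn it, and let automation react (if it can)."""
        victim = rng.choice(self.servers)
        self.servers.remove(victim)
        if self.self_healing:
            self.reconcile()
        return victim


def run_experiment(fleet, rng):
    """Record expected behaviour, cause the failure, record actual behaviour."""
    expected = fleet.desired  # Hypothesis: capacity recovers on its own.
    victim = fleet.terminate_random(rng)
    actual = len(fleet.servers)
    return {
        "victim": victim,
        "expected": expected,
        "actual": actual,
        "passed": actual == expected,
    }


if __name__ == "__main__":
    rng = random.Random()
    print(run_experiment(Fleet(desired=3), rng))
    print(run_experiment(Fleet(desired=3, self_healing=False), rng))
```

The second fleet is the one with rusty automation: the experiment fails, and now you know before the outage tells you.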
Tell me a man with these fingernails hasn’t embraced the chaos:
Aerys understood a fundamental truth: order is an illusion. His work was developed a few years later by a man who made most of his money by smirking in corners. You’ve got it. It’s Petyr Baelish, the artist formerly known as Littlefinger. Remember his creepy-ass speech? “Chaos is a laddah”. Well, it turns out you can be creepy and correct.
Things are going to blow up. It’s going to happen. Just come to terms with it. You can’t fight fate. You know what they say. If you can’t beat ’em, join ’em. So grow out those fingernails, develop delusional fantasies of being a dragon and dial up the thermostat.
See, bet you didn’t think this would make any sense, did you?
Neither did I, to tell you the truth. I wrote it and now I don’t know if I can look at myself in the mirror. But one thing is for sure. If the Mad King was alive today, he’d be an outstanding site reliability engineer.
I’m regularly burning my enemi- talking about DevOps on my Twitter.