I have trust issues with automation.

Adam Goossens
Jul 8, 2019 · 4 min read
Photo by Bernard Hermant on Unsplash

Empire builders — not only the Romans

Remember the empire builders of days past? You know the type: they kept legions of complex shell scripts, and nobody except them knew what those scripts did (naturally). They skimped on documentation. After all, they knew what their work did.

YAML does not make for understanding

We’re replacing our custom shell scripts with custom automation in Ansible, Chef, Puppet, and similar tools. But it’s a mistake to think that because these tools are easier to read, the automation they perform is easier to understand.

Automation for the people!…that come after you

When writing automation, remember: you’re not just doing it for you. You’re doing it for the people who come after you.

  1. Never automate something you don’t understand. If you do not understand the manual process that you are automating, learn it until you do.
  2. Watch the scope creep. Try to avoid doing multiple things at once with a piece of automation. Where possible, follow the Unix philosophy — do one thing, do it well. Document interdependencies, variables, tags, toggles, etc, thoroughly.
  3. Understand that not everything needs to be automated. Every piece of automation you keep around, you need to maintain. Otherwise, what was the point in keeping it? Consider the cost-benefit relationship before you crack open the editor to create yet another playbook that will end up in a repository never to be run again. If you write it as a once-off to save you time — decide if you need to keep it at all. If you’ll never use it again, throw it away.
  4. Take responsibility. When you write a piece of automation, take responsibility for it. That includes keeping it and its documentation up to date. Don’t toss it into a repository as a ‘commit and forget’.
  5. Write your automation defensively. I am a big fan of the Ceph playbook that handles cluster updates (an oddly specific example, but bear with me). It operates over the cluster serially. It checks the cluster health before and after every host. It refuses to continue if the health isn’t appropriate. If a host fails, the blast radius of that failure is limited to the failing host. It’s defensive at every step, and consequently, it works very well.
  6. Check for preconditions before you start your automation. Verify system health as you proceed. Never make any assumptions about state — check it before continuing. Fail early and fast if something isn’t right.
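
To make points 5 and 6 a little more concrete, here is a minimal Python sketch of a serial, health-checked rollout. It is not the Ceph playbook, just an illustration of the same idea. The names `rolling_update`, `is_healthy`, `update_host` and `UnhealthyClusterError` are hypothetical, and the health check and update step are passed in as plain callables because what "healthy" means depends entirely on your system.

```python
from typing import Callable, Iterable, List


class UnhealthyClusterError(RuntimeError):
    """Raised when a health check fails and the rollout must stop."""


def rolling_update(
    hosts: Iterable[str],
    is_healthy: Callable[[], bool],      # hypothetical: your own health probe
    update_host: Callable[[str], None],  # hypothetical: your own update step
) -> List[str]:
    """Update hosts one at a time, checking health before and after each.

    Returns the list of hosts that were updated. Raises
    UnhealthyClusterError as soon as a check fails, so hosts later in the
    sequence are never touched.
    """
    updated: List[str] = []
    for host in hosts:
        # Precondition: never assume state, check it before changing anything.
        if not is_healthy():
            raise UnhealthyClusterError(f"unhealthy before updating {host}")

        update_host(host)
        updated.append(host)

        # Postcondition: confirm the change did not degrade the cluster.
        if not is_healthy():
            raise UnhealthyClusterError(f"unhealthy after updating {host}")
    return updated
```

Because the loop stops at the first failed check, a bad update is limited to the host that was just changed. Everything after it in the list is left alone until a human has looked at what went wrong.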

Automation: play the long game

Our modern automation tooling enables us to manage our systems at scale. The simplicity of just running a piece of automation can lull us into a false sense of security, a false sense of understanding.
