Runbook: A Ruby DSL for Gradual System Automation

Patrick Blesi
Aug 1, 2019

At Braintree, we like to write tools to automate our work. Our latest tool is Runbook, a Ruby DSL for gradually automating system operations.

I know what you’re thinking: Why build yet another tool to automate an engineer’s job? We already have bash scripts!

First, anyone who has tried writing a for-loop in bash will admit it’s not intuitive (I have to look it up every time!). Second, even when scripting out solutions to common maintenance operations, there are often setup, teardown, and verification steps that are required to ensure the operation ran successfully. How many times have you run into issues forgetting to execute a setup or cleanup step that’s required for your maintenance script? How many times have you forgotten to verify that an operation has succeeded?

We can often mitigate these kinds of issues with good documentation. The problem with software documentation, as we know, is that it can become outdated over time if the maintainers neglect to update it.

How often have you scripted a maintenance operation only to have it become outdated and break six months later? Inevitably, you break out the editor and perform script surgery in an effort to recover from the failed state.

Runbook addresses these types of issues by providing a framework that tightly couples the documentation and code for an operation. It also allows you to progressively automate your operations, finding the right balance between full automation and human involvement.

The philosophy of Runbook is heavily aligned with Dan Slimmon’s Do-nothing scripting and Atul Gawande’s The Checklist Manifesto. It is designed to minimize Toil.

Runbook is not intended to replace more specialized automation tools such as configuration management (Puppet, Chef, Ansible, Salt), deployment (Capistrano, Kubernetes, Docker Swarm), monitoring (Nagios, Datadog), or local command execution (Rake tasks, Make). Instead, Runbook works best as glue for tasks that cut across these domains.

A simple runbook

A runbook outlines a list of steps required to perform an operation.

# restart_nginx.rb

It can be compiled and used to generate a Markdown checklist or be interactively executed.

# Restart Nginx

Adding automation

Moving past this initial outline, you can start to build automation into your runbook.

# restart_nginx.rb

Notice that this runbook includes the step confirm “Nginx is taking traffic?”. You can easily put off scripting steps that are more difficult to automate by delegating that step to the person executing the runbook.
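
The same idea can be sketched in plain Ruby. This is only an illustration and does not use Runbook itself: the Step struct and the run_steps helper are hypothetical names, and the automated steps just record the command they would run. Automated steps carry a callable, while steps that are not yet automated ask the operator to confirm.

```ruby
require "stringio"

# Hypothetical illustration only; not part of the Runbook gem.
# A step either carries an action (automated) or nil (manual).
Step = Struct.new(:title, :action)

# Runs each step in order. Manual steps prompt on `output` and read the
# operator's answer from `input`; any answer other than "y" stops the run.
# Returns the titles of the steps that completed.
def run_steps(steps, input: $stdin, output: $stdout)
  completed = []
  steps.each do |step|
    if step.action
      step.action.call
    else
      output.puts "MANUAL: #{step.title} (y/n)"
      answer = input.gets.to_s.strip.downcase
      break unless answer == "y"
    end
    completed << step.title
  end
  completed
end

log = []
steps = [
  # These lambdas only record the command; a real script would execute it.
  Step.new("Stop Nginx", -> { log << "service nginx stop" }),
  Step.new("Start Nginx", -> { log << "service nginx start" }),
  Step.new("Nginx is taking traffic?", nil), # not automated yet
]

# Simulate an operator typing "y" so the example runs unattended.
done = run_steps(steps, input: StringIO.new("y\n"), output: StringIO.new)
puts done.inspect
```

Once a manual step becomes easy to script, you would swap its nil for a callable and leave the rest of the list alone.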

Features

Some of Runbook’s features include:

SSH integration

Runbook integrates with SSH using SSHKit to provide support for executing commands on remote servers, downloading and uploading files, and capturing output from remotely executed commands. You can also choose how commands are parallelized: on all servers at once, serially, or in groups.

Runbook.book "Restart Nginx" do
  section "Restart Services" do
    servers (1..50).map { |n| "app#{n.to_s.rjust(2, "0")}.prod" }
    parallelization(strategy: :groups, limit: 5, wait: 2)
    step { command "service nginx restart" }
  end
end

The example above runs service nginx restart across app{01..50}.prod in groups of five servers, waiting 2 seconds between groups.
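
To show what the groups strategy amounts to, here is a rough sketch in plain Ruby. It is not how Runbook or SSHKit implement it: app_hosts and each_group are hypothetical helpers, and the sketch assumes hosts numbered 01 through 50 as in the description above.

```ruby
# Hypothetical illustration only; Runbook delegates the real work to SSHKit.
# Builds the zero-padded host names app01.prod through app50.prod.
def app_hosts(range = 1..50)
  range.map { |n| "app#{n.to_s.rjust(2, "0")}.prod" }
end

# Yields hosts in batches of `limit`, pausing `wait` seconds between batches
# (but not after the last one). The caller decides what to run per batch.
def each_group(hosts, limit:, wait:, sleeper: ->(seconds) { sleep(seconds) })
  groups = hosts.each_slice(limit).to_a
  groups.each_with_index do |group, index|
    yield group
    sleeper.call(wait) if index < groups.size - 1
  end
end

batches = []
# A no-op sleeper keeps the example fast; omit it to really pause.
each_group(app_hosts, limit: 5, wait: 2, sleeper: ->(_seconds) {}) do |group|
  batches << group
end
puts "#{batches.size} groups, first: #{batches.first.inspect}"
```

Passing the sleeper in as a parameter is a design choice for the sketch: it lets you exercise the batching logic without waiting between groups.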

Dynamic control flow

We designed Runbook’s control flow to be dynamic; at any point you can skip steps, jump to any step (even a previous step), or exit.

Runbook saves its state between each step of the runbook, and it can restart from where it left off if an error occurs while executing the runbook. In fact, you can resume a stopped runbook at any point in its execution.
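
The general technique behind resuming can be sketched as a checkpoint file. This is an assumption-laden illustration, not Runbook's actual state handling: run_with_checkpoint, the JSON file, and its next_step key are all hypothetical.

```ruby
require "json"
require "tmpdir"

# Hypothetical illustration only; Runbook's real state handling differs.
# Runs steps in order and records the index of the next step in a JSON file
# after each one, so a later call can pick up where an earlier call stopped.
def run_with_checkpoint(steps, state_file)
  start = File.exist?(state_file) ? JSON.parse(File.read(state_file))["next_step"] : 0
  steps.each_with_index do |step, index|
    next if index < start # already completed in a previous run
    step.call
    File.write(state_file, JSON.generate({ "next_step" => index + 1 }))
  end
  # Finished cleanly, so clear the checkpoint.
  File.delete(state_file) if File.exist?(state_file)
end

Dir.mktmpdir do |dir|
  state_file = File.join(dir, "restart_nginx.state.json")
  ran = []
  attempts = 0
  steps = [
    -> { ran << :stop },
    lambda do
      attempts += 1
      raise "transient error" if attempts == 1 # fail on the first try only
      ran << :wait
    end,
    -> { ran << :start },
  ]

  begin
    run_with_checkpoint(steps, state_file)
  rescue RuntimeError
    # The first attempt fails on the second step; the checkpoint is kept.
  end
  run_with_checkpoint(steps, state_file) # resumes at the second step
  puts ran.inspect
end
```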

Noop and auto modes

Runbook provides both a noop and an auto mode. Noop mode allows you to verify the operations your runbook will run before you execute it. Auto mode will execute your runbook, requiring no human interaction. Any prompts you have added to your runbook will use the provided default values, or the execution will immediately fail if prompts exist without defaults.

Runbooks can be executed in noop mode to describe what commands the runbook will execute
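
The behavior of the two modes can be approximated in a few lines of plain Ruby. This is a sketch under stated assumptions rather than Runbook's implementation: resolve_prompt, run_command, MissingDefaultError, and the "[NOOP]" message format are invented for illustration.

```ruby
# Hypothetical illustration only; not Runbook's implementation.
class MissingDefaultError < StandardError; end

# Resolves a prompt's answer. In auto mode the default is used, and a prompt
# without a default fails immediately. Otherwise the given block is asked.
def resolve_prompt(question, default: nil, auto: false)
  if auto
    raise MissingDefaultError, "no default for: #{question}" if default.nil?
    return default
  end
  yield question
end

# Executes a command through `runner`, or only describes it when noop is true.
def run_command(cmd, noop: false, runner: ->(c) { system(c) })
  return "[NOOP] Run: `#{cmd}`" if noop
  runner.call(cmd)
end

puts run_command("service nginx restart", noop: true)
puts resolve_prompt("Which host?", default: "app01.prod", auto: true)
```

Failing fast on a missing default, instead of guessing an answer, keeps an unattended run from silently taking a path nobody chose.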

Execution lifecycle hooks

Runbook provides support for before, around, and after execution hooks. You can alter and augment your runbook behavior by hooking into the execution of entities and statements in your runbook. Hooks can be used to provide a rich set of behavior such as timing the execution of steps of a runbook or the runbook as a whole, tracking the frequency of execution of a runbook, and notifying Slack when a runbook has completed.

Runbook::Runs::SSHKit.register_hook(
  :notify_slack_of_execution_time,
  :around,
  Runbook::Entities::Book
) do |object, metadata, block|
  start = Time.now
  # Run the wrapped book, then measure how long it took
  block.call(object, metadata)
  duration = Time.now - start
  unless metadata[:noop]
    message = "Runbook #{object.title}: took #{duration} seconds to execute!"
    # notify_slack is a helper you define yourself
    notify_slack(message)
  end
end
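
For readers unfamiliar with around hooks, the wrapping pattern itself can be sketched without Runbook. HookRunner below is a hypothetical class written for this illustration; it is not Runbook's hook machinery, only a minimal example of hooks that receive the object plus a callable that continues execution.

```ruby
# Hypothetical illustration only; not Runbook's hook implementation.
class HookRunner
  def initialize
    @around_hooks = []
  end

  # Registers a block that receives the object and a callable `inner`
  # which continues on to the next hook or to the wrapped work.
  def around(&hook)
    @around_hooks << hook
    self
  end

  # Runs the given work wrapped by every registered hook.
  # The first hook registered ends up outermost.
  def run(object, &work)
    chain = @around_hooks.reverse.reduce(work) do |inner, hook|
      ->(obj) { hook.call(obj, inner) }
    end
    chain.call(object)
  end
end

timings = {}
runner = HookRunner.new
runner.around do |title, inner|
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = inner.call(title)
  timings[title] = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  result
end

runner.run("Restart Nginx") { |title| puts "executing #{title}" }
puts format("took %.4f seconds", timings["Restart Nginx"])
```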

First-class tmux support

At Braintree we live on a steady diet of vim and tmux. Consequently, Runbook provides first-class support for executing commands within tmux. When specifying your runbook, you can define a tmux layout. This flexible and intuitive interface allows you to send commands to panes by name.

Executing commands in separate panes is ideal for monitoring, commands that require user interaction, or commands that are prone to failure. You can then interact with the command directly, troubleshooting and resolving issues before continuing the runbook.

Runbook.book "Restart Nginx" do
  layout [[
    [{name: :top_left, runbook_pane: true}, :top_right],
    :middle,
    {
      name: :bottom,
      directory: "/var/log",
      command: "tail -Fn 100 nginx.log"
    },
  ]]
end

Runbooks remember their tmux layouts between executions. If a runbook stops unexpectedly, it will connect to the existing tmux layout when resumed as long as the tmux panes have not been altered. Additionally, runbooks offer to automatically close their tmux panes when the runbook finishes executing.

Ruby commands

Runbook provides a ruby_command statement to dynamically define runbook statements and their arguments. You can, for example, hit a JSON endpoint to retrieve a list of servers and then execute a command on each of those servers. Because you are working in Ruby, you have access to all the parsing and processing capabilities it provides.

require 'json'
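
As a rough sketch of the parsing half of that workflow, the plain Ruby below turns a JSON payload into a list of hosts and commands. It does not call ruby_command or any Runbook API. The payload shape, the "hostname" and "role" keys, and the helper names are assumptions made for illustration, and a literal string stands in for the response of a real endpoint.

```ruby
require "json"

# Hypothetical illustration only. The payload shape and the "role" and
# "hostname" keys are assumptions, and a real runbook would fetch the JSON
# from an endpoint (for example with Net::HTTP) instead of using a literal.
SAMPLE_RESPONSE = <<~JSON
  {"servers": [
    {"hostname": "app01.prod", "role": "web"},
    {"hostname": "db01.prod",  "role": "db"},
    {"hostname": "app02.prod", "role": "web"}
  ]}
JSON

# Parses the payload and returns the host names that have the given role.
def hosts_with_role(json, role)
  JSON.parse(json)
      .fetch("servers")
      .select { |server| server["role"] == role }
      .map { |server| server["hostname"] }
end

# Pairs each selected host with the command to run on it.
def restart_plan(json, role: "web", command: "service nginx restart")
  hosts_with_role(json, role).map { |host| [host, command] }
end

restart_plan(SAMPLE_RESPONSE).each do |host, command|
  puts "#{host}: #{command}"
end
```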

Generators

Runbook provides generators, similar to Rails, for generating runbooks, runbook extensions, and runbook-focused projects. You can even define your own generators for including team-specific customizations in your generated runbooks.

Help instructions for the Runbook generate command

Adaptability

Runbook is designed to integrate seamlessly into existing infrastructure. It can be used as a Ruby library, as a command line tool, or to create self-executable runbooks. Runbook adheres to universal interfaces such as the command line and SSH. Runbooks can be invoked via cron jobs and integrated into Docker containers.

Further, Runbook is extensible, so you can augment the DSL with your own statements and functionality. The example below aliases section to s in the Book DSL.

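module MyRunbook::Extensions
  module Aliases
    module DSL
      # Shorthand for `section`
      def s(title, &block)
        section(title, &block)
      end
    end
  end

  # Make the alias available inside Runbook.book blocks
  Runbook::Entities::Book::DSL.prepend(Aliases::DSL)
end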

This flexibility allows you to adapt Runbook to a wide range of use cases.

Check it out

At Braintree, we use Runbook for automating our app deployment preflight checklists, on-call playbooks, system maintenance operations, SDK deployments, and more. We’ve found it to be instrumental in streamlining production operations, reducing human error, and increasing overall quality of life.

Check out Runbook on GitHub for more information on how you can use Runbook to streamline production operations and increase developer happiness!
