Patrick Blesi
Aug 1 · 7 min read

At Braintree, we like to write tools to automate our work. Our latest tool is Runbook, a Ruby DSL for gradually automating system operations.

I know what you’re thinking: Why build yet another tool to automate an engineer’s job? We already have bash scripts!

First, anyone who has tried writing a for-loop in bash will admit it’s not intuitive (I have to look it up every time!). Second, even when scripting out solutions to common maintenance operations, there are often setup, teardown, and verification steps that are required to ensure the operation ran successfully. How many times have you run into issues forgetting to execute a setup or cleanup step that’s required for your maintenance script? How many times have you forgotten to verify that an operation has succeeded?

We can often mitigate these kinds of issues with good documentation. The problem with software documentation, as we know, is that it can become outdated over time if the maintainers neglect to update it.

How often have you scripted a maintenance operation only to have it become outdated and break six months later? Inevitably, you break out the editor and perform script surgery in an effort to recover from the failed state.

Runbook addresses these types of issues by providing a framework that tightly couples the documentation and code for an operation. It also allows you to progressively automate your operations, finding the right balance between full automation and human involvement.

The philosophy of Runbook is heavily aligned with Dan Slimmon’s Do-nothing scripting and Atul Gawande’s The Checklist Manifesto. It is designed to minimize Toil.

Runbook is not intended to replace more special-purpose automation tools such as configuration management (Puppet, Chef, Ansible, Salt), deployment (Capistrano, Kubernetes, Docker Swarm), monitoring (Nagios, Datadog), or local command execution (Rake tasks, Make). Instead, Runbook works best as glue for tasks that cut across these domains.

A simple runbook

A runbook outlines a list of steps required to perform an operation.

# restart_nginx.rb
Runbook.book "Restart Nginx" do
  description <<-DESC
    This is a simple runbook to restart nginx
  DESC

  section "Restart Nginx" do
    step "Stop Nginx"
    step "Wait for requests to drain"
    step "Start Nginx"
  end
end

It can be compiled and used to generate a Markdown checklist or be interactively executed.

# Restart Nginx

This is a simple runbook to restart nginx

## 1. Restart Nginx

1. [] Stop Nginx

2. [] Wait for requests to drain

3. [] Start Nginx
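To make the compile step concrete, here is a minimal sketch in plain Ruby of turning an outline into a checklist like the one above. It is not how Runbook itself is implemented: the hash layout and the render_checklist helper are assumptions made purely for illustration.

```ruby
# Hypothetical outline structure; Runbook builds its own object tree instead.
outline = {
  title: "Restart Nginx",
  description: "This is a simple runbook to restart nginx",
  sections: [
    { title: "Restart Nginx",
      steps: ["Stop Nginx", "Wait for requests to drain", "Start Nginx"] }
  ]
}

# Render the outline as a Markdown checklist (illustrative helper).
def render_checklist(outline)
  lines = ["# #{outline[:title]}", "", outline[:description], ""]
  outline[:sections].each_with_index do |section, i|
    lines << "## #{i + 1}. #{section[:title]}" << ""
    section[:steps].each_with_index do |step, j|
      lines << "#{j + 1}. [] #{step}"
    end
  end
  lines.join("\n")
end

markdown = render_checklist(outline)
puts markdown
```

The point is only that the same outline can feed both a printed checklist and an interactive run, which is what keeps the documentation and the automation from drifting apart.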

Adding automation

Moving past this initial outline, you can start building automation into your runbook.

# restart_nginx.rb
Runbook.book "Restart Nginx" do
  description <<-DESC
    This is a simple runbook to restart nginx and verify it starts successfully
  DESC

  section "Restart Nginx" do
    server "app01.prod"
    user "root"

    step "Stop Nginx" do
      note "Stopping Nginx..."
      command "service nginx stop"
      assert %q{service nginx status | grep "not running"}
    end

    step { wait 30 }

    step "Start Nginx" do
      note "Starting Nginx..."
      command "service nginx start"
      assert %q{service nginx status | grep "is running"}
      confirm "Nginx is taking traffic?"
      notice "Make sure to report why you restarted nginx"
    end
  end
end

Notice that this runbook includes the statement confirm “Nginx is taking traffic?”. You can put off scripting steps that are difficult to automate by delegating them to the person executing the runbook.

Features

Some of Runbook’s features include:

SSH integration

Runbook integrates with SSH using SSHKit to provide support for executing commands on remote servers, downloading and uploading files, and capturing output from remotely executed commands. You can control the parallelization strategy for execution, executing in parallel, serially, or in groups.

Runbook.book "Restart Nginx" do
  section "Restart Services" do
    servers (1..50).map { |n| "app#{n.to_s.rjust(2, "0")}.prod" }
    parallelization(strategy: :groups, limit: 5, wait: 2)

    step "Restart services" do
      command "service nginx restart"
    end
  end
end

Runbook supports different parallelization strategies. The example above runs service nginx restart across app{01..50}.prod in groups of five servers, waiting 2 seconds between groups.
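As a rough illustration of what the groups strategy means, the sketch below builds the host list and splits it into batches in plain Ruby. It uses neither SSHKit nor Runbook. The variable names are invented for this example, and the pause is left as a comment so the snippet runs instantly.

```ruby
# Build app01.prod .. app50.prod, zero-padding the index to two digits.
hosts = (1..50).map { |n| "app#{n.to_s.rjust(2, '0')}.prod" }

group_size = 5   # mirrors limit: 5
wait_seconds = 2 # mirrors wait: 2

# Split into batches; within a batch, hosts would be handled concurrently.
groups = hosts.each_slice(group_size).to_a

groups.each_with_index do |group, index|
  puts "Batch #{index + 1}: #{group.join(', ')}"
  # A real runner would execute the command on every host in the batch here,
  # then pause before starting the next batch:
  # sleep wait_seconds unless index == groups.size - 1
end
```

With 50 hosts and a limit of 5, this yields ten batches, so at most five servers are restarting at any moment.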

Dynamic control flow

We designed Runbook’s control flow to be dynamic; at any point you can skip steps, jump to any step (even a previous step), or exit.

Runbook saves its state between each step of the runbook, and it can restart from where it left off if an error occurs while executing the runbook. In fact, you can resume a stopped runbook at any point in its execution.
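The idea of saving state between steps can be sketched in a few lines of plain Ruby. This is an assumption-laden illustration, not Runbook's actual persistence format: the run_with_resume helper, the JSON file layout, and the step names are all hypothetical.

```ruby
require "json"
require "tmpdir"

# Run steps in order, recording how many have completed in a JSON file so a
# later invocation can pick up where this one stopped.
def run_with_resume(steps, state_path)
  completed = File.exist?(state_path) ? JSON.parse(File.read(state_path))["completed"] : 0
  steps.each_with_index do |(name, action), index|
    next if index < completed # already done in a previous run
    action.call
    File.write(state_path, JSON.generate({ "completed" => index + 1 }))
    puts "finished: #{name}"
  end
  File.delete(state_path) if File.exist?(state_path) # clean finish
end

log = []
attempts = 0
steps = [
  ["stop",  -> { log << :stop }],
  ["drain", -> { attempts += 1; raise "not drained yet" if attempts == 1; log << :drain }],
  ["start", -> { log << :start }]
]

state_path = File.join(Dir.mktmpdir, "runbook_state.json")
begin
  run_with_resume(steps, state_path) # fails at "drain" on the first attempt
rescue RuntimeError => e
  puts "stopped: #{e.message}"
end
run_with_resume(steps, state_path) # resumes at "drain"; "stop" is not repeated
```

Because progress is written after every step, a failure partway through never forces you to repeat work that already succeeded.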

Noop and auto modes

Runbook provides both a noop and an auto mode. Noop mode allows you to verify the operations your runbook will run before you execute it. Auto mode will execute your runbook, requiring no human interaction. Any prompts you have added to your runbook will use the provided default values, or the execution will immediately fail if prompts exist without defaults.

Figure: Runbooks can be executed in noop mode to describe what commands the runbook will execute.
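To show how the two modes differ in behavior, here is a small sketch in plain Ruby. It is not Runbook's code: run_command and ask are hypothetical helpers that only record what would happen, and the error class used for a missing default is an arbitrary choice.

```ruby
# Describe or "run" a command depending on the mode.
def run_command(cmd, log:, noop: false)
  log << (noop ? "[NOOP] would run: #{cmd}" : "ran: #{cmd}")
end

# Resolve a prompt. In auto mode the default is used, and a prompt without a
# default aborts the run instead of waiting for input.
def ask(prompt, log:, default: nil, noop: false, auto: false)
  if noop
    log << "[NOOP] would ask: #{prompt}"
    return default
  end
  if auto
    raise ArgumentError, "no default for prompt: #{prompt}" if default.nil?
    log << "auto-answered: #{prompt} => #{default}"
    return default
  end
  print "#{prompt} "
  $stdin.gets.to_s.chomp
end

log = []
run_command("service nginx restart", log: log, noop: true)
answer = ask("Which environment?", log: log, default: "prod", auto: true)

# A prompt with no default cannot be answered unattended, so auto mode fails.
failed = begin
  ask("Continue?", log: log, auto: true)
  false
rescue ArgumentError
  true
end

puts log
```

Failing fast on a prompt without a default is what makes auto mode safe to run from cron or CI: the run stops rather than hanging on input nobody will provide.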

Execution lifecycle hooks

Runbook provides support for before, around, and after execution hooks. You can alter and augment your runbook behavior by hooking into the execution of entities and statements in your runbook. Hooks can be used to provide a rich set of behavior such as timing the execution of steps of a runbook or the runbook as a whole, tracking the frequency of execution of a runbook, and notifying Slack when a runbook has completed.

Runbook::Runs::SSHKit.register_hook(
  :notify_slack_of_execution_time,
  :around,
  Runbook::Entities::Book
) do |object, metadata, block|
  start = Time.now
  block.call(object, metadata)
  duration = Time.now - start
  unless metadata[:noop]
    message = "Runbook #{object.title}: took #{duration} seconds to execute!"
    notify_slack(message) # notify_slack is a helper you define yourself
  end
end

First-class tmux support

At Braintree we live on a steady diet of vim and tmux. Consequently, Runbook provides first-class support for executing commands within tmux. When specifying your runbook, you can define a tmux layout. This flexible and intuitive interface allows you to send commands to panes by name.

Executing commands in separate panes is ideal for monitoring, commands that require user interaction, or commands that are prone to failure. You can then interact with the command directly, troubleshooting and resolving issues before continuing the runbook.

Runbook.book "Restart Nginx" do
  layout [[
    [{name: :top_left, runbook_pane: true}, :top_right],
    :middle,
    {
      name: :bottom,
      directory: "/var/log",
      command: "tail -Fn 100 nginx.log"
    },
  ]]

  section "Setup monitoring" do
    step do
      tmux_command "watch 'service nginx status'", :top_right
      tmux_command "vim /etc/nginx/nginx.conf", :middle
    end
  end
end

Runbooks remember their tmux layouts between executions. If a runbook stops unexpectedly, it will connect to the existing tmux layout when resumed as long as the tmux panes have not been altered. Additionally, runbooks offer to automatically close their tmux panes when the runbook finishes executing.

Ruby commands

Runbook provides a ruby_command statement to dynamically define runbook statements and their arguments. You can, for example, hit a JSON endpoint to retrieve a list of servers and then execute a command on each of those servers. Because you are working in Ruby, you have access to all the parsing and processing capabilities it provides.

require 'json'

Runbook.book "Restart Old Services" do
  section "Restart week-old services" do
    step do
      server "monitor01.prod"
      capture "curl localhost:9200/host_ages.json", into: :host_ages

      ruby_command do |rb_cmd, metadata|
        old_hosts = JSON.parse(host_ages).select do |host|
          host["started"] < 1.week.ago
        end
        old_host_names = old_hosts.map { |host| host["name"] }
        old_host_names.each do |name|
          command "shutdown -r now", ssh_config: {servers: [name], user: "root"}
        end
      end
    end
  end
end
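The filtering inside ruby_command is ordinary Ruby, so it can be tried on its own. The sketch below uses only the standard library and makes several assumptions the article does not state: that the endpoint returns an array of objects with "name" and "started" keys, that "started" is an ISO 8601 timestamp, and a fixed, arbitrary "now" so the result is reproducible.

```ruby
require "json"
require "time"

# Hypothetical payload; the real endpoint's format is not given in the article.
now = Time.utc(2019, 8, 1) # arbitrary fixed clock for a repeatable example
host_ages = JSON.generate([
  { "name" => "app01.prod", "started" => "2019-07-20T00:00:00Z" },
  { "name" => "app02.prod", "started" => "2019-07-30T00:00:00Z" },
  { "name" => "app03.prod", "started" => "2019-07-24T23:59:59Z" }
])

one_week = 7 * 24 * 60 * 60 # seconds; stands in for ActiveSupport's 1.week
cutoff = now - one_week

# Keep hosts started before the cutoff, then reduce to their names.
old_host_names = JSON.parse(host_ages)
  .select { |host| Time.iso8601(host["started"]) < cutoff }
  .map { |host| host["name"] }

puts old_host_names.inspect
```

Parsing the timestamp explicitly avoids comparing a raw JSON string against a time value, which is worth checking against whatever format your own endpoint returns.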

Generators

Runbook provides generators, similar to Rails, for generating runbooks, runbook extensions, and runbook-focused projects. You can even define your own generators for including team-specific customizations in your generated runbooks.

Figure: Help instructions for the Runbook generate command.

Adaptability

Runbook is designed to integrate seamlessly into existing infrastructure. It can be used as a Ruby library, a command line tool, or to create self-executable runbooks. Runbook adheres to universal interfaces such as the command line and SSH. Runbooks can be invoked via cron jobs and integrated into Docker containers.

Further, Runbook is extensible, so you can augment the DSL with your own statements and functionality. The example below aliases section to s in the Book DSL.

module MyRunbook::Extensions
  module Aliases
    module DSL
      def s(title, &block)
        section(title, &block)
      end
    end
  end

  Runbook::Entities::Book::DSL.prepend(Aliases::DSL)
end

This flexibility allows you to adapt Runbook to meet any use case you encounter.
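The aliasing example relies on Ruby's Module#prepend. The sketch below shows the same mechanism against a stand-in class, so it runs without Runbook installed. MiniBookDSL and SectionAlias are hypothetical names, and Runbook's real DSL classes are more involved than this.

```ruby
# Stand-in for a DSL host class; not Runbook's actual implementation.
class MiniBookDSL
  attr_reader :sections

  def initialize
    @sections = []
  end

  def section(title, &block)
    @sections << title
    instance_eval(&block) if block
  end
end

# Extension module adding a short alias, the same shape as the s alias above.
module SectionAlias
  def s(title, &block)
    section(title, &block)
  end
end

# prepend inserts the module ahead of the class in the method lookup chain,
# so every instance gains `s` without the class itself being edited.
MiniBookDSL.prepend(SectionAlias)

book = MiniBookDSL.new
book.s("Restart Nginx") { s("Nested section") }
puts book.sections.inspect
```

Because the extension lives in its own module, it can be packaged and shared across a team's runbooks without patching the library's source.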

Check it out

At Braintree, we use Runbook for automating our app deployment preflight checklists, on-call playbooks, system maintenance operations, SDK deployments, and more. We’ve found it to be instrumental in streamlining production operations, reducing human error, and increasing overall quality of life.

Check out Runbook on GitHub for more information on how you can use Runbook to streamline production operations and increase developer happiness!

Braintree Product and Technology

Essays on design, engineering, and product development at Braintree.

Written by Patrick Blesi

I’m a software engineer who works at Braintree
