Spin Cycle — Automating the Tedious
An orchestration framework to automate anything
Spin Cycle makes it easy to automate complex infrastructure tasks. The database team at Square uses it for most of our day-to-day operations — provisioning new clusters and decommissioning old ones, upgrading MySQL and Docker, stopping and starting database hosts, and more. We’ve been developing and using Spin Cycle for over two years now, and today we’re happy to announce its GA release.
Scripts Don’t Scale
Before Spin Cycle, databases at Square were managed by hand. This worked, but it wasn’t ideal. For example, we had a script to provision a new database cluster. This script had been used for a few years, and for the most part it served its purpose. The original engineer who wrote it designed it well enough — it was a fully fledged program with functions and classes and unit tests.
The script became a problem when other engineers needed to update it. It was difficult to add new tests without being familiar with all of the code, so a lot of untested changes inevitably accrued. When the script broke, it was hard to fix, and even harder to make sure the fix didn’t break any of the untested bits. The script was also slow: it ran its steps serially instead of concurrently, because that was easier to develop and debug. Over time, an important business process became embedded in this tangled ball of code, and provisioning a new database was no longer a series of orderly steps.
Even if our script had been perfect (tests for every new change, each step totally encapsulated), we would have had problems. To provision 10 databases, you had to run the script 10 times in 10 windows and hope an error didn’t get lost in the noise. You couldn’t see the arguments someone else passed in or the output they got back, let alone look at past runs. One person would run the script and it would fail; another would run it and it would succeed. But had they really passed in the same arguments and run the same version from the same environment? Results weren’t reproducible, because it was practically impossible to ensure all starting conditions were the same.
Scripts don’t scale. They’re difficult to maintain and collaborate on. They’re a pain to run in bulk. They can’t be used for service-to-service automation, and they don’t expose functionality to engineers who don’t have access to the hosts they run on. We needed a new way to automate our database management tasks, and Spin Cycle was our solution.
Spec and Spin
In Spin Cycle, those scripts are replaced by requests, each made up of a series of jobs. Each job is very small and does just one thing (think powering off a host, or starting a MySQL instance), but when executed in sequence, the jobs accomplish a large task.
You provide the jobs, written in Go and implementing a small, well-defined interface, so a job can do anything you can code it to do. Requests are spec’d out in YAML using a simple syntax. Once you’ve given Spin Cycle its jobs and requests, it’s ready to go.
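To make the split between jobs and requests concrete, here is a minimal Go sketch. The `Job` interface, its `Name` and `Run` methods, and the `stopMySQL` and `powerOffHost` types are hypothetical simplifications for illustration only; Spin Cycle’s real job interface lives in its repo and is richer than this. The sketch treats a request as an ordered list of single-purpose jobs and runs them in sequence, stopping at the first failure.

```go
package main

import (
	"errors"
	"fmt"
)

// Job is a hypothetical, simplified stand-in for Spin Cycle's job interface:
// a job does exactly one thing and reports whether it succeeded.
type Job interface {
	Name() string
	Run() error
}

// stopMySQL is a hypothetical job that would stop the MySQL instance on one host.
type stopMySQL struct {
	host string
	fail bool // lets the sketch simulate a failure
}

func (j stopMySQL) Name() string { return "stop-mysql " + j.host }

func (j stopMySQL) Run() error {
	if j.fail {
		return errors.New("mysqld did not shut down")
	}
	return nil // a real job would talk to the host here
}

// powerOffHost is a hypothetical job that would power off one host.
type powerOffHost struct{ host string }

func (j powerOffHost) Name() string { return "power-off " + j.host }
func (j powerOffHost) Run() error   { return nil }

// runRequest executes a request's jobs in order, stopping at the first failure.
// It returns the names of the jobs that completed.
func runRequest(jobs []Job) ([]string, error) {
	var done []string
	for _, j := range jobs {
		if err := j.Run(); err != nil {
			return done, fmt.Errorf("job %q failed: %w", j.Name(), err)
		}
		done = append(done, j.Name())
	}
	return done, nil
}

func main() {
	// A tiny "stop host" request: two single-purpose jobs run in sequence.
	req := []Job{stopMySQL{host: "db1"}, powerOffHost{host: "db1"}}
	done, err := runRequest(req)
	fmt.Println(done, err)
}
```

Keeping each job this narrow is what lets the runner report exactly which step failed, and it is also what makes jobs reusable across requests.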
Not Just Another Job Scheduler
Spin Cycle’s got a lot going for it:
No Dark Corners
End-to-end status capability is a first-class component of Spin Cycle. You can see the real-time status of every job in a running request. You can also look back at a request at any time and tell precisely what happened when it was run, because Spin Cycle saves a log entry for each completed job.
When starting out with Spin Cycle, you’re going to be writing all of the jobs in your new requests from scratch. This can be a large up-front commitment, especially if you’ve already got a script lying around that does the same thing. However, when you write new requests in the future, you’ll be able to reuse all of the jobs you’ve already written, with no copy-pasting required. This makes it faster to create more requests later and easy to keep all of them up to date. It’s also why it’s important to make jobs distinct units of work: you can reuse a job that makes MySQL read-only, but it’s more difficult to reuse one that makes MySQL read-only, disconnects all clients, and shuts it down in one fell swoop.
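Job reuse can be illustrated with a short Go sketch in which two different requests are assembled from the same single-purpose jobs. The job names, the `Job` struct, the `requests` table, and the `noop` placeholder work are all hypothetical; the point is only that a narrowly scoped job such as `set-read-only` drops into both requests unchanged.

```go
package main

import (
	"fmt"
	"strings"
)

// Job is a hypothetical single-purpose unit of work.
type Job struct {
	Name string
	Run  func(host string) error
}

// noop stands in for real work in this sketch.
func noop(string) error { return nil }

// Hypothetical reusable jobs: each one does exactly one thing.
var (
	setReadOnly  = Job{"set-read-only", noop}
	killClients  = Job{"kill-clients", noop}
	stopMySQL    = Job{"stop-mysql", noop}
	upgradeMySQL = Job{"upgrade-mysql", noop}
	startMySQL   = Job{"start-mysql", noop}
	powerOff     = Job{"power-off", noop}
)

// Two different requests built from the same small jobs; nothing is copy-pasted.
var requests = map[string][]Job{
	"decommission-host": {setReadOnly, killClients, stopMySQL, powerOff},
	"upgrade-mysql":     {setReadOnly, stopMySQL, upgradeMySQL, startMySQL},
}

// run executes a named request against one host and returns the job names it ran.
func run(request, host string) ([]string, error) {
	jobs, ok := requests[request]
	if !ok {
		return nil, fmt.Errorf("unknown request %q", request)
	}
	var ran []string
	for _, j := range jobs {
		if err := j.Run(host); err != nil {
			return ran, fmt.Errorf("%s on %s: %w", j.Name, host, err)
		}
		ran = append(ran, j.Name)
	}
	return ran, nil
}

func main() {
	for _, r := range []string{"decommission-host", "upgrade-mysql"} {
		ran, err := run(r, "db1")
		fmt.Println(r, strings.Join(ran, " -> "), err)
	}
}
```

If `set-read-only` instead also killed clients and stopped MySQL, the upgrade request could not use it, which is the trade-off the paragraph above describes.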
Requests at Runtime
Spin Cycle parses a request into its actual sequence of jobs when you run it, not when you write it. It’s possible to vary the jobs in a request based on conditions at runtime, so you can write a request ahead of time without knowing all the details of how it will be used. For us, this means things like writing a single request that can upgrade any MySQL cluster, regardless of the number of nodes in that cluster.
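Runtime expansion can be sketched as a function that turns a request spec into a concrete job list only when the request is started. In this minimal sketch, the `RequestSpec` struct stands in for a YAML request spec, and `lookupNodes` is a hypothetical call to an inventory service, stubbed here with a map. The spec lists the per-node steps once; the number of jobs depends on how many nodes the cluster has at run time.

```go
package main

import (
	"fmt"
)

// RequestSpec is a hypothetical, written-ahead-of-time description of a request.
// PerNode steps are repeated for every node found at run time.
type RequestSpec struct {
	Name    string
	Before  []string
	PerNode []string
	After   []string
}

// inventory stubs a hypothetical service that reports cluster membership.
var inventory = map[string][]string{
	"orders": {"db1", "db2", "db3"},
	"users":  {"db7", "db8"},
}

// lookupNodes is a hypothetical stand-in for querying cluster membership.
func lookupNodes(cluster string) ([]string, error) {
	nodes, ok := inventory[cluster]
	if !ok || len(nodes) == 0 {
		return nil, fmt.Errorf("no nodes found for cluster %q", cluster)
	}
	return nodes, nil
}

// expand turns a spec into a concrete, ordered job list for one cluster.
// It is called when the request is started, not when the spec is written.
func expand(spec RequestSpec, cluster string) ([]string, error) {
	nodes, err := lookupNodes(cluster)
	if err != nil {
		return nil, err
	}
	var jobs []string
	for _, s := range spec.Before {
		jobs = append(jobs, s+" "+cluster)
	}
	for _, n := range nodes {
		for _, s := range spec.PerNode {
			jobs = append(jobs, s+" "+n)
		}
	}
	for _, s := range spec.After {
		jobs = append(jobs, s+" "+cluster)
	}
	return jobs, nil
}

func main() {
	upgrade := RequestSpec{
		Name:    "upgrade-mysql",
		Before:  []string{"check-cluster-health"},
		PerNode: []string{"stop-mysql", "upgrade-mysql", "start-mysql"},
		After:   []string{"check-cluster-health"},
	}
	for _, c := range []string{"orders", "users"} {
		jobs, err := expand(upgrade, c)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s: %d jobs\n", c, len(jobs))
	}
}
```

The same `upgrade` spec yields 11 jobs for the three-node cluster and 8 for the two-node one, without the spec author knowing either size in advance.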
Fully API Driven
Spin Cycle is designed to work as one part of a larger automation system, so it plays nice with other services. Everything is done via the API: starting and stopping requests, checking progress, and looking at logs. Even the CLI is really just a wrapper around an API client. That means it’s easy to kick off a new request, or check the status of one that’s running, from other code.
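Checking on a request from another service then comes down to an ordinary HTTP call. The sketch below uses Go’s standard `net/http/httptest` package to stand up a fake server; the `/requests/{id}` route, the `Status` fields, and the state string are assumptions for illustration, not Spin Cycle’s actual API, whose real endpoints are described in its documentation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Status is a hypothetical response body for a request-status call.
type Status struct {
	ID    string `json:"id"`
	State string `json:"state"`
	Done  int    `json:"jobs_done"`
	Total int    `json:"jobs_total"`
}

// fakeServer stands in for a Spin Cycle instance; the route and fields are assumptions.
func fakeServer() *httptest.Server {
	statuses := map[string]Status{
		"req-1": {ID: "req-1", State: "RUNNING", Done: 3, Total: 11},
	}
	mux := http.NewServeMux()
	mux.HandleFunc("/requests/", func(w http.ResponseWriter, r *http.Request) {
		id := strings.TrimPrefix(r.URL.Path, "/requests/")
		st, ok := statuses[id]
		if !ok {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(st)
	})
	return httptest.NewServer(mux)
}

// getStatus is the kind of call another service (or a CLI) would make to check a request.
func getStatus(client *http.Client, baseURL, id string) (Status, error) {
	var st Status
	resp, err := client.Get(baseURL + "/requests/" + id)
	if err != nil {
		return st, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return st, fmt.Errorf("request %s: unexpected HTTP status %d", id, resp.StatusCode)
	}
	err = json.NewDecoder(resp.Body).Decode(&st)
	return st, err
}

func main() {
	srv := fakeServer()
	defer srv.Close()
	st, err := getStatus(srv.Client(), srv.URL, "req-1")
	fmt.Println(st.State, st.Done, st.Total, err)
}
```

A CLI built this way is just another caller of `getStatus`-style functions, which is why anything the CLI can do, other code can do too.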
Highly Available + Scalable
When a Spin Cycle instance shuts down, it automatically moves its in-progress requests to other instances. This means you can upgrade versions and deploy new jobs and requests without downtime, and add or remove Spin Cycle instances as needed to scale your request capacity.
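One way to picture this hand-off is a minimal single-process simulation: on shutdown, an instance serializes each in-progress request’s progress to storage that all instances share, and another instance picks it up and resumes from the next unfinished job. The `Progress`, `sharedStore`, and `instance` types and this suspend-and-resume mechanism are assumptions for illustration; the sketch omits the locking, leases, and durable storage a real multi-instance system would need.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Progress is a hypothetical snapshot of an in-progress request.
type Progress struct {
	RequestID string   `json:"request_id"`
	Jobs      []string `json:"jobs"`
	NextJob   int      `json:"next_job"` // index of the first job not yet completed
}

// sharedStore stands in for durable storage that every instance can reach.
type sharedStore map[string][]byte

// instance is a hypothetical Spin Cycle instance running requests.
type instance struct {
	name    string
	running map[string]*Progress
}

// step simulates completing the next job of a request.
func (in *instance) step(id string) {
	if p := in.running[id]; p != nil && p.NextJob < len(p.Jobs) {
		p.NextJob++
	}
}

// shutdown suspends every in-progress request by saving its progress.
func (in *instance) shutdown(store sharedStore) error {
	for id, p := range in.running {
		b, err := json.Marshal(p)
		if err != nil {
			return err
		}
		store[id] = b
		delete(in.running, id)
	}
	return nil
}

// resumeAll claims suspended requests from the store and continues them here.
func (in *instance) resumeAll(store sharedStore) error {
	for id, b := range store {
		var p Progress
		if err := json.Unmarshal(b, &p); err != nil {
			return err
		}
		in.running[id] = &p
		delete(store, id)
	}
	return nil
}

func main() {
	store := sharedStore{}
	a := &instance{name: "spin-a", running: map[string]*Progress{
		"req-1": {RequestID: "req-1", Jobs: []string{"stop-mysql", "upgrade-mysql", "start-mysql"}},
	}}
	b := &instance{name: "spin-b", running: map[string]*Progress{}}

	a.step("req-1") // stop-mysql completes on spin-a
	if err := a.shutdown(store); err != nil {
		fmt.Println("shutdown:", err)
		return
	}
	if err := b.resumeAll(store); err != nil {
		fmt.Println("resume:", err)
		return
	}
	p := b.running["req-1"]
	fmt.Printf("%s resumes req-1 at %q\n", b.name, p.Jobs[p.NextJob])
}
```

Because progress is tracked per job, the resuming instance skips work that already finished instead of restarting the whole request.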
If you’ve got something you’d like to automate, check out the Spin Cycle documentation for a more detailed overview of how it all works. We’ve provided ready-to-go Docker containers in the GitHub repo, so you can test-drive Spin Cycle in a dev environment with a single command (go here for instructions).