How to ship a Rails upgrade and not break everything

A few months ago, Greenhouse Onboarding shipped a major Rails upgrade to all our customers without a significant hitch. No rollbacks, no major bugs, no downtime, no interruptions to other feature development.

As you’ll see below, the incremental development approach we followed is pretty well-documented. So I want to spend most of this post talking about how we shipped the upgrade and the lessons we learned at each stage.

tl;dr

Our development approach centered on creating a second Gemfile for the Rails upgrade, selected via an environment variable. From there, it was simple to toggle the Rails version when running tests or booting development environments.

The project would have been impossible without a robust test suite, which guided us from start to finish. But as we progressed, we found cruft: dead code, and outdated Rails syntax that lingered even though a backwards-compatible alternative was already available.

Likewise, we found inconsistencies in our deployed environments once we sat down to plan our release to customers. We could have saved a lot of time by continuously upgrading — i.e., by addressing these issues before putting engineers on the project full-time. Read on for a deeper dive into these lessons!

Two Gemfiles

The dual Gemfile approach makes it easy to develop on multiple Rails versions without interrupting the flow of other engineers on your team (who are working on their own features).

If a Rails change is backwards compatible with your old Rails version, just develop it normally. There’s no need for a separate Rails upgrade integration branch. But if the change isn’t backwards compatible, you can easily isolate it with a simple conditional on the Rails version. Your code will end up with lots of these version checks, which you can delete once you ship the upgrade.
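Here is a minimal sketch of such a conditional. The helper names `rails_upgraded?` and `on_rails` and the 5.0 target are placeholders for illustration, not our actual code. In an app, the version string would come from `Rails.version`.

```ruby
# Placeholder target: the first release of the Rails series being upgraded to.
UPGRADE_TARGET = Gem::Version.new("5.0")

# True when the given Rails version is at or above the upgrade target.
# In an app you would pass Rails.version; it is a parameter here so the
# sketch runs without Rails loaded.
def rails_upgraded?(rails_version)
  Gem::Version.new(rails_version) >= UPGRADE_TARGET
end

# Runs exactly one of the two branches, so each call site shows both the
# old and the new code path until the old one is deleted.
def on_rails(rails_version, upgraded:, legacy:)
  rails_upgraded?(rails_version) ? upgraded.call : legacy.call
end

# Example call site with made-up branch bodies.
chosen = on_rails("4.2.11",
                  upgraded: -> { :new_api },
                  legacy:   -> { :old_api })
puts chosen
```

Routing every check through one helper also makes cleanup easy: after the release, searching for the helper name finds every branch that can be removed.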

And you can run any Rails command in upgrade mode by setting the environment variable for that command.
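As a rough sketch of what upgrade mode can look like, assume the second Gemfile is selected through Bundler's `BUNDLE_GEMFILE` variable. The file name `Gemfile.next` and both helper names are placeholders for illustration, not our exact setup.

```ruby
require "open3"

# Placeholder name for the second Gemfile.
NEXT_GEMFILE = "Gemfile.next"

# Environment overrides that point Bundler at the upgrade Gemfile.
# BUNDLE_GEMFILE is the variable Bundler reads to locate the Gemfile;
# the matching lockfile is then Gemfile.next.lock.
def upgrade_env(root = Dir.pwd)
  { "BUNDLE_GEMFILE" => File.join(root, NEXT_GEMFILE) }
end

# Runs a command with the upgrade environment layered on top of the
# current one. Returns the command's stdout and whether it succeeded.
def run_in_upgrade_mode(*command, root: Dir.pwd)
  output, status = Open3.capture2(upgrade_env(root), *command)
  [output, status.success?]
end
```

A call such as `run_in_upgrade_mode("bundle", "exec", "rake", "-T")` would then run against the upgraded dependency set. From a shell, the equivalent is to prefix the command with the variable assignment.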

We got the idea from this awesome blog post by Luke Francl. And it was cool to learn (after our release) that GitHub followed the same approach! The linked posts go into more detail and are worth a read.

The project

This upgrade was by far the biggest project I have led at Greenhouse. It required a few other full stack engineers, along with colleagues from our infrastructure, QA and operations teams.

The whole thing took about three months, divided into three stages of about a month each.

Stage 1: Run Rails upgrade locally without introducing regressions

The goal of this stage is to set up tooling and development patterns for the upgrade, without introducing regressions. You want to get the Rails upgrade in the simplest state in which development work can be parallelized.

A developer starts the project by upgrading Rails in a branch, resolving the upgraded Gemfile and booting the app. Keep in mind that these steps will likely take a few weeks, depending on the existing state of your Gemfile. We needed to upgrade several gems in order to resolve our new Gemfile.

You advance from this stage when you can easily switch between Gemfiles while maintaining a passing test suite (for the old Rails version — remember no regressions!). Which brings me to some lessons:

Lesson: A good test suite is critical! Tests guided us throughout this project. They gave us confidence we had gotten the upgrade off the ground without causing regressions. They identified a slew of things to fix as we plowed ahead. And they provided a clear metric to communicate progress to project stakeholders.

Lesson: You don’t need to upgrade code if you delete it! A Rails upgrade sheds light on dark corners of your code base. So it’s a great opportunity to delete cruft. We wasted time trying to figure out whether the underlying app code for broken tests was even still in use. It felt great when we could prove that the code was dead, allowing us to delete it, tests and all.

Stage 2: Fix all the broken things

This stage was when we did the heavy lifting to deploy a working, upgraded app to a development environment. That means tests passing, assets compiled, code deployed and environment variables set.

We spent most of our time here, and frankly a lot of it was tedious — grepping the code base for deprecated Rails syntax. Which brings me to some more lessons:

Lesson: Write compliant code today! We could have done most of the work in this stage well in advance of our formal upgrade project. For example, dynamic finders are hard deprecated in Rails 4, but can be easily avoided in Rails 3. And strong params can be introduced in Rails 3 via a gem and are built into Rails 4, well before they become required in Rails 5.
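The grepping can be partly automated. Below is a minimal sketch that flags the dynamic finders deprecated in Rails 4.0 (every `find_..._by_attribute` style method except `find_by_...` and `find_by_...!`). The constant and method names are made up for this example, and a regex scan like this will miss calls built with metaprogramming.

```ruby
# Prefixes of the dynamic finders deprecated in Rails 4.0. Plain find_by_*
# is deliberately absent, and the trailing "_attribute" part is required so
# that the newer hash-style find_or_create_by(name: ...) is not flagged.
DEPRECATED_FINDER = /\b(?:find_all_by|find_last_by|scoped_by|find_or_initialize_by|find_or_create_by)_\w+/

# Returns [line_number, method_name] pairs for every deprecated finder
# found in the given source text.
def deprecated_finders(source)
  source.each_line.with_index(1).flat_map do |line, number|
    line.scan(DEPRECATED_FINDER).map { |call| [number, call] }
  end
end

sample = <<~RUBY
  User.find_all_by_email(email)
  User.where(email: email)
  User.find_or_create_by(name: name)
  User.find_or_create_by_name(name)
RUBY

deprecated_finders(sample).each do |number, call|
  puts "line #{number}: #{call}"
end
```

Most hits have a replacement that already works on the old version. For instance, `User.where(email: email)` returns roughly what `find_all_by_email` did (a relation rather than an array) and runs on both Rails 3 and Rails 4.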

Lesson: Always be upgrading! A corollary to writing compliant code today. But my bigger point is that frequent iteration toward a major Rails upgrade makes the process less scary. As is, Rails upgrades are easy to procrastinate on. After all, they are labor- and time-intensive and do not bring new customer-facing features. Shipping upgrades would be easier to justify if we did the backwards-compatible parts gradually.

Stage 3: Ship it!

For me, this was the most interesting (and stressful) part of the project. It’s one thing to run a shiny new Rails version in development. It’s quite another to plan step-by-step how to ship and monitor the upgrade (and how to roll back, if needed).

I spent a lot of this phase writing a detailed run book, with some big help from our infrastructure team. We had shipped our last Rails upgrade using DNS-based activation — we’d start with, say, 10 percent of traffic on the new version. As we routed additional traffic, we’d monitor and decide whether to keep going or to roll back and fix issues.

In the meantime, we had developed a much nicer system of canary releases, using cookies that could be toggled per organization. We worked out a plan to set up a new production environment to which we would temporarily send customer traffic. Then we’d upgrade our main production environment, and gradually ship the upgrade — to a demo company, then to Greenhouse internally, then to deliberate batches of customers. The beauty of this approach is that turning the feature on and off was a snap.
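To make the per-organization toggle concrete, here is a small sketch of the idea. Everything in it is hypothetical: the class, the cookie name and the environment labels are invented for illustration and do not describe our real canary system.

```ruby
require "set"

# Hypothetical per-organization canary toggle.
class CanaryRollout
  COOKIE_NAME = "rails_upgrade_canary"

  def initialize
    @enabled_orgs = Set.new
  end

  # Turn the upgraded environment on for a batch of organizations.
  def enable(*org_ids)
    @enabled_orgs.merge(org_ids)
  end

  # Send a batch back to the old environment.
  def disable(*org_ids)
    @enabled_orgs.subtract(org_ids)
  end

  # Cookie a login step could set so that a router can act on it.
  def cookie_for(org_id)
    { COOKIE_NAME => @enabled_orgs.include?(org_id) ? "on" : "off" }
  end

  # Which environment a request carrying these cookies should reach.
  def self.environment_for(cookies)
    cookies[COOKIE_NAME] == "on" ? :upgraded : :current
  end
end

rollout = CanaryRollout.new
rollout.enable(:demo_company)            # first batch
rollout.enable(:internal, :customer_a)   # later batches
rollout.disable(:customer_a)             # rolling one organization back
```

Because the decision lives in a set of organization ids, widening the rollout or rolling a batch back is a single call rather than a deploy.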

The run book and QA-led bug bashes led to weeks of development work. A recurring issue was that we would discover subtle differences among our development, staging and production environments.

So, you guessed it, lessons:

Lesson: Make all your environments prod-like. Just like our app code cruft, we discovered infrastructure cruft as we moved to deploy the release. Maybe we were tweaking the parameters for compiling Rails assets. Or using different architectures for our Redis clusters. The result was inconsistent behavior across environments that cost us time to diagnose and to fix.
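One cheap guard against this kind of drift is to compare each environment's settings side by side. The sketch below assumes the settings can be collected into plain hashes. The method name, keys and values are invented for the example.

```ruby
# Returns only the settings whose values differ between environments,
# mapped to the value each environment has. A setting missing from one
# environment shows up as nil there, so it is reported too.
def config_drift(settings_by_env)
  keys = settings_by_env.values.flat_map(&:keys).uniq
  keys.each_with_object({}) do |key, drift|
    values = settings_by_env.transform_values { |settings| settings[key] }
    drift[key] = values if values.values.uniq.size > 1
  end
end

# Made-up settings for two environments.
settings = {
  "staging"    => { redis_topology: "single", precompile_assets: true },
  "production" => { redis_topology: "cluster", precompile_assets: true }
}

config_drift(settings).each do |key, values|
  puts "#{key} differs: #{values.inspect}"
end
```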

Lesson: Follow the run book. Like just follow it. We knew we were ready to ship the upgrade when we finished a few dry runs on staging. Once you trust the run book, just follow it verbatim. You won’t need to keep as much detail in your head. It’s there to make your release smoother and easier to grok.

The release

As mentioned, the big day went smoothly. Once we had signed off on a date, I set up a Slack channel and calendar invite to ensure all the extended stakeholders were in the loop — product managers, account managers and chat support.

We gathered the main project engineers in a war room, and started to rattle off the checklist.

The bad news: the release took two days. We discovered a difference in our Solr settings between staging and production that we had to fix. The good news: we caught this issue early in the release, in a demo environment. No customers were rolled back.

A week after our successful release, we had scrubbed all traces of upgrade code from our repo and were back to a single Gemfile.

Next upgrade

I hope the lessons from our release help you upgrade Rails smoothly. I’d certainly follow the same approach again: two Gemfiles and no long-running upgrade integration branch. But I’d try to do as much backwards-compatible work as possible before starting the hard charge to release, and that includes sweeping your codebase for dead code! This would have easily cut a month from our upgrade. For now, I’m working on a planning doc for our next upgrade, so we can get this early work out of the way.

ps — Greenhouse engineering is hiring!