Moving to Continuous Delivery

Ian Fuller
Published in Trail Blog
6 min read · Mar 26, 2016

Software engineers are, by their very nature, cautious. It goes with the territory. Even with the most evolved architectures, engineers’ choices can have a lasting impact. So moving to something like continuous delivery felt a little daunting — isn’t it risky to deploy software more often?

What Is Continuous Delivery

At Trail we believe continuous delivery means building, testing, and releasing software more frequently, with little effort or risk.

It’s not that we were releasing infrequently (on average once or twice in a two-week sprint), but we were definitely excusing inefficiencies in our process. After being inspired by David Farley, the man who literally wrote the book, at London Tech 2020, we decided to give continuous delivery a second look.

Where we started

The first thing we did was review our current release process. It was important for us to understand any pain points to be sure we’d removed them. Some of our biggest challenges were:

Late validation

The quicker an idea can move from planning into a customer’s hands, the sooner that customer’s feedback can be used in future planning work. Whenever we had features waiting to be deployed, we were missing out on valuable insight.

Loss of context

Context switching is expensive. Even more so when the task at hand is complex. For each of the four or five features an engineer might release in a two-week sprint, only the last one would be fresh in their mind. This led to a shallow understanding of the technical changes being made and poor recall on how to validate the release.

Post-deployment issues

Accepting the fact that software = bugs, we understandably have occasions where a release causes problems. When a release has multiple features in it, fixing the issue could mean debugging your entire stack.

Manual process doesn’t scale

Our manual release process was getting bigger. We were introducing steps to cover a number of edge cases and working hard to make sure everybody followed them. As the team and product grew, this would only get harder.

Isolating user stories is hard

We work incredibly hard to reduce our stories to the smallest piece of deliverable value. This doesn’t always provide the best UX though. Sometimes stories have to be released in unison, which means parking features and exacerbating the other issues.

How we changed our process

The biggest benefit so far is that adopting CD at Trail has forced us to optimise our release process and remove its inefficiencies. You can’t ignore difficult release steps if you’re doing them ten times a day.

“#ContinuousDelivery helped @trailapp optimise their release. You can’t ignore problems if you release this often!” https://blog.trailapp.com/moving-to-continuous-delivery-a430287289f#.m6r541rn2

A single branch (master)

To support CD, we moved to a single branch. Any changes we make are raised as a PR against master. We don’t have develop or release branches.

Solid unit, integration and feature test coverage

This wasn’t a big change for us as we were already in a strong place. It did highlight the importance of our test coverage though, and help underline the value a solid testing suite provides.

We ditched staging for preprod and review apps

With small module changes going out iteratively, there’s no longer a need for a staging environment. Our preprod environment is almost identical to our production environment, which helps catch environment-specific issues.

We’ve combined this preprod environment with Heroku review apps to replace the exploratory step that staging provided. Whenever a new PR is raised, Heroku automatically spins up a review app at my-trail-pr.herokuapp.com, allowing us to validate our changes in a production-like environment.

Highly automated workflow

The process of moving from PR through to production is now highly automated:

  • Review apps are automatically spun up and seeded with test data whenever a PR is raised.
  • Preprod is automatically deployed whenever a new commit is pushed to master.
  • Preprod is automatically seeded with data from the production database after each deployment.
  • The deployment from preprod to production is done with a single click using Heroku pipelines.

Slack integration

Although the process of moving from PR through to production is highly automated, humans are still involved (if not for long). This means that preprod is locked during a deployment, which we communicate through the magic of Slack:

  1. CircleCI notifies the #developer channel whenever a new commit has been pushed to master, advising others to hold off.
  2. When preprod is ready, another message goes out, encouraging developers to review their changes, and open up the pipeline.
  3. As soon as a release has gone into production Heroku sends the final notification to Slack, and the process starts again.
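The three notifications above can be modelled as a tiny state machine. This is only an illustrative sketch: the stage names, message wording, and helper functions are hypothetical, and in practice the messages come from the CircleCI and Heroku integrations rather than from our own code. The payload uses the simple `text` field that Slack incoming webhooks accept.

```python
from enum import Enum


class Stage(Enum):
    COMMIT_PUSHED = 1   # a new commit has landed on master
    PREPROD_READY = 2   # preprod has been deployed and seeded
    RELEASED = 3        # the release has been promoted to production


# Hypothetical wording; the real messages come from the CI and Heroku integrations.
MESSAGES = {
    Stage.COMMIT_PUSHED: "{commit} pushed to master. Preprod is deploying, please hold off.",
    Stage.PREPROD_READY: "{commit} is on preprod. Review your changes, then open up the pipeline.",
    Stage.RELEASED: "{commit} is in production. Preprod is free again.",
}


def slack_payload(stage, commit):
    """Build a JSON-serialisable body with a single `text` field."""
    return {"text": MESSAGES[stage].format(commit=commit)}


def preprod_locked(stage):
    """Preprod stays locked from the push until the release reaches production."""
    return stage is not Stage.RELEASED
```

The useful property is that the lock is derived from the stage rather than tracked separately, so the channel history always tells you whether it is safe to merge.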

Additive deployments

One of the biggest challenges with CD is managing changes that have large inter-dependencies, like database migrations. One solution is to make additive changes. Instead of modifying a database column, you add a new column, make your changes against that, and clean up in a later deployment. There’s a certain amount of fallout from this, but in our experience it massively de-risks deployments and is worth the additional development work.
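As a rough illustration of an additive change, here is a minimal sketch using Python's built-in sqlite3 module. The `users` table, the `name` and `display_name` columns, and the `rename_user` helper are all made up for the example and are not Trail's actual schema; the point is only the ordering of the steps across deployments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('Ada Lovelace')")

# Deployment 1 (additive): add the new column alongside the old one.
# Existing code keeps reading and writing `name` and is unaffected.
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")

# Deployment 2: backfill the new column and start writing to both.
conn.execute("UPDATE users SET display_name = name WHERE display_name IS NULL")


def rename_user(conn, user_id, new_name):
    # Dual-write keeps old and new code paths working during the transition.
    conn.execute(
        "UPDATE users SET name = ?, display_name = ? WHERE id = ?",
        (new_name, new_name, user_id),
    )


rename_user(conn, 1, "Ada King")

# Deployment 3 (clean-up, later): once nothing reads `name`, remove it.
# Shown as a comment because it is the only destructive step.
# conn.execute("ALTER TABLE users DROP COLUMN name")

row = conn.execute(
    "SELECT name, display_name FROM users WHERE id = 1"
).fetchone()
```

Each deployment on its own leaves the application working, which is what makes the change safe to ship in small pieces.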

How we changed our thinking

Deploy components, not features

This is one of the most difficult changes, as it requires more discipline. However, once we got over the initial hump, we found it easier to make and deploy smaller changes. One of the biggest benefits has been the small PR size. Smaller PRs are much easier to grok and much quicker to iterate on.

No more CHANGELOG.md

We used to use a changelog file to track our feature changes and fixes. We’ve now started using commit messages to document our releases. This has been a little tricky and we’re still feeling our way through. Currently, we’re marking non-functional commits with ‘[minor]’ and trying to make our other commits as readable as possible. This has already left us with a clearer commit history and an easy-to-digest log.
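To show how a tagged commit history can stand in for a changelog, here is a small sketch that filters out ‘[minor]’ commits to leave a release summary. The `release_notes` function and the sample commit subjects are hypothetical, and the sketch assumes the tag can appear anywhere in the subject line.

```python
MINOR_TAG = "[minor]"


def release_notes(commit_subjects):
    """Return the commit subjects worth showing in a release summary."""
    return [
        subject.strip()
        for subject in commit_subjects
        if MINOR_TAG not in subject
    ]


# Made-up commit subjects, newest last.
commits = [
    "Add bulk task assignment",
    "[minor] Bump linter version",
    "Fix timezone offset on due dates",
]

notes = release_notes(commits)
```

In practice the input could come from something like `git log --format=%s` between two release points.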

Using release toggles

Releasing individual components and maintaining a coherent UX can be difficult. Equally, you may want to hold entire features back from users until they can be released in unison, or until they’ve been validated against a specific cohort. To achieve this we’ve used release toggles. Put simply, this is the process of if’ing out a feature’s code path. We’re taking a fairly crude approach to this at present, but it’s so far been successful for the team size and burn rate.
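A crude release toggle might look something like the sketch below. The `ReleaseToggle` class, the `new_dashboard` toggle, and the user ids are invented for illustration and are not our actual implementation; a toggle can be off for everyone, on for a cohort, or on for all users.

```python
from dataclasses import dataclass, field


@dataclass
class ReleaseToggle:
    name: str
    enabled_for_all: bool = False
    cohort: set = field(default_factory=set)  # user ids given early access

    def is_on(self, user_id):
        return self.enabled_for_all or user_id in self.cohort


# Hypothetical feature, switched on for a single test user.
new_dashboard = ReleaseToggle("new_dashboard", cohort={"user-42"})


def render_home(user_id):
    # The `if` around the new code path is the whole mechanism.
    if new_dashboard.is_on(user_id):
        return "new dashboard"
    return "old dashboard"
```

Releasing the feature to everyone is then a matter of flipping `enabled_for_all`, and a later deployment can delete the toggle and the old code path.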

Health metrics, not manual QA

No matter how much testing you have in place, you’re never going to capture the infamous unknown unknowns. One way to defend against this is to build monitoring and alerting around user metrics. If your sign-in rate drops below a certain threshold, for example, you might have an issue on the sign-in page. These metrics won’t always pinpoint the exact issue, and they can raise false alarms (on national holidays, for example), but they help capture things you might otherwise have missed.
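A threshold check of this kind can be sketched in a few lines. The function names, the 50% threshold, and the minimum-traffic cut-off below are assumptions chosen for the example, not values we actually use.

```python
def sign_in_rate(sign_ins, visits):
    """Fraction of sign-in page visits that ended in a successful sign-in."""
    return sign_ins / visits if visits else 0.0


def should_alarm(sign_ins, visits, threshold=0.5, min_visits=100):
    # Skip low-traffic windows: with too few visits the rate is mostly noise.
    if visits < min_visits:
        return False
    return sign_in_rate(sign_ins, visits) < threshold
```

For example, `should_alarm(30, 200)` raises an alarm because only 15% of visits signed in, while `should_alarm(1, 10)` stays quiet because ten visits are too few to judge.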

Some of our user journeys don’t have the traffic to make this approach viable, so we still use manual QA. As the product grows, however, health metrics, along with canary testing, will give us additional confidence in our release process.

Our experience so far

So, is it risky to deploy software more often? Only if each release introduces risk beyond the changes you’ve made. We’re working hard to separate the cost of deploying software from the cost of writing it.

“With #ContinuousDelivery @trailapp has been deployed as many times in the last few days as in the previous year!” https://blog.trailapp.com/moving-to-continuous-delivery-a430287289f#.m6r541rn2

We’ve only been trialling this process for a couple of weeks now, but we’re already feeling the benefits. In the last few days, we’ve deployed our product 20 times. That’s equivalent to 10 months of releases under our previous process!
