Treat your Build Pipeline as a Product

How we applied product design principles to iterate our build pipeline to save up to 13 hours of engineer time each day.

Jack "chendo" Chen
Jun 19, 2017 · 8 min read

This is a blog version of the talk I originally presented at Melbourne Ruby in January 2017. A recording of the 20-minute talk is on YouTube.

Engineers don’t like doing repetitive tasks that involve copious amounts of manual labour, especially performing manual testing. Most teams will eventually come up with some kind of automated testing, and most will have this set up to run automatically when code is pushed.

This approach is generally known as a build pipeline, or an aspect of Continuous Integration (CI).

However, the usability of a build pipeline differs significantly between teams — even within the same company. It’s often an afterthought, and everyone is terrified of making any changes to it in case it breaks. Or nobody really understands how it works because the person who set it up left the company years ago.

Builds get flakier. The build time is starting to hit the half hour mark. There’s not enough build server capacity so your build doesn’t start for another three hours.

Sound familiar?

I know from personal experience that a build pipeline that’s slow and unreliable is a major drag on productivity and morale, which is why we’re trying something different at myDr: we treat our build pipeline as a product.

Why is a build pipeline a product?

I believe that

In this case, a build pipeline provides value to the team by automating tests and decreasing the likelihood of bugs hitting production, and thus the primary users of a build pipeline are the engineers.

Why treat the build pipeline as a product?

Product design principles and processes have worked well for creating quality products. Why not apply them to tooling?

First, let’s take a step back and think about the primary user flow for a build pipeline.

Build Pipeline User Flow

The user flow tends to look something like this:

  • Push a change
  • Wait for the build to complete
  • If it’s broken, figure out what broke and why
  • Fix the problem
  • Push the fix
  • Rinse and repeat until the build passes

With this in mind, let’s look at our first cut of the build pipeline for myDr Go. But first, a quick overview of our stack for context.

myDr Go Stack Overview

  • Rails backend, deployed inside containers to a DC/OS cluster
  • React frontend, deployed to mobile platforms via Cordova. ES6, Flow, GraphQL, styled-components
  • Buildkite for CI

Iteration #1

I decided to focus on integration testing first as it provides the most confidence that all the pieces work together.

We initially selected Nightwatch, a JS integration testing tool, as it was popular and in active development. A JS tool also made sense because the frontend is all JS.

We use docker-compose to boot the backend, frontend, test container, and selenium.

Nightwatch output

There were many problems with this first iteration:

  • It took a significant amount of effort for an engineer to understand what broke and why, as one needed to read the test source to understand what happened before it failed.
  • The failure screenshots had to be downloaded separately to be viewed.
  • The tests had to be run locally to see the JS errors.
  • We were unable to test our core flow of a video consultation.
  • Writing the test itself was painful. Needing to chain expectations and specify manual waits did not help with the usually-painful task of writing integration tests. Worst of all, a function named setValue would actually append a value to a field rather than simply set it, and the issue from 2014 was closed without a fix.

From this iteration, we came up with some goals.

Goal: Decrease effort to understand failure.

Where “effort” is a combination of the time it takes, and the cognitive load required (i.e. how much thinking an engineer needs to do) to understand the failure.

Goal: Make writing tests not suck.

Writing integration tests tends to involve a painful cycle of writing, running, waiting, making a small change, and running again.

Iteration #2

With the goals in mind, we went back to the drawing board and did a bunch of thinking to help us reach our goals. A week later, the pipeline looked like this:

Iteration #2 build output

The biggest (and probably most controversial) change was moving our integration tests from Nightwatch to Capybara. It was a hard decision, but ultimately Capybara’s maturity and the ease of writing tests won out, and our frontend engineers are comfortable writing tests as they’re straightforward and less verbose.

We added JavaScript console output to the build log, which was by far the most useful feature in this iteration, as the error in the log was generally enough information for the engineer to fix the problem without running the test locally.

We added inline logging of each test step via Ruby’s TracePoint API, which allows engineers to quickly understand what happened before the failure.
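A minimal sketch of the idea, not the pipeline's actual code: TracePoint can record each Ruby-level method call made on a class while a block runs. FakePage and trace_steps are hypothetical names used for illustration; a real setup would trace the Capybara session instead.

```ruby
# Hypothetical stand-in for a browser session; only the method names matter.
class FakePage
  def visit(path); end
  def fill_in(field, with:); end
  def click_button(label); end
end

# Records every Ruby-level method call defined on `klass` while the block runs.
def trace_steps(klass)
  steps = []
  trace = TracePoint.new(:call) do |tp|
    next unless tp.defined_class == klass

    # Read each named argument's value from the called method's binding.
    args = tp.parameters.map do |kind, name|
      value = tp.binding.local_variable_get(name).inspect
      kind.to_s.start_with?("key") ? "#{name}: #{value}" : value
    end
    steps << "#{tp.method_id}(#{args.join(', ')})"
  end
  trace.enable { yield }
  steps
end

page = FakePage.new
steps = trace_steps(FakePage) do
  page.visit("/login")
  page.fill_in("Email", with: "a@example.com")
  page.click_button("Sign in")
end
puts steps
```

Printing the recorded steps as the test runs means the build log shows what happened before a failure. This sketch assumes every traced method has named parameters; anonymous splats would need extra handling.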

We added inline failure screenshots, which allow engineers to see what the failure looks like visually without needing to run the test locally.

We were finally able to automate a full consultation test (including video call over WebRTC) by using multiple sessions in Chrome via Selenium.

Old screenshot of an automated multi-session test that ensures WebRTC video calls work.

We also added the ability to write tests interactively via a console which accepts RSpec/Capybara commands and logs successful commands to a buffer which can be dumped and copied straight into a spec. This allowed me to write a 100+ line comprehensive spec that tests our “happy” user flow in a couple of hours and let me keep most of my sanity.
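The core of such a console can be sketched in a few lines. This is an assumption about the general shape, not the real tool: RecordingConsole and FakeSession are hypothetical names, and a real version would evaluate commands against a Capybara session inside a REPL.

```ruby
# Evaluates commands one at a time and keeps only those that ran without raising.
class RecordingConsole
  attr_reader :buffer

  def initialize(context)
    @context = context # object whose methods the commands may call
    @buffer = []
  end

  def run(command)
    result = @context.instance_eval(command)
    @buffer << command # only reached when the command succeeded
    result
  rescue StandardError, ScriptError => e
    warn "not recorded: #{e.class}: #{e.message}"
    nil
  end

  # Returns the successful commands, ready to paste into a spec.
  def dump
    @buffer.join("\n")
  end
end

# Hypothetical stand-in for a browser session.
class FakeSession
  def visit(path)
    path
  end

  def click_button(label)
    raise ArgumentError, "no button #{label.inspect}" unless label == "Sign in"

    label
  end
end

console = RecordingConsole.new(FakeSession.new)
console.run('visit "/login"')
console.run('click_button "Sing in"') # typo: raises, so it is not recorded
console.run('click_button "Sign in"')
puts console.dump
```

Failed attempts are reported on stderr but never reach the buffer, so the dump contains only commands that worked.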

However, there were still issues.

The actual error and backtrace were at the end of the output (default RSpec behaviour), and we would need to scroll through a lot of output to find the actual issue, making it time-consuming to understand the error.

The build notifications were also an issue. Email is noisy enough as it is and generally requires people to check their emails rather than receive a notification. The default Buildkite Slack integration is noisy as it has everyone’s builds, not just your own.

With this, we added another goal:

Goal: Decrease the effort required to know the build failed.

Iteration #3

We used Buildkite’s collapsible build output feature to hide successful tests, and gained test durations for free.

Only see output for tests that failed.
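Buildkite groups log output under any line that starts with `--- ` (collapsed by default), and a later `^^^ +++` line expands the group that precedes it. A minimal sketch of a formatter using those markers follows; log_group and the sample output lines are hypothetical, and a real setup would hook this into an RSpec formatter.

```ruby
# Builds the log lines for one test as a Buildkite group.
# Passing tests stay collapsed; failing tests are expanded.
def log_group(name, output_lines, passed:)
  lines = ["--- #{name}"] + output_lines
  lines << "^^^ +++" unless passed # expand the group we just wrote
  lines
end

puts log_group("signs in", ["visit /login", "click Sign in"], passed: true)
puts log_group("books a consult", ["visit /book", "Timeout waiting for .slot"], passed: false)
```

With this shape, a long build log opens with only the failing tests expanded.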

We built a bot that sends the relevant person a direct message in Slack when their build fails, when they’re mentioned on GitHub, or when someone comments on their PR.

Targeted issue and build notifications on Slack.
Pull request comment with code context.
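The routing decision behind a bot like this can be sketched as a pure function. Everything here is hypothetical: the event shape, the login-to-Slack-ID table, and the choice to skip notifying people about their own actions are assumptions for illustration, not a description of the real bot.

```ruby
# Hypothetical event shape:
#   { type:, owner:, actor:, mentioned:, summary:, url: }
def recipients_for(event)
  case event[:type]
  when :build_failed then [event[:owner]]
  when :pr_comment   then [event[:owner]] - [event[:actor]]
  when :mention      then event[:mentioned] - [event[:actor]]
  else []
  end
end

# Turns an event into direct-message payloads, skipping unknown logins.
def direct_messages(event, slack_ids)
  recipients_for(event).uniq.filter_map do |login|
    slack_id = slack_ids[login]
    next unless slack_id

    { channel: slack_id, text: "#{event[:summary]} #{event[:url]}" }
  end
end

slack_ids = { "alice" => "U111", "bob" => "U222" } # hypothetical mapping
failed = { type: :build_failed, owner: "alice", actor: "alice", mentioned: [],
           summary: "Build failed on fix-login", url: "https://example.com/builds/1" }
own_comment = { type: :pr_comment, owner: "bob", actor: "bob", mentioned: [],
                summary: "New comment on your PR", url: "https://example.com/pr/2" }

p direct_messages(failed, slack_ids)
p direct_messages(own_comment, slack_ids)
```

Keeping the routing separate from the code that talks to Slack makes the "who gets pinged" rules easy to test on their own.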

Rather than trying to figure out which line caused what output, show it after each step.

Per-step JavaScript console output.

Again, there are still things to improve.

We still need to do manual testing to verify that nothing was broken visually.

Build time was starting to hit the 15 minute mark as we added more test coverage. We run multiple build agents per node, and the frontend tests are far more CPU-intensive than the backend tests. Concurrent frontend builds on the same host would grind it to a halt, and tests began failing due to timeouts. The team could no longer trust the build result.

Thus, more goals.

Goal: Faster feedback cycle

This one shouldn’t require any explanation.

Goal: Must be able to trust results

The lack of confidence that errors were actual problems meant that engineers would assume that any build error they encountered was the fault of a flaky build, not an actual bug they introduced.

Goal: Prevent undesired visual changes

The less manual testing we have to do, the better.

Iteration #4

After quite a bit of research, we came up with the following improvements.

There are many players in the visual regression space, but we needed one that worked well with our RSpec/Capybara stack. Percy fit the bill as it has Capybara support, making it trivial to add to our pipeline.

Percy’s delta view where it highlights visual changes.
Clicking on the right screenshot toggles between diff and actual view.
What a spec looks like with snapshots and multi-session helpers.

We use Knapsack Pro to parallelise our test suite across multiple Buildkite agents, and restructured our 4-core build agents so that each host runs 3 normal agents, and 2 tagged with cpu=high. All frontend integration tests are scheduled to run on cpu=high nodes, so we no longer have builds crawling to a halt due to over-provisioning. We also decreased build times from ~15min back down to ~5min.
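To illustrate why splitting by recorded duration helps, here is a sketch of a greedy "longest first" split. It is not Knapsack Pro's algorithm, and the file names and timings are made up for the example.

```ruby
# Assigns each spec file to the currently least-loaded node, longest files first.
def split_by_duration(durations, node_count)
  nodes = Array.new(node_count) { { files: [], total: 0.0 } }
  durations.sort_by { |file, secs| [-secs, file] }.each do |file, secs|
    node = nodes.min_by { |n| n[:total] }
    node[:files] << file
    node[:total] += secs
  end
  nodes
end

# Hypothetical timings in seconds from a previous run.
durations = {
  "consult_spec.rb" => 240,
  "login_spec.rb" => 60,
  "booking_spec.rb" => 120,
  "profile_spec.rb" => 90
}

split_by_duration(durations, 2).each_with_index do |node, i|
  puts "node #{i}: #{node[:files].join(', ')} (#{node[:total]}s)"
end
```

Run serially, these files would take 510 seconds; split across two nodes, the slower node finishes in 270 seconds.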


It’s difficult to measure how much cognitive load these improvements have prevented, and we don’t have enough data points as to how much time we saved, but we can do some rough estimates:

Per failure:

  • Inline steps before failure without looking at test source:
  • View screenshots inline rather than downloading:
  • JavaScript console/errors in output rather than running test locally, connecting to VNC and inspecting console:

Total: (rounded up to 2min)

Per build:

  • Test parallelisation:
  • Collapsed build output:
  • Build notifications rather than ‘polling’:
  • Automated consultation call testing:



  • Interactive test writing:
  • PR/issue notifications rather than ‘polling’:
  • Visual regression testing:

In the last 30 days, we ran 633 builds for an average of Let’s round it down to 31.

If our estimations are correct, then the improved pipeline saves us Even if we assume that all the builds are green, and we only need to do the automated call testing on a release, we’re still saving at least

Let’s assume 20% of the builds fail due to an average of 3 failures. That’s


Applying product design principles to our build pipeline enabled us to think critically about the goals we want from it. With this knowledge, we spent a small portion of our development capacity to improve the build pipeline.

Give your tooling some love. It’s worth it!

myDr Engineering
