Kochiku: CI for long test suites

A new Continuous Integration tool from Square

Written by Rob Olson.

Back in the summer of 2011, every request to Square’s servers went through a single monolithic Rails application we lovingly call SquareWeb. At the time, SquareWeb had hundreds of models and controllers, all with tests to cover their behavior. In addition to the many unit tests written with RSpec, we had hundreds of acceptance tests written with Cucumber — some of which used Selenium to open a web browser and step through long flows.

Due to its size, the SquareWeb test suite took between two and three hours to run on CI. If a single flaky test failed, we retried the entire build, waiting hours more. Our engineers stopped running through the entire test suite before submitting code. Not good.

We know Rails tests don’t have to be slow (at least not the unit tests); Corey Haines has done a great job of publicizing methods for removing the Rails framework from the flow of a test so that only the business logic runs. Unfortunately, we were learning about these techniques after two years’ worth of code had already been written. We had a problem that we needed to address right away, or we were going to have to abandon our test suite.

Introducing Kochiku

In July of 2011, we started working on a test system that would automatically partition our build into distributable chunks. We named it Kochiku, after a Japanese word for “construction.” It works like this:

  1. Kochiku downloads source code from a Git repository and checks out the revision to test.
  2. It extracts a set of test files with a Ruby-style glob pattern, such as spec/**/*_spec.rb.
  3. Kochiku uses one of several strategies, such as round robin, to partition the tests into a configurable number of build parts.
  4. Kochiku puts the build parts into a job queue, where they’re picked up by any number of build workers.
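
To make steps 2 through 4 concrete, here is a minimal sketch of globbing for test files, dealing them out round robin, and placing each build part on a queue. It is an illustration only, not Kochiku's actual code: the method names round_robin_partition and enqueue_build_parts are hypothetical, and the queue is any object that responds to <<, standing in for a real job queue.

```ruby
# Hypothetical helper (not Kochiku's API): deal the files into
# `part_count` buckets in turn, like dealing cards.
def round_robin_partition(files, part_count)
  raise ArgumentError, "part_count must be positive" unless part_count.positive?

  parts = Array.new(part_count) { [] }
  # Sort first so the same checkout always yields the same parts.
  files.sort.each_with_index do |file, index|
    parts[index % part_count] << file
  end
  # Drop empty buckets when there are fewer files than parts.
  parts.reject(&:empty?)
end

# Hypothetical helper: glob for test files under `root`, partition them,
# and push each build part onto `queue` (anything that responds to <<).
def enqueue_build_parts(root, pattern, part_count, queue)
  files = Dir.glob(pattern, base: root)
  round_robin_partition(files, part_count).each { |part| queue << part }
  queue
end
```

With a checkout in the current directory, enqueue_build_parts(".", "spec/**/*_spec.rb", 4, Queue.new) would fill the queue with up to four build parts for workers to pop. Round robin balances the number of files per part, not their runtime, so a few slow files can still leave one part much longer than the rest.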

We started with a cluster of 10 Mac minis to serve as our build workers. Toward the end of last year, we realized we needed more capacity, so we expanded into EC2. Today we have 225 spot instances on EC2 that serve as build workers. This may sound like a lot, but Kochiku now runs the test suites for over 65 different repositories developed by over 150 engineers who believe strongly in automated test suites. It’s used at Square for Java, JavaScript, Ruby and Rails codebases.

Kochiku has been a big boost to productivity for engineers at Square. In addition to acting as the CI server that automatically kicks off a new build whenever someone pushes to the master branch, Kochiku starts a new build for every pull request that’s created and reports the status back to GitHub using their status API. Engineers can also initiate a build on their topic branch at any time, since it’s important to know whether tests pass before issuing a pull request.
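
For readers unfamiliar with the status API, the sketch below shows roughly what reporting a build result to GitHub involves: a POST to the commit's statuses endpoint with a state and a link back to the build. This is not taken from Kochiku's source. The method name build_status_request, the "kochiku" context label, and the description text are assumptions made for illustration, and the request is only constructed here, not sent.

```ruby
require "json"
require "net/http"
require "uri"

# States accepted by GitHub's commit status endpoint.
GITHUB_STATES = %w[pending success failure error].freeze

# Hypothetical helper (not Kochiku's API): build, but do not send, the
# request that marks commit `sha` in `repo` ("owner/name") with `state`.
def build_status_request(repo:, sha:, state:, target_url:, token:)
  raise ArgumentError, "unknown state: #{state}" unless GITHUB_STATES.include?(state)

  uri = URI("https://api.github.com/repos/#{repo}/statuses/#{sha}")
  request = Net::HTTP::Post.new(uri)
  request["Authorization"] = "token #{token}"
  request["Content-Type"] = "application/json"
  request.body = JSON.generate(
    state: state,
    target_url: target_url,          # link back to the build page
    description: "Build #{state}",   # assumed wording
    context: "kochiku"               # assumed label for this CI system
  )
  request
end
```

Sending the request is then a matter of Net::HTTP.start("api.github.com", 443, use_ssl: true) { |http| http.request(request) }. Keeping construction separate from sending makes the payload easy to inspect without touching the network.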

What about existing solutions?

In 2011, there were a few tools around for running a test suite in parallel. Some that we looked at were hydra, parallel_tests, and a Jenkins plugin or two. We evaluated each of them but didn’t get the outcome we needed. Both parallel_tests and hydra were designed to parallelize test suites by using multiple cores of a single machine, or by splitting the tests across just a handful of machines. We needed something that would scale to dozens of machines and support a variety of platforms.

At the time, Travis CI was growing in popularity. We think Travis is a great project, and we realized what we really needed was something like Travis, except built specifically to partition a test suite into many pieces. We considered building the functionality that we wanted into or on top of Travis, but we decided instead to tailor an app to meet Square’s needs without extra cruft.

Had Travis Pro existed back in 2011, we probably could have made that work, but it wasn’t until late 2012 that they announced the ability to parallelize test suites with folder partitioning. Folder partitioning is somewhat restrictive, however, because it assumes that you don’t have a single folder with a lot of slow tests, and it limits your options for organizing your code.

The future of Kochiku

Kochiku has some powerful features that differentiate it from other CI software packages out there. Going forward, we want to develop these to their full potential:

  • Tight Git repository integration
  • Tight integration with Git hosts (currently GitHub and Atlassian Stash are supported)
  • Ability to observe the file system during the partitioning step
  • The test suite runtime database

We’ve been quietly incubating Kochiku inside of Square for the last two years, and we’re excited to release it as an open-source project today. We’re looking forward to incorporating new ideas and contributions from the community.

The source code is available on GitHub at square/kochiku and square/kochiku-worker.