Yet Another Attempt At Faster Builds: Caching DB Schema

Jonathan Perichon
Published in Checkr Engineering
3 min read · Dec 22, 2017

Long builds slow down deployments and force unwanted context switching. As someone who strives for developer productivity, I count them among my pet peeves.
And I’m sure I’m not alone!

Making builds faster has been a challenge for numerous companies and is the subject of countless articles online. Though there have been a few tricks shared over time, there is still more we can do.

Here’s how we made our database setup faster by caching its schema and seed.

The issue

Since tests are added every week, the build time for our monolith grows steadily.

We dedicated a fair amount of time to optimizing the build over the last couple of years.

While these efforts were successful, I struggled to find more quick wins that would improve the actual tests’ performance. That’s when I decided to focus on our build setup instead.

Our CI setup

For context, this is what we are working with:

  • A 4-year-old monolith built with Ruby/Sinatra
  • ~10,000 tests written with minitest
  • 5 data stores (MySQL, MongoDB, ElasticSearch, RabbitMQ, Kafka)
  • We use CircleCI 2.0 with 15 large containers (4CPU/8192MB), running 3 parallel processes on each.
  • We have a few workflow processes as part of our build, including linting with RuboCop, running bundler-audit, and reporting our test coverage to CodeClimate

Surprisingly, about 25% of the build time was spent on creating and seeding the database.

DB setup original timing

In fact, database creation meant building ~75 tables and ~200 indexes, while seeding loaded the whole application and inserted thousands of records in batches with activerecord-import. This was incredibly time-consuming and, unlike the tests themselves, could not be scaled horizontally.

Caching can be good, sometimes

Because the DB setup rarely changes over time yet runs for every build, I tried bypassing our application code and ActiveRecord by restoring a MySQL dump instead. This proved significantly faster: the setup dropped to mere seconds.

To bake this into our CI steps, I decided to take advantage of CircleCI’s flexible caching functionality to cache and share the dump file across builds and branches.

The logic was pretty straightforward:

  1. Check the cache for a DB dump
  2. If a dump exists, load the DB with the cached dump
  3. Else, set up the DB like before
  4. Cache the DB dump
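As a rough illustration, the steps above could be sketched as a small shell helper called from a CI step. This is not our actual script: the dump path, the database name, the connection flags, and the rake tasks are all placeholders. The helper only writes the dump to disk; saving and restoring that file between builds is left to the CI tool's cache steps.

```shell
# Sketch only: DUMP_FILE, DB_NAME, the MySQL flags and the rake tasks
# below are hypothetical placeholders, not the real project values.
DUMP_FILE="${DUMP_FILE:-$HOME/db-cache/dump.sql}"
DB_NAME="${DB_NAME:-app_test}"

# Slow path: build the schema and seed data through the application.
setup_db_from_scratch() {
  bundle exec rake db:create db:schema:load db:seed
}

# Fast path: replay the dump. It was written with --databases, so it
# carries its own CREATE DATABASE and USE statements.
load_dump() {
  mysql -u root < "$DUMP_FILE"
}

# Write the dump to a temp file first so that a failed mysqldump never
# leaves a partial dump at the cached path.
write_dump() {
  mkdir -p "$(dirname "$DUMP_FILE")"
  mysqldump -u root --databases "$DB_NAME" > "$DUMP_FILE.tmp" &&
    mv "$DUMP_FILE.tmp" "$DUMP_FILE"
}

# Steps 1 to 4: use the cached dump when present, otherwise rebuild
# the DB and produce a dump for the cache to pick up.
prepare_db() {
  if [ -s "$DUMP_FILE" ]; then
    load_dump
  else
    setup_db_from_scratch && write_dump
  fi
}
```

A CI step would then source this file and call `prepare_db` between the cache restore and cache save steps.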

Because we want to automatically invalidate this cache if the DB seeds or schema have been updated, I’m using the commit hash for those files as part of the cache key.

This method also allows us to keep the nice interface provided by ActiveRecord to define our schema and seed.
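One possible way to derive such a cache key is to ask git for the latest commit that touched the schema and seed files and write it to a file, which CircleCI's `{{ checksum "..." }}` cache key template can then hash. The function name, key file name, and paths below are assumptions for illustration, not our exact setup.

```shell
# Sketch only: the function name and the paths in the usage line are
# hypothetical.
# Usage: write_db_cache_key KEY_FILE PATH...
#   e.g. write_db_cache_key db_cache_key db/schema.rb db/seeds.rb
write_db_cache_key() {
  key_file="$1"
  shift
  # Hash of the most recent commit that modified any of the given paths.
  # Commits that leave these paths alone do not change the output.
  git log -1 --format=%H -- "$@" > "$key_file"
}
```

Since the key file only changes when one of the listed paths changes, builds on unrelated commits and branches keep hitting the same cached dump.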

Here is what our CircleCI config looks like now:

And tadaaa!

DB setup down from ~2min to 8 seconds

Special thanks to our friends at CircleCI for having an awesome product, and to all the online community for sharing their insights over time on how we can run our test suite faster!
