Eliminating Nondeterministic (“Flaky”) Tests in Ruby and RSpec

Jacob Evelyn
Building Panorama Education
4 min read · Apr 20, 2018

At Panorama, we strongly believe in automated tests. No code gets deployed without passing both a thorough code review and a battery of thousands of automated tests, and no code makes it through code review without updating tests for the new features it’s adding. Automated tests help us build new features and refactor old code without introducing bugs, and are a big part of the reason we are able to confidently deploy new changes to our production apps multiple times a day.

What are flaky tests?

But with an automated test suite comes the dreaded possibility of tests that fail nondeterministically: most of the time they pass, but every once in a while they fail for no obvious reason. When retried, they pass again. Flaky tests.

Flaky tests reduce developer confidence in the test suite, waste time when we have to retry tests until they succeed, and can delay the release of changes or even critical bugfixes. As a result, we take a hard stance and squash flaky tests whenever we see them.

Common causes of test flakiness

Over time, we’ve found that in our codebase flaky tests tend to have one of three causes:

Cause #1: Tests share state

In RSpec this typically means we’re creating something in our database in a before(:all) block rather than a before(:each) or let block (since we use RSpec’s transactional fixtures, any database insertions or updates that happen in a before(:each)/let are reverted after the test executes).
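As a sketch of the difference (the School model and its attributes here are invented for illustration, not taken from our codebase):

```ruby
# Risky: before(:all) runs once for the whole group, and the inserted
# row is NOT rolled back by transactional fixtures, so it leaks into
# every test that runs afterward.
before(:all) { @school = School.create!(name: "Leaky High") }

# Safer: let (or before(:each)) creates the record inside each test's
# transaction, so it is rolled back after every example.
let(:school) { School.create!(name: "Clean High") }
```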

To prevent this, we've eliminated before(:all) from our test suite in all but a few special instances. In addition, we've added an after(:all) block that can run after each test file and check whether anything has been left in the database:

# A simplified version of our after(:all) check.
RSpec.configure do |config|
  if ENV["DEBUG_TEST_CLEANUP"]
    config.after(:all) do |example|
      ApplicationRecord.descendants.each do |klass|
        if klass.unscoped.exists?
          puts "#{klass} was not cleaned up in "\
               "#{example.class.description}!"
          klass.unscoped.delete_all
        end
      end
    end
  end
end

Cause #2: Tests sort auto-incrementing names

We use the fabrication gem to easily build objects and save them to the database, and fabrication provides a sequence feature that lets you auto-increment fields. For instance:

Fabricator(:student) do
  name { sequence(:name) { |i| "Student #{i}" } }
end

# Create three students with names "Student 0", "Student 1", and
# "Student 2".
let!(:students) { Fabricate.times(3, :student) }

# Test that our results are in reverse-sorted order by student name.
it { is_expected.to eq students.reverse }

But since our tests run in a random order and these sequences are global, the above code could generate students with names "Student 73", "Student 74", and "Student 75", or any other sequence of integers, depending on how many previous tests also called Fabricate(:student).

Since these strings are then used for sorting, we’ll run into problems when crossing number-of-digit boundaries, like with "Student 99", "Student 100", and "Student 101". In that case, the reverse alphabetical sort would be "Student 99", "Student 101", and "Student 100", causing our test to fail.
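The misordering is easy to reproduce in plain Ruby:

```ruby
# Lexicographic string comparison goes character by character, so at a
# digit boundary "Student 100" sorts before "Student 99" ("1" < "9").
names = ["Student 99", "Student 100", "Student 101"]

sorted = names.sort
# => ["Student 100", "Student 101", "Student 99"]

reverse_sorted = sorted.reverse
# => ["Student 99", "Student 101", "Student 100"]
```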

While we could track down places where we’re relying on this sort of automatic naming and sorting and change the tests, we’ve found it was much easier to globally start fabrication sequences at very high values to avoid this problem:

Fabrication.configure do |config|
  # We change all Fabricator sequences to start at this very high
  # number to help us avoid nondeterministic test failures where we
  # expect two things we're fabricating in a given order to have
  # that ordering when sorting by their names (e.g. "Item 3" <
  # "Item 4"), but digit boundaries can lead to unexpected behavior
  # (e.g. "Item 10" < "Item 9"). Starting at this high number means
  # all sequences will have 89,999,999 iterations before they
  # encounter this digit-boundary case, virtually eliminating the
  # probability of nondeterministic test failures caused by
  # unexpected sequence name orderings.
  config.sequence_start = 10_000_000
end

Cause #3: Tests manually set the id (the primary key) of database rows

Doing something like this:

let!(:school1) { Fabricate(:school) }
let!(:school2) { Fabricate(:school, id: 42) }
let!(:school2_student) { Fabricate(:student, school_id: 42) }

might seem innocent, but when the monotonically increasing id that the database generates for school1 happens to be 42, the creation of school2 will raise an error, because two database records can't have the same primary key.
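One way to avoid this (a sketch, assuming the student fabricator defines a school association) is to let the database assign every id and reference the record directly:

```ruby
let!(:school1) { Fabricate(:school) }
let!(:school2) { Fabricate(:school) }
# Reference the record (or use school2.id) instead of a hardcoded 42:
let!(:school2_student) { Fabricate(:student, school: school2) }
```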

This error can be hidden by more subtle code, like:

context "when a student and school have the same database id" do
  let!(:school) { Fabricate(:school) }
  let!(:other_student) { Fabricate(:student) }
  let!(:same_id_student) { Fabricate(:student, id: school.id) }

  it { is_expected.to be true }
end

The good news is that this issue is easy to spot as a red flag in code review: your database should set primary key id fields, not your application code.

So if you’ve got a test that sometimes passes and sometimes fails, try checking these three things:

  1. Does the test (or any others) use before(:all)? If so, change that to before(:each) or let blocks instead.
  2. Does the test check the sorted order of strings that contain digits? If so, change the test, or better yet, configure Fabrication or a similar tool to start these sequences at much higher values.
  3. Does the test set the id of any model? If so, rework the test so it does not.
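When chasing a failure like this, it also helps to replay the exact test order that triggered it. With RSpec's random ordering, the seed printed at the top of a run can be passed back in (the seed value below is a placeholder):

```shell
# RSpec prints "Randomized with seed 12345" on each run; reuse that
# seed to repeat the same test order deterministically:
rspec --seed 12345
```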

If none of those issues are the problem, we'd love to hear about it! And if you're interested in working on a codebase that takes tests seriously, we're hiring!
