There’s a sad time in testing land when you have a test suite you can’t rely on. Instead of running the suite and merging if it passes, you have to glance over the failed tests and say, “Oh, that one… that test is flaky.”
Then it becomes a class of tests: “oh, we can ignore the Elasticsearch tests.” Eventually we stop trusting the build and just merge anyway, because it’s probably the Elasticsearch test that failed. At least I think that’s what my email said.
Enter the Huntsman
Obviously we don’t want a test suite that we don’t trust. It’s just a 10-20 minute slowdown that gives no value if we don’t heed its wisdom. If it’s spouting half-truths, we’ll need to root them out.
So I decided to hunt down intermittent tests. Qualitative judgements are hard to deal with:
- The test suite is unreliable.
- Elasticsearch is flaky.
- My cat is mean to me.
While all of these statements might be true, they’re hard to fix, and it’s even harder to tell whether your fix actually did anything.
Measure Up
Time to measure, then. The nice property of “intermittent” tests is that they aren’t prompted by a code change; they just fail seemingly at random. So I created a branch off our master called…
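Whatever the branch ends up being called, the measuring loop itself is simple to sketch: check out a frozen commit, run the suite over and over, and tally which tests fail across byte-identical runs. Here’s a minimal sketch in Python, where the `./run_tests.sh` wrapper and its `FAILED <test name>` output lines are hypothetical stand-ins for whatever command and report format your suite actually uses:

```python
import re
import subprocess
from collections import Counter

# Hypothetical stand-ins: swap in your suite's real command and the
# pattern that matches a failed test in its output.
TEST_COMMAND = ["./run_tests.sh"]
FAILED_LINE = re.compile(r"^FAILED\s+(\S+)", re.MULTILINE)
RUNS = 50  # more runs gives a better estimate of each test's flake rate

failure_counts = Counter()

for run in range(1, RUNS + 1):
    # Nothing changes between runs, so any failure here is intermittent.
    result = subprocess.run(TEST_COMMAND, capture_output=True, text=True)
    failed = FAILED_LINE.findall(result.stdout)
    failure_counts.update(failed)
    print(f"run {run}: {len(failed)} failure(s)")

# Rank tests by how often they flaked across identical runs.
for test, count in failure_counts.most_common():
    print(f"{count:3d}/{RUNS}  {test}")
```

Now “Elasticsearch is flaky” turns into a number: a test that fails, say, 12 runs out of 50 is measurably flaky, and after a fix you can rerun the loop and watch that count drop.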