Dealing with 503 errors when testing Elasticsearch integration in Rails

When we were building the quote and availability search platform for the hotel application I’ve been working on, we made the switch from Redis to Elasticsearch. Parties all ‘round, until we started running tests in CI, and then it was big red Fs all ‘round. They usually came in the following format:

Elasticsearch::Transport::Transport::Errors::ServiceUnavailable: [503]

Which quickly led to this:

Searching didn’t taste so good mixed with failing builds

Initially we thought the failures were random, but eventually we realised that they occurred almost 100% of the time when the Elasticsearch cluster had only just started up.

As it turns out, Elasticsearch is a little prone to errors in situations in which commands execute rapidly in an automated fashion, like the sorts of things that occur during a test. Because of the way Elasticsearch works, it’s possible to send persistence commands successfully but for some query types to fail in obscure or unexpected ways. A search that reaches an index before its primary shards have been allocated, for instance, can come back as a 503.

Thankfully, the solution is nice and easy: all you need to do is make sure the cluster is healthy before each of your specs. We’re using RSpec, so the following snippet appears in our Elasticsearch integration specs:

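before do
  repository.create_index!
  # Block until all primary shards are allocated ('yellow'); a single-node
  # test cluster may never reach 'green' if the index has replicas.
  repository.client.cluster.health wait_for_status: 'yellow'
end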
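
If the cluster never gets to yellow, a failure from the health call on its own can be hard to read in CI output. Here is a minimal sketch of a wrapper that passes an explicit timeout and fails with a clear message. ElasticsearchHealth, wait_for! and ClusterNotReady are names made up for illustration and are not part of any gem; the only real API used is cluster.health with its wait_for_status and timeout parameters. It assumes the response exposes 'status' and 'timed_out' keys, and depending on your client and server versions a timeout may surface as a raised transport error instead.

```ruby
# Hypothetical helper (not part of elasticsearch-rails): waits until the
# cluster reports at least the requested health status, or raises.
module ElasticsearchHealth
  # Raised when the cluster does not reach the requested status in time.
  ClusterNotReady = Class.new(StandardError)

  # client  - anything that responds to cluster.health, such as repository.client
  # status  - health level to wait for ('yellow' by default)
  # timeout - how long Elasticsearch should wait before giving up
  def self.wait_for!(client, status: 'yellow', timeout: '30s')
    # With wait_for_status, the health request blocks on the server side
    # until the status is reached or the timeout expires.
    response = client.cluster.health(wait_for_status: status, timeout: timeout)

    # Assumption: a timed-out wait is reported through the 'timed_out' key.
    if response['timed_out']
      raise ClusterNotReady,
            "cluster still #{response['status']} after #{timeout}, wanted #{status}"
    end

    response
  end
end
```

In the spec, the before block would then call ElasticsearchHealth.wait_for!(repository.client) right after repository.create_index!.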

Some notes about our setup: we’re using the elasticsearch-rails gem, more specifically the elasticsearch-persistence gem found inside that package.