Retry failed blocks of code

Anton Magids
Published in Cheetah Labs
3 min read, Mar 19, 2020

The creation of the take2 gem

Photo by Esther Jiao on Unsplash

Every backend developer at some point writes a piece of code that needs to be retried when it fails. The most common scenario is integration with third-party services, but there are others.

When I joined Cheetah a couple of years ago, we had a Rails monolith in which this pattern of retrying failed API calls was very common:

class SomeApiClient
  NUMBER_OF_TRIES = 5

  def call(url, params = {})
    # `retry` re-runs the method body; `||=` keeps the remaining count between runs
    tries ||= NUMBER_OF_TRIES
    # call to the external service
  rescue Faraday::ConnectionFailed, Timeout::Error => e
    if (tries -= 1) > 0
      sleep(15) # wait before the next attempt
      retry
    else
      # notify Airbrake here
      nil
    end
  end
end

Developers love to refactor, and we had decent test coverage, so I jumped on the opportunity to extract the common concepts here:

  1. Retry on specific errors.
  2. Sleep between retries.
  3. Act upon retry.
  4. Act when no retries left.
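
The four concepts above can be sketched in plain Ruby. This is only an illustration and not the take2 API: the `with_retries` helper and its keyword arguments (`tries`, `on`, `wait`, `on_retry`, `on_give_up`) are hypothetical names chosen for this example.

```ruby
# Hypothetical helper (not part of take2): one place for all four concepts.
def with_retries(tries:, on:, wait: 0, on_retry: nil, on_give_up: nil)
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue *on => e                  # 1. retry only on the listed errors
    if attempt < tries
      on_retry&.call(e, attempt)   # 3. act upon retry
      sleep(wait)                  # 2. sleep between retries
      retry                        # re-runs the begin block
    end
    on_give_up&.call(e)            # 4. act when no retries are left
    nil
  end
end

# Usage: the block fails twice with a simulated error, then succeeds.
attempts_seen = []
result = with_retries(
  tries: 3,
  on: [IOError],
  on_retry: ->(error, attempt) { attempts_seen << attempt }
) do |attempt|
  raise IOError, "simulated outage" if attempt < 3
  "response body"
end
# result == "response body", attempts_seen == [1, 2]
```

Errors that are not listed in `on` are not rescued and propagate to the caller, which mirrors the original method rescuing only connection and timeout errors.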

Around that time, two services were extracted from the monolith, so I asked my direct manager, the CTO, off the top of my head:

Can I turn this idea into a gem and use it in other applications as well?

He answered:

Sure! Let’s do it!

This is exactly what Cheetah’s R&D DNA is. We always encourage the team to innovate and try new things. The Product team will understand the extra story points for an epic, and even the roadmap can wait if the team wants to play around with some new ideas and do a POC.

As an engineer in the backend chapter, I was given the opportunity to create open-source software just for fun. Well, not only for fun…

Before jumping to “Let’s make it a gem”, one should first solve the existing use cases and only then consider extracting the code into a gem. So a take2 module was created in our core monolith. All of the concepts listed above were implemented, and the class shown earlier would look something like this:

class SomeApiClient
  include Take2

  number_of_retries 3
  retriable_errors Faraday::ConnectionFailed, Timeout::Error
  on_retry proc { |error, tries| }
  backoff_strategy type: :exponential, start: 3

  def call(url, params = {})
    with_retry do
      # call to the external service
    end
  end
end
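
The backoff_strategy line in the class above chooses how long to wait between attempts. As an illustration only, here is one way such wait times could be computed. The `backoff_intervals` helper and its formulas are assumptions made for this sketch; the intervals take2 actually produces may differ.

```ruby
# Hypothetical helper: seconds to wait before each retry for a given strategy.
def backoff_intervals(type, start:, retries:)
  (0...retries).map do |i|
    case type
    when :constant    then start             # same wait every time
    when :linear      then start * (i + 1)   # grows by `start` each retry
    when :exponential then start * (2**i)    # doubles each retry
    else raise ArgumentError, "unknown backoff type: #{type.inspect}"
    end
  end
end

backoff_intervals(:linear, start: 1, retries: 4)       # => [1, 2, 3, 4]
backoff_intervals(:exponential, start: 3, retries: 3)  # => [3, 6, 12]
```

Growing intervals give a struggling third-party service more time to recover than a fixed sleep(15) does, while keeping the first retry quick.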

Take2

After a couple of weeks in production and a few tweaks and fixes, the module, along with its isolated tests, was extracted into the take2 gem.

Available global configurations:

Use the global configurations

class ImportantService
  include Take2

  def call
    with_retry do
      # call to the external service
    end
  end
end

Or partially override them

class ImportantService
  include Take2

  number_of_retries 4
  retriable_errors Faraday::ConnectionFailed
  backoff_strategy type: :linear, start: 1

  def call
    with_retry do
      # call to the external service
    end
  end
end

take2 was adopted by our teams and used in various scenarios and applications. Originally, take2 was designed to retry API calls, but one team, SKY squad, was especially creative with it. Their goal was to resolve an issue where one of our third-party services DDoSed us with multiple webhooks, resulting in deadlocks.

SKY squad created the following module:

And used it this way:

class Task < ApplicationRecord
  include SafeUniqueness
  retriable_methods :find_or_create_by, :find_or_create_by!
end

Now they were able to call:

Task.safe_find_or_create_by!(params)
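
For readers curious how a `retriable_methods` macro that generates `safe_`-prefixed methods might work, here is a rough plain-Ruby sketch. It is not SKY squad's module and does not use take2: `SafeUniquenessSketch`, the `errors:` and `tries:` options, and the `FakeTask` class are all hypothetical, and the deadlock is simulated with `IOError` so the example runs without Rails. In a real Rails app one would presumably retry on database errors such as `ActiveRecord::Deadlocked` instead.

```ruby
# Hypothetical sketch of a class macro that wraps class methods in a retry loop.
module SafeUniquenessSketch
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # For each name, defines safe_<name>, which re-runs the original class
    # method when it raises one of `errors`, up to `tries` attempts in total.
    def retriable_methods(*names, errors: [StandardError], tries: 3)
      names.each do |name|
        define_singleton_method("safe_#{name}") do |*args, **opts, &block|
          attempts = 0
          begin
            attempts += 1
            public_send(name, *args, **opts, &block)
          rescue *errors
            retry if attempts < tries
            raise # out of attempts: let the caller see the error
          end
        end
      end
    end
  end
end

# Stand-in for a model: fails twice with a simulated deadlock, then succeeds.
class FakeTask
  include SafeUniquenessSketch
  retriable_methods :find_or_create_by!, errors: [IOError], tries: 3

  @calls = 0

  class << self
    attr_reader :calls

    def find_or_create_by!(attrs)
      @calls += 1
      raise IOError, "simulated deadlock" if @calls < 3
      attrs
    end
  end
end

record = FakeTask.safe_find_or_create_by!(name: "sync")
# record == { name: "sync" }, FakeTask.calls == 3
```

Keeping the original method untouched and adding a prefixed variant lets callers opt in to the retry behavior only where concurrent webhooks are a concern.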

It is very interesting to see how take2 is evolving in our codebase and how the teams hack it. I hope to see more contributions to the gem itself in the future.

Anton Magids
Cheetah Labs

Ruby developer and a Backend Chapter lead at Cheetah Technologies.