Make a failing Sidekiq worker call a method after a specific number of retries

As you may have already read in a previous post about our Sidekiq and Redis configurations, we rely on Mike Perham’s Sidekiq to process background jobs (called workers) for our Rails application. Among its many good features is a retry mechanism for unhandled exceptions raised inside a worker: when one is raised, Sidekiq automatically retries the job with an exponential backoff. You can of course define your own retry limit, and even execute a block once that limit is reached. Sadly, there is no built-in way to execute a block after an earlier retry, such as the (n-1)-th; the block can only be executed after the n-th retry.

In this article, I will detail how we developed that missing feature for our own needs.

Why

For more than a year now, we’ve been working on integrating Google Android for Work APIs to offer our clients a way to silently deploy apps on Android devices (head here if you want to learn more about that!).

One of the biggest hurdles you will face when working with external APIs is error handling. For one particular use case, we must send an ID to the API to register an Android device. But here’s the catch: Google’s servers sometimes need a few minutes to accept that ID. That means our request may fail a few times before actually succeeding. Moreover, after a few failed attempts, we can be sure that the ID will never be accepted (which means the device cannot support Android for Work), so we would like to notify our team after X retries. After the final retry, we would like to flag that device as incompatible with Android for Work in our database. Here’s the complete flow:
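In code terms, the flow boils down to a decision taken after each failed retry. The sketch below is only an illustration: the function name and its parameters are hypothetical, and the thresholds of 5 and 10 are the values used later in this article.

```ruby
# Hypothetical helper: decides what happens after the given retry has failed.
# retry_number is 1 for the first retry, 2 for the second, and so on.
def action_after_failed_retry(retry_number, warn_at:, max_retries:)
  if retry_number >= max_retries
    :flag_device_as_incompatible # final retry failed: give up
  elsif retry_number == warn_at
    :notify_team_and_retry       # threshold reached: warn, keep retrying
  else
    :retry                       # Google may just need more time
  end
end

# One decision per retry, for a worker allowed 10 retries with a warning on the 5th.
actions = (1..10).map do |n|
  action_after_failed_retry(n, warn_at: 5, max_retries: 10)
end
```

With these values, only the 5th entry of actions is :notify_team_and_retry and only the 10th is :flag_device_as_incompatible; every other failed retry is simply retried.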

How

Sidekiq lets developers plug in their own middleware, so that is how we will do it! First, let’s register our middleware with Sidekiq:
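A minimal sketch of that registration, assuming a Rails initializer (the file path is a guess) and that the Sidekiq::RetryMonitoringMiddleware class discussed further down is already loaded:

```ruby
# config/initializers/sidekiq.rb (assumed location in a Rails app)
require "sidekiq"

Sidekiq.configure_server do |config|
  config.server_middleware do |chain|
    # Server middleware wraps every job executed by this Sidekiq process.
    chain.add Sidekiq::RetryMonitoringMiddleware
  end
end
```

Registering it on the server side is enough here, since the middleware only needs to observe jobs while they run, not while they are enqueued.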

Now, let’s take a look at our worker:

A few things are happening here; let me walk you through them.

  1. We declare a worker that will retry 10 times if an exception is raised (line 4).
  2. We declare the function self.flag_device_as_in_error! to execute when the retry count is exhausted (line 5).
  3. perform is the method that is called when a worker is launched (line 7).
  4. threshold_retry_count_for_warn and warn (lines 11 and 15) are methods called by our middleware.
  5. Finally, we include an empty module, Sidekiq::RetryMonitoringMiddleware::MonitoredWorker (line 3), that helps identify the worker in our middleware.
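The five points above can be sketched as follows. This is an approximation rather than our production code: the class name RegisterDeviceWorker and all method bodies are hypothetical placeholders, and the line numbers quoted in the list refer to the original listing, not to this sketch. The Sidekiq DSL is wrapped in a guard purely so that the snippet also loads without the gem.

```ruby
# Stand-in for the marker module so the sketch loads on its own; the real
# one is defined alongside the middleware.
module Sidekiq
  class RetryMonitoringMiddleware
    module MonitoredWorker; end
  end
end

class RegisterDeviceWorker # hypothetical name
  include Sidekiq::RetryMonitoringMiddleware::MonitoredWorker

  # The Sidekiq DSL needs the gem; the guard keeps the sketch loadable without it.
  if defined?(Sidekiq::Worker)
    include Sidekiq::Worker
    sidekiq_options retry: 10
    sidekiq_retries_exhausted { |job, _exception| flag_device_as_in_error!(job) }
  end

  # Runs once all 10 retries have failed.
  def self.flag_device_as_in_error!(job)
    device_id = job["args"].first
    "device #{device_id} flagged as incompatible" # placeholder for a DB update
  end

  def perform(device_id)
    # Placeholder for the API call; raising lets Sidekiq schedule a retry.
    raise "ID #{device_id} not accepted yet"
  end

  # Read by the middleware: the retry on which `warn` must be called.
  def threshold_retry_count_for_warn
    5
  end

  # Called by the middleware once the threshold is reached.
  def warn(device_id)
    "team notified about device #{device_id}" # placeholder for a notification
  end
end
```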

Let’s take a look at our middleware:

The heart of the middleware is the should_warn? method. It relies on two things in order to work:

  • the worker’s type: by including Sidekiq::RetryMonitoringMiddleware::MonitoredWorker in our worker, we can identify it using the is_a? method
  • threshold_retry_count_for_warn: it lets us determine whether we have reached the retry on which we must call the worker’s warn method.
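One plausible shape for such a middleware is sketched below, followed by a small simulation that runs without Sidekiq. Two assumptions are worth checking against your Sidekiq version: that the job hash carries no "retry_count" key on the first attempt and 0 on the first retry, and that warn should fire when the threshold retry fails. AlwaysFailingWorker is a hypothetical test double.

```ruby
module Sidekiq
  class RetryMonitoringMiddleware
    # Empty marker module: workers include it to opt in to monitoring.
    module MonitoredWorker; end

    # Server middleware entry point; `yield` runs the job itself.
    def call(worker, job, _queue)
      yield
    rescue StandardError
      worker.warn(*job["args"]) if should_warn?(worker, job)
      raise # re-raise so Sidekiq's own retry logic still sees the failure
    end

    private

    def should_warn?(worker, job)
      return false unless worker.is_a?(MonitoredWorker)

      # Assumption: "retry_count" is absent on the first attempt and 0 on the
      # first retry, so the n-th retry runs with retry_count == n - 1.
      retry_number = job["retry_count"] ? job["retry_count"] + 1 : 0
      retry_number == worker.threshold_retry_count_for_warn
    end
  end
end

# Simulation without Sidekiq: a hypothetical worker whose job always fails.
class AlwaysFailingWorker
  include Sidekiq::RetryMonitoringMiddleware::MonitoredWorker
  attr_reader :warnings

  def initialize
    @warnings = []
  end

  def threshold_retry_count_for_warn
    5
  end

  def warn(device_id)
    @warnings << device_id
  end
end

middleware = Sidekiq::RetryMonitoringMiddleware.new
worker = AlwaysFailingWorker.new
warned_on = []

# Attempt 0 is the initial run; attempts 1..10 are the retries.
(0..10).each do |attempt|
  job = { "args" => [42] }
  job["retry_count"] = attempt - 1 if attempt > 0
  warnings_before = worker.warnings.size
  begin
    middleware.call(worker, job, "default") { raise "not accepted yet" }
  rescue RuntimeError
    # Sidekiq would schedule the next retry here.
  end
  warned_on << attempt if worker.warnings.size > warnings_before
end

puts warned_on.inspect # prints [5]
```

Because the exception is re-raised untouched, Sidekiq’s regular retry scheduling and the exhausted-retries callback keep working exactly as they would without the middleware.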

So the worker’s warn method will be called on the 5th retry, and self.flag_device_as_in_error! will be called on the 10th (the last one).

Conclusion

And that’s it! We saw how to implement a custom Sidekiq middleware that will make a worker perform a task on the x-th retry, and a different one on its last try.

If you have any questions or comments, please don’t hesitate to reach out!

This article was written by Christophe Valentin, backend developer at Appaloosa.io.

Want to be part of Appaloosa? Head to Welcome to the jungle.