Scheduling home services at the tap of a button looks simple on Handy’s frontend, but behind it is a series of orchestrated events on the backend. These events handle everything from sending our professionals a nightly work reminder to sending them a happy birthday message.
At Handy we run on a Ruby on Rails stack with Sidekiq/Sidetiq at the core of this event scheduling system. After working with both of these technologies for almost two years, we have learned some important best practices.
Long-running workers slow down deployments and give us less visibility into their progress than many smaller workers do. To solve this problem, we identify the scope of work in a recurring Sidetiq worker and kick off many smaller Sidekiq workers, each with a subset of the total workload. Instead of one worker sending SMS messages to thousands of providers, one worker kicks off thousands of workers that each send a single SMS message. This is useful during a deployment because it’s easier to wait for a five-second worker to finish than for a ten-minute worker.
Be careful about database contention with this pattern. In a scenario where hundreds of workers attempt to query/update records on the same table, there could be heavy lock contention.
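As a rough illustration of the fan-out described above, here is a minimal sketch in plain Ruby. The worker names and the FakeAsync module are hypothetical; FakeAsync only stands in for Sidekiq's perform_async so the example runs without Redis. In a real app the small worker would include Sidekiq::Worker, and the parent would be the recurring Sidetiq worker.

```ruby
# Stand-in for Sidekiq's `perform_async`, so this sketch runs without Redis.
# It only records the arguments each job would be enqueued with.
module FakeAsync
  def jobs
    @jobs ||= []
  end

  def perform_async(*args)
    jobs << args
  end
end

# Hypothetical small worker: responsible for exactly one SMS.
class SendReminderSmsWorker
  extend FakeAsync

  def perform(provider_id)
    # The actual SMS delivery for a single provider would go here.
  end
end

# Hypothetical parent worker: identifies the scope of work, then enqueues
# one small job per provider instead of looping over the sends itself.
class NightlyReminderWorker
  def perform(provider_ids)
    provider_ids.each do |provider_id|
      SendReminderSmsWorker.perform_async(provider_id)
    end
  end
end

NightlyReminderWorker.new.perform([101, 102, 103])
puts SendReminderSmsWorker.jobs.size
```

The parent finishes almost immediately because it only enqueues work, and each child job is short enough to complete before a deployment needs to restart the workers.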
Uniformly Distribute Time-Independent Workers
We identified a category of workers that do not need to run at a precise time. Rather than enqueueing them all at once, we spread their start times uniformly over a time range (15–60 minutes). This pattern helps avoid database contention and makes our DB load less spiky.
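One way to sketch this spreading, assuming a 30-minute window: give each job a uniformly random delay. The helper enqueue_spread and the worker name are hypothetical, and FakeScheduler merely mimics the shape of Sidekiq's perform_in(interval, *args) so the example runs standalone.

```ruby
# Stand-in for Sidekiq's `perform_in(interval, *args)`; records what would
# be scheduled instead of writing to Redis.
module FakeScheduler
  def scheduled
    @scheduled ||= []
  end

  def perform_in(delay_seconds, *args)
    scheduled << [delay_seconds, args]
  end
end

# Hypothetical worker whose exact start time does not matter.
class RefreshProviderStatsWorker
  extend FakeScheduler

  def perform(provider_id)
    # Time-independent work for one provider would go here.
  end
end

# Hypothetical helper: schedules one job per id with a delay drawn
# uniformly from 0...window_seconds.
def enqueue_spread(worker, ids, window_seconds:, rng: Random.new)
  ids.each do |id|
    worker.perform_in(rng.rand(window_seconds), id)
  end
end

enqueue_spread(RefreshProviderStatsWorker, (1..500).to_a, window_seconds: 30 * 60)
puts RefreshProviderStatsWorker.scheduled.size
```

With 500 jobs spread over 1,800 seconds, the database sees a steady trickle of queries rather than a single burst.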
Short-Circuit Kiq Workers
During periods of heavy server load, it can be necessary to turn off a worker for a limited period of time. At Handy, we use a database configuration table to short-circuit a Kiq worker: the worker checks the table before doing any work and exits early if it has been switched off.
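A minimal sketch of such a switch follows. WorkerSwitch and ShortCircuitable are hypothetical names, and the in-memory hash stands in for the database configuration table; in a Rails app the lookup would presumably be a query against that table.

```ruby
# Stand-in for the database configuration table (assumption: one row per
# disabled worker, keyed by class name).
class WorkerSwitch
  @disabled = {}

  class << self
    def disable!(worker_name)
      @disabled[worker_name] = true
    end

    def enable!(worker_name)
      @disabled.delete(worker_name)
    end

    def disabled?(worker_name)
      @disabled.key?(worker_name)
    end
  end
end

# Prepended module: checks the switch before the worker's own `perform`.
module ShortCircuitable
  def perform(*args)
    return :skipped if WorkerSwitch.disabled?(self.class.name)

    super
  end
end

# Hypothetical worker that can be turned off without a deploy.
class SendPromoSmsWorker
  prepend ShortCircuitable

  def perform(provider_id)
    # Normal processing would happen here.
    :sent
  end
end

worker = SendPromoSmsWorker.new
puts worker.perform(1)
WorkerSwitch.disable!("SendPromoSmsWorker")
puts worker.perform(1)
WorkerSwitch.enable!("SendPromoSmsWorker")
```

Because the check happens at the start of every run, flipping the switch takes effect on the next job without restarting any processes. The return values here exist only to make the behavior visible in the example.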
Idempotent Kiq Workers
Sidekiq has built-in automatic retries for failed jobs, which is usually beneficial but can have unintended consequences. For example, if worker code raises a critical error after sending an SMS, the job will be retried shortly afterward (Sidekiq's default backoff starts at roughly 15 seconds) and the SMS will be sent again. The unintended consequence is a user receiving repetitive, spammy messages. So, it’s important to make sure that your Sidekiq workers make idempotent updates or do not allow retries as a result of a worker error.
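The sketch below shows one possible way to make the SMS step safe to retry: record each delivery under a key and skip the send if the key already exists. All class names are hypothetical, the dependencies are injected only so the example runs on its own, and DeliveryLog stands in for persistent storage. The alternative mentioned above, disabling retries, is available in Sidekiq through `sidekiq_options retry: false`.

```ruby
require "set"

# Stand-in for a persistent record of delivered messages. In production this
# would need to survive restarts, e.g. a table with a unique key (assumption).
class DeliveryLog
  def initialize
    @keys = Set.new
  end

  def sent?(key)
    @keys.include?(key)
  end

  def record(key)
    @keys << key
  end
end

# Hypothetical worker; a real one would include Sidekiq::Worker.
class BirthdaySmsWorker
  def initialize(log:, sms:, follow_up:)
    @log = log
    @sms = sms
    @follow_up = follow_up
  end

  def perform(provider_id, date)
    key = "birthday:#{provider_id}:#{date}"

    # Skip the send if an earlier attempt already delivered this message.
    unless @log.sent?(key)
      @sms.call(provider_id)
      @log.record(key)
    end

    # Work after the send may still raise and trigger a retry.
    @follow_up.call(provider_id)
  end
end

# Simulate a failure after the SMS, followed by one retry.
sent = []
calls = 0
flaky_follow_up = lambda do |_provider_id|
  calls += 1
  raise "transient failure" if calls == 1
end

worker = BirthdaySmsWorker.new(
  log: DeliveryLog.new,
  sms: ->(provider_id) { sent << provider_id },
  follow_up: flaky_follow_up
)

2.times do
  begin
    worker.perform(7, "2016-05-01")
  rescue RuntimeError
    # Sidekiq would reschedule the job at this point.
  end
end

puts sent.size
```

This guard narrows the window for duplicates but does not close it completely: a crash between the send and the record would still allow a resend, so the two steps should sit as close together as possible.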
These Sidekiq best practices help us run thousands of home service bookings per day. Have questions, or do you run your system differently? Tell us about your Sidekiq best practices.
Do you have a strong opinion about Sidekiq best practices? You might be the type of person we’re looking for. Check out our careers page to see if we have an opening for you!