Background Job and Queue — Conceptual and Practical

Tuan Nguyen
Published in The Startup · Jan 27, 2021

At a pharmacy store

Here is what happens at a pharmacy when an order comes in:

  • The pharmacist receives your prescription and goes to the back of the store to get the prescribed medicine
  • Brings it back to you and tells you how much it costs
  • Takes your payment and gives you change (if any, and if you pay in cash)

Then another customer comes, and the pharmacist repeats the process. Three more come, and he still serves them one by one. But when 5–10 customers arrive at once, which might happen a couple of times a day, he is definitely swamped.

And if you told him something like “dude, why don’t you just hire 1 or 2 assistants, so that while you chat with your customers, they can do the other things for you”, then you would already have in mind the idea of a queue and background jobs. In this setup, the pharmacist at the frontline is mainly responsible for taking the order and doing a few small things with it before sending it to the backline.

In a nutshell

The following figure illustrates three main concepts:

  1. Queue (the pharmacist): performs two tasks, i) receiving requests from services and ii) forwarding them to worker nodes in first-in, first-out (FIFO) order.
  2. Background jobs: the tasks performed at the backline, meaning they do not affect user interaction, i.e. they do not block user activities.
  3. Worker nodes: separate processes, services, etc. that are the actual entities processing the requests, a.k.a. the background jobs.
Simple queue model in pharmacy example
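The three concepts above can be sketched in a few lines of Python. This is a minimal, single-machine sketch under assumptions of my own: the standard library's `queue.Queue` plays the queue, two threads named `assistant-0` and `assistant-1` play the worker nodes, and `time.sleep` stands in for the slow backline work. The names `take_order` and `worker` are hypothetical. In a real system the queue would usually be a separate service and the workers separate processes.

```python
import queue
import threading
import time

orders = queue.Queue()            # FIFO: the first order in is the first order out
prepared = []                     # what the workers have handed back
prepared_lock = threading.Lock()

def take_order(prescription):
    """Frontline: accept the order, enqueue it, and return immediately."""
    orders.put(prescription)
    return f"Got it, preparing {prescription}"

def worker(name):
    """Backline: pull orders one at a time and do the slow part."""
    while True:
        prescription = orders.get()
        if prescription is None:  # sentinel: no more orders today
            orders.task_done()
            break
        time.sleep(0.01)          # stand-in for fetching the medicine
        with prepared_lock:
            prepared.append((name, prescription))
        orders.task_done()

assistants = [threading.Thread(target=worker, args=(f"assistant-{i}",)) for i in range(2)]
for t in assistants:
    t.start()

for rx in ["aspirin", "ibuprofen", "vitamin C", "cough syrup", "antacid"]:
    print(take_order(rx))         # returns right away; the work happens in the background

orders.join()                     # wait until every order has been processed
for _ in assistants:
    orders.put(None)              # one sentinel per worker to shut it down
for t in assistants:
    t.join()
print(prepared)
```

`take_order` returns as soon as the order is enqueued, which is the non-blocking behaviour the frontline needs; the `None` sentinel is just one common way to tell workers to stop.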

Do I need it?

Yup, if your system handles:

  • Write-heavy operations, like logging and tracking: the queue and background job model helps reduce the high workload on the database.
  • Read-heavy operations for reporting, where there are few requests but each takes a long time to aggregate or accumulate data.
  • High service latency for various reasons, e.g. slow network connectivity or peak traffic. It is always better to respond to the user with something like “sorry dude, we are still processing your request, it’s a bit slow, please be patient, ok?” than to do nothing; at the very least, you get a chance to serve more users.
  • Non-critical interactions with external services, like collecting system history, sending emails, or syncing information to/from other sources.
  • Independent jobs in the queue, so that the system can scale by adding more worker nodes.
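The last point, scaling by adding worker nodes, relies on jobs being independent. A rough sketch, assuming that threads in a `ThreadPoolExecutor` stand in for worker nodes and that the jobs are I/O-bound (simulated with `time.sleep`); `process_job` and `run` are hypothetical names:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def process_job(job_id):
    """An independent job: it reads and writes no shared state."""
    time.sleep(0.05)          # stand-in for I/O such as a network or database call
    return job_id * 2

def run(jobs, workers):
    """Run all jobs on a pool of `workers` threads; return results and elapsed time."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(process_job, jobs))   # map preserves input order
    return results, time.perf_counter() - start

jobs = list(range(8))
timings = {}
for workers in (1, 2, 4):
    results, timings[workers] = run(jobs, workers)
    print(f"{workers} worker(s): {timings[workers]:.2f}s")
```

Because no job depends on another, the results are identical for any worker count while the elapsed time drops. In CPython, threads only help like this for I/O-bound work; CPU-bound jobs would need separate processes or machines.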

Do I really need it?

No. Although it seems to improve system performance, it is generally not a good idea (I say generally because there are exceptions in some specific situations) if your system performs:

  • Read-heavy operations for other activities, e.g. reading posts or product lists. For such systems, performance is typically optimized in (many) other ways, e.g. caching, scaling, and distribution, which are beyond the scope of this post.
  • Critical requests that must be completed quickly, for instance those related to payment, reservations, or checkout.

In practice

Tracking system: The client sends a request with user information such as browser, login/logout time, IP, etc. to the tracking system to be recorded. The system first pushes that information into Redis queues, then returns an OK response to the client. Afterwards, worker nodes pull the data from the queues and write it to the database. For a system serving a large number of requests, and therefore recording a huge amount of data, adopting the queue and background job model helps reduce the high workload on the database.
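A standalone sketch of this flow is below. To keep it runnable without a Redis server, a `collections.deque` stands in for the Redis list and an in-memory SQLite database stands in for the tracking database; `handle_track_request` and `drain_batch` are hypothetical names. Writing in batches is my own assumption about how the workers lighten the database load, not something stated in the original.

```python
import json
import sqlite3
from collections import deque

# In-memory stand-in for a Redis list (assumption: no Redis server available).
tracking_queue = deque()

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tracking (user_id TEXT, browser TEXT, ip TEXT, event TEXT, ts REAL)")

def handle_track_request(event):
    """API handler: enqueue the event and answer OK without touching the database."""
    tracking_queue.append(json.dumps(event))
    return {"status": "OK"}

def drain_batch(batch_size=100):
    """Worker: pop up to batch_size events and write them in one transaction."""
    rows = []
    while tracking_queue and len(rows) < batch_size:
        e = json.loads(tracking_queue.popleft())
        rows.append((e["user_id"], e["browser"], e["ip"], e["event"], e["ts"]))
    if rows:
        with db:  # commits on success, rolls back on error
            db.executemany("INSERT INTO tracking VALUES (?, ?, ?, ?, ?)", rows)
    return len(rows)

for i in range(250):
    handle_track_request({"user_id": f"u{i % 7}", "browser": "Firefox",
                          "ip": "203.0.113.5", "event": "login", "ts": 1611700000.0 + i})

while drain_batch(batch_size=100):  # 100 + 100 + 50 rows, then the queue is empty
    pass
print(db.execute("SELECT COUNT(*) FROM tracking").fetchone()[0])
```

With redis-py, the push and pop would roughly map to `rpush` on the API side and `lpop` or `blpop` on the worker side.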

Logging system: When an API endpoint receives a request triggered by an end-user action, e.g. logging in, logging out, or placing an order, the system triggers events and pushes them to the queue to avoid blocking users.
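One way to wire this up, sketched under assumptions: a hypothetical `emits` decorator wraps each endpoint and pushes an event onto a `queue.Queue` once the handler returns successfully, and a single background thread (standing in for the log workers) persists the events to a list that stands in for the log store. The endpoints `login` and `place_order` are illustrative.

```python
import functools
import queue
import threading
import time

log_queue = queue.Queue()
log_store = []   # stand-in for the log database or file

def emits(event_name):
    """Hypothetical decorator: after the endpoint succeeds, enqueue an event and return."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(user_id, *args, **kwargs):
            result = handler(user_id, *args, **kwargs)
            log_queue.put({"event": event_name, "user_id": user_id, "ts": time.time()})
            return result
        return wrapper
    return decorator

def log_writer():
    """Background worker: the slow write happens here, off the request path."""
    while True:
        event = log_queue.get()
        log_store.append(event)
        log_queue.task_done()

threading.Thread(target=log_writer, daemon=True).start()

@emits("login")
def login(user_id):
    return {"user_id": user_id, "logged_in": True}

@emits("place_order")
def place_order(user_id, items):
    return {"user_id": user_id, "items": items, "status": "created"}

print(login("alice"))
print(place_order("alice", ["aspirin"]))
log_queue.join()   # demo only: wait for the writer to catch up
print([e["event"] for e in log_store])
```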

Notification system: It is responsible for sending updates via SMS and email to users. This is an auxiliary task outside the main task flows, not critical, and reliant on external services, so it should be pushed to the queue and processed one by one.
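Processing notifications one by one maps naturally onto a single consumer. This sketch uses `asyncio.Queue`; `send_email` and `send_sms` are hypothetical stubs for the external providers, and catching exceptions per message (so one failed send does not stop the worker) is an assumption of the sketch rather than part of the original design.

```python
import asyncio

sent = []   # records what the stubbed external services received

async def send_email(to, body):
    """Hypothetical stub for an external email provider."""
    await asyncio.sleep(0.01)
    sent.append(("email", to, body))

async def send_sms(to, body):
    """Hypothetical stub for an external SMS gateway."""
    await asyncio.sleep(0.01)
    sent.append(("sms", to, body))

SENDERS = {"email": send_email, "sms": send_sms}

async def notifier(q):
    """Single consumer: process notifications one by one, in arrival order."""
    while True:
        channel, to, body = await q.get()
        try:
            await SENDERS[channel](to, body)
        except Exception as exc:   # a failed notification must not stop the worker
            print(f"failed to send {channel} to {to}: {exc}")
        finally:
            q.task_done()

async def main():
    q = asyncio.Queue()
    worker = asyncio.create_task(notifier(q))
    # The main flow only enqueues; it never waits on the external services.
    q.put_nowait(("email", "alice@example.com", "Your order has shipped"))
    q.put_nowait(("sms", "+15550100", "Your code is 1234"))
    q.put_nowait(("email", "bob@example.com", "Welcome!"))
    await q.join()                 # demo only: wait until all are sent
    worker.cancel()
    try:
        await worker
    except asyncio.CancelledError:
        pass

asyncio.run(main())
print([channel for channel, _, _ in sent])
```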

Analytics system: This serves reporting purposes. There might be only a few reports, but each one takes a long time to produce. With a queue, users receive a message saying that their report request is being processed and get an update once the workers finish the job.
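The “your report is being processed” interaction can be sketched as a job registry plus a worker pool. Assumptions: jobs are tracked in an in-process dict keyed by a UUID, a `ThreadPoolExecutor` stands in for the report workers, and `submit_report`, `build_report`, and `get_status` are hypothetical names. In production the status would live in a shared store, and the user would be notified or would poll.

```python
import threading
import time
import uuid
from concurrent.futures import ThreadPoolExecutor

jobs = {}                                      # job_id -> {"status": ..., "result": ...}
jobs_lock = threading.Lock()
executor = ThreadPoolExecutor(max_workers=2)   # stand-in for the report workers

def build_report(job_id, sales):
    """Slow aggregation; the sleep stands in for heavy queries."""
    try:
        time.sleep(0.1)
        update = {"status": "done", "result": {"total": sum(sales), "orders": len(sales)}}
    except Exception as exc:                   # record failures instead of hanging forever
        update = {"status": "failed", "result": str(exc)}
    with jobs_lock:
        jobs[job_id] = update

def submit_report(sales):
    """Called by the API: register the job, queue it, and reply immediately."""
    job_id = str(uuid.uuid4())
    with jobs_lock:
        jobs[job_id] = {"status": "processing", "result": None}
    executor.submit(build_report, job_id, sales)
    return {"job_id": job_id, "message": "Your report is being processed"}

def get_status(job_id):
    with jobs_lock:
        return dict(jobs[job_id])

reply = submit_report([120, 80, 45])
print(reply["message"])
executor.shutdown(wait=True)     # demo only: wait for the worker to finish
print(get_status(reply["job_id"]))
```

Registering the job as `processing` before handing it to the pool ensures the status entry exists before any worker can overwrite it.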

In fact

The queue and background job principle is not that hard to grasp. Implementing it in real systems, however, is somewhat challenging because of the following issues:

  • Job tracking: You need to track job status in as much detail as possible. Why? Unlike issues such as a gateway timeout or an unauthorized API call, which are easy to spot, problems caused by malfunctioning background jobs happen out of sight and sometimes take advanced skills or a bit of experience and intuition to detect.
  • Error handling: Once you can track background activities, your next task is to handle errors when they occur: decide whether the job should be retried, whether the system should be rolled back, and whether the person in charge should be notified.
  • Retryable jobs: Background jobs must be safe to retry, meaning retries should never produce duplicated or redundant data, regardless of whether errors occurred.
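The three issues above can be combined in one small sketch. Everything here is illustrative: `Job`, `run_job`, and the `points_ledger` table are hypothetical; retries use exponential backoff (an assumption, one common policy); the `notify` callback stands in for alerting the person in charge; and idempotency comes from using the job's key as a primary key with SQLite's `INSERT OR IGNORE`, so a retry after a write that succeeded but was never acknowledged does not duplicate data. Rollback is not covered.

```python
import sqlite3
import time
from dataclasses import dataclass, field

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE points_ledger (idempotency_key TEXT PRIMARY KEY, user_id TEXT, points INTEGER)")

@dataclass
class Job:
    job_id: str
    payload: dict
    status: str = "queued"                         # queued -> running -> succeeded | failed
    attempts: int = 0
    errors: list = field(default_factory=list)
    history: list = field(default_factory=list)   # (status, timestamp) for tracking

    def set_status(self, status):
        self.status = status
        self.history.append((status, time.time()))

def award_points(payload):
    """Idempotent write: a repeated key is ignored instead of inserting a second row."""
    with db:
        db.execute("INSERT OR IGNORE INTO points_ledger VALUES (?, ?, ?)",
                   (payload["key"], payload["user_id"], payload["points"]))

crash_once = {"pending": True}
def flaky_award_points(payload):
    """Simulates a worker that writes, then crashes before acknowledging."""
    award_points(payload)
    if crash_once["pending"]:
        crash_once["pending"] = False
        raise ConnectionError("lost connection before acknowledging")

def run_job(job, handler, max_attempts=3, base_delay=0.01, notify=print):
    """Run a job with status tracking, bounded retries, and exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        job.attempts = attempt
        job.set_status("running")
        try:
            handler(job.payload)
        except Exception as exc:
            job.errors.append(f"attempt {attempt}: {exc!r}")
            if attempt < max_attempts:
                job.set_status("retrying")
                time.sleep(base_delay * 2 ** (attempt - 1))   # 0.01s, 0.02s, ...
        else:
            job.set_status("succeeded")
            return job
    job.set_status("failed")
    notify(f"job {job.job_id} failed after {job.attempts} attempts; last error: {job.errors[-1]}")
    return job

alerts = []
job = Job("award-42", {"key": "award-42", "user_id": "alice", "points": 50})
run_job(job, flaky_award_points, notify=alerts.append)
print(job.status, job.attempts)                                        # succeeded 2
print(db.execute("SELECT COUNT(*) FROM points_ledger").fetchone()[0])  # 1, not 2

def always_fails(payload):
    raise TimeoutError("external service unavailable")

bad = Job("award-43", {"key": "award-43", "user_id": "bob", "points": 10})
run_job(bad, always_fails, notify=alerts.append)
print(bad.status, bad.attempts, len(alerts))                           # failed 3 1
```

The first job crashes after writing, is retried, and still leaves exactly one row; the second exhausts its attempts and triggers a single alert.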

Acknowledgement

I would like to give my big thanks to Quang Minh (a.k.a. Minh Monmen) for permission to translate his original post.
