You probably don’t need queues

Venkat
4 min read · Feb 11, 2016


Why do you think you need queues? Ah, you want all that decoupling, scalability, etc. that queues offer. Alright, let's take a closer look.

Let’s say you have a service that accepts requests for processing. Processing each request takes 128 ms, and we have load-balanced it across a pool of 4 nodes, each with a 16-core CPU. That gives us a total of 64 cores that can process requests in parallel, for a maximum throughput of 500 requests/sec at a latency of 128 ms.
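If you want to sanity-check that arithmetic, here it is as a tiny Go snippet; the numbers are simply the ones from the scenario above.

```go
package main

import "fmt"

func main() {
	// Back-of-the-envelope numbers from the scenario above.
	const (
		nodes        = 4
		coresPerNode = 16
		serviceSecs  = 0.128 // seconds of processing per request
	)
	cores := nodes * coresPerNode
	throughput := float64(cores) / serviceSecs // requests per second
	fmt.Printf("%d cores -> %.0f requests/sec at %.0f ms latency\n",
		cores, throughput, serviceSecs*1000)
}
```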

As a buffer

What happens when you flood the service with 2K requests/sec? 75% of them get rejected. So we introduce a queue in front of our service. This acts as a buffer. Whenever the incoming rate exceeds 500/sec, requests wait in the queue. At lean times, when the incoming rate is lower, the service catches up on the queue and empties it. However, all this works only as long as the average incoming rate stays below 500/sec; otherwise the queue fills up and things are worse than not having the queue at all. But that is not the point I’m trying to make. The queue is still useful as a buffer, preventing connection timeouts as long as it is not full.
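As a rough sketch of that buffering behaviour, a bounded Go channel already captures the idea: accept requests while there is room, shed load once the buffer is full. This is a toy, not a real service, and the capacity of 10,000 is an arbitrary number picked purely for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

type Request struct{ ID int }

// The buffer in front of the service: a bounded queue.
var queue = make(chan Request, 10000) // arbitrary capacity, for illustration only

// enqueue buffers a request if there is room, otherwise sheds load,
// which is what happens once the queue is full.
func enqueue(r Request) error {
	select {
	case queue <- r:
		return nil
	default:
		return errors.New("queue full")
	}
}

func main() {
	// 64 workers, one per core, draining the queue at ~500/sec in total.
	for i := 0; i < 64; i++ {
		go func() {
			for range queue {
				time.Sleep(128 * time.Millisecond) // stand-in for the real processing
			}
		}()
	}

	// A one-second burst of 2000 requests: the queue absorbs what the
	// cores can't keep up with, so nothing gets rejected.
	rejected := 0
	for i := 0; i < 2000; i++ {
		if enqueue(Request{ID: i}) != nil {
			rejected++
		}
		time.Sleep(500 * time.Microsecond)
	}
	fmt.Printf("rejected %d of 2000 requests\n", rejected)
}
```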

Pipelines are good?

Now you might notice that some parts of the request processing are more intensive or time-consuming than others, and you identify them as bottlenecks. Naturally, you want to scale up the bottlenecks more than the quicker parts, to get better throughput and latency with minimal resources. Let’s see why this is a fallacy.

Say we can split the request processing into 4 parts: p1, p2, p3 and p4. p1 takes 40 ms, p2 takes 10 ms, p3 takes 50 ms and p4 takes 28 ms. We also want to utilize the processor cores in a pipeline model, with each core working on a separate part of the request processing, instead of having each core run the entire thing.

Since the throughput of a pipeline equals the minimum of the throughputs of its parts, let’s allocate more cores to the slower parts to improve their throughput. We will use 5 cores for p1, 1 core for p2, 6 cores for p3 and 4 for p4, roughly proportional to the processing times. That gives throughputs of p1 (125/sec), p2 (100/sec), p3 (120/sec) and p4 (143/sec), which means 100/sec per node, and since we have 4 nodes, an overall throughput of 400/sec.
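Here is that arithmetic spelled out in Go: a part’s throughput is its core count times (1000 / its per-request milliseconds), and the pipeline runs only as fast as its slowest part.

```go
package main

import "fmt"

func main() {
	// Per-node core allocation and per-part processing time (ms) for p1..p4,
	// using the numbers above.
	cores := []float64{5, 1, 6, 4}
	millis := []float64{40, 10, 50, 28}

	bottleneck := -1.0
	for i := range cores {
		t := cores[i] * 1000 / millis[i] // this part's throughput, requests/sec
		fmt.Printf("p%d: %.0f/sec\n", i+1, t)
		if bottleneck < 0 || t < bottleneck {
			bottleneck = t
		}
	}
	// A pipeline is only as fast as its slowest stage.
	fmt.Printf("per node: %.0f/sec, across 4 nodes: %.0f/sec\n", bottleneck, 4*bottleneck)
}
```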

How about latency? Since we have a queue in front of each of these parts, requests are likely to wait in the p2 queue because of its throughput mismatch with the upstream parts. That wait time gets added to the actual processing time of 128 ms, pushing it to, maybe, 150 ms.

So we end up with worse throughput and worse latency, plus the additional headaches of maintaining the message broker and fine-tuning our internal queue consumer scaling.

Why did this happen?

First, if the processing parts are sequential, the best way to run them is on the same core. It doesn’t make sense to split the code just so you can scale different parts differently. The concept of a “bottleneck” applies only to those processing parts that are designed to run on different threads/nodes for other reasons.

That’s my point. You don’t need internal queues to scale up different parts of your own code. Try to use a single core for the entire request-processing code that you own. If you need more throughput, use a load-balancer with more nodes or a single queue with more consumer nodes.

You know, a load balancer is a kind of message broker with a single queue of max length 1 and multiple consumers.
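To make that concrete, here is a toy version of that picture in Go: a single queue of capacity 1 with four consumer “nodes” pulling from it. The 10 ms of work and the Request type are stand-ins for illustration, not anything from a real load balancer.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type Request struct{ ID int }

func main() {
	// The load balancer viewed as a broker: one queue of max length 1,
	// multiple consumer nodes pulling from it.
	lb := make(chan Request, 1)

	var wg sync.WaitGroup
	for node := 1; node <= 4; node++ {
		wg.Add(1)
		go func(node int) {
			defer wg.Done()
			for r := range lb {
				time.Sleep(10 * time.Millisecond) // stand-in for the full request processing
				fmt.Printf("node %d handled request %d\n", node, r.ID)
			}
		}(node)
	}

	for i := 0; i < 8; i++ {
		lb <- Request{ID: i}
	}
	close(lb)
	wg.Wait()
}
```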

But there are other benefits

I hear you. You want to use queues for their store-and-forward, retry, durability and similar features. Essentially, all of these boil down to the same concern: isolating clients from service failure. That is a well-justified concern, but it is a service-level concern that needs addressing at the client-service interface, not at every processing step internal to the service. That is why I mentioned that you need at most one queue per service.

Queues are a band-aid solution to the unavoidable problem of service-client separation. You don’t want to injure yourself just for the want of a band-aid.
