Lambda Concurrency Limits and SQS Triggers Don’t Mix Well (Sometimes)

Zac Charles

Simple Queue Service (SQS) was the first service AWS launched back in 2004. Ten years later, Lambda was released at re:Invent in 2014. Last year, the two came together when AWS announced that SQS could be used to trigger Lambda functions. Awesome news! But it’s not all sunshine and roses…

Update 8 June, 2019 — I wrote a follow-up post:

Photo by Aaron Kato via Unsplash

As the title says, sometimes SQS triggers don’t play well when you set a function concurrency limit. Specifically, if you set the concurrency limit too low, Lambda can take more messages off a busy queue than it is allowed to process, and the invocations for the excess messages are throttled. The throttled messages go back to the queue after the visibility timeout, and if this keeps happening they can end up on a dead-letter queue.

We’ve run into this a couple of times at work when we’ve had to make a pragmatic decision to heavily limit a function’s concurrency. Let’s look at why this happens and what can be done about it.

What’s going on?

When an SQS trigger is initially enabled, Lambda begins long-polling the queue with five parallel connections. As the rate of messages sent to the queue changes, Lambda automatically scales its polling to balance cost and throughput. Polling is scaled up until the number of concurrent function executions reaches 1000, the account concurrency limit, or the (optional) function concurrency limit, whichever is lower.
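To make the "whichever is lower" rule concrete, here is a minimal sketch in Python. The function name and the constant are mine, not anything from an AWS SDK, and the 1000 figure is simply the one quoted above.

```python
from typing import Optional

# Figure quoted in the paragraph above; not read from any AWS API.
SQS_TRIGGER_SCALING_CAP = 1000


def polling_scale_ceiling(account_limit: int,
                          function_limit: Optional[int] = None) -> int:
    """Return the concurrency at which Lambda stops scaling up its polling.

    Hypothetical helper: it is simply the lowest of the three limits.
    """
    limits = [SQS_TRIGGER_SCALING_CAP, account_limit]
    if function_limit is not None:
        limits.append(function_limit)
    return min(limits)


# A function limited to 3 concurrent executions has a ceiling of 3.
print(polling_scale_ceiling(account_limit=1000, function_limit=3))
```

Notice that a ceiling of 3 is already lower than the five connections Lambda starts with, which is where the trouble begins.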

It’s important to understand that the number of polling threads is not directly connected to the concurrency limit you set, which is why the above problem can occur. Since the problem is most likely to occur with a very low concurrency limit (1–30), let’s walk through a scenario like that.


Imagine you have a function with its concurrency limit set to 3. You create a new queue and set it up as a trigger for your function with a batch size of 10 (the maximum number of messages Lambda should take off the queue and give to a single function execution). There are now 5 parallel connections long-polling the queue.

When a large flood of messages is sent to the queue, you can expect a batch of 10 messages to be picked up by each of the 5 polling connections (50 messages in total). Lambda will then try to invoke your function for each batch, but will only succeed for 3 out of 5. The other two will throttle and the messages will be put back on the queue with their receive attempts incremented (after some retries during the visibility timeout).
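The arithmetic of that first round can be sketched in a few lines of Python. This is a toy model, not how Lambda is implemented: it assumes every poller comes back with a full batch and that exactly `concurrency_limit` invocations succeed. All names are made up for illustration.

```python
def simulate_first_round(pollers: int = 5, batch_size: int = 10,
                         concurrency_limit: int = 3) -> dict:
    """Count messages processed and throttled in one polling round.

    Assumes each poller receives a full batch (a busy queue).
    """
    batches = pollers
    invoked = min(batches, concurrency_limit)  # batches that get an execution
    throttled = batches - invoked              # batches that are throttled
    return {
        "received": batches * batch_size,
        "processed": invoked * batch_size,
        "throttled": throttled * batch_size,
    }


print(simulate_first_round())
# {'received': 50, 'processed': 30, 'throttled': 20}
```

So with the scenario's numbers, 20 of the first 50 messages go back to the queue with a receive attempt already used up.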

This process keeps repeating. The next time one of the previously throttled messages is picked up, it might be lucky and get processed. On the other hand, it could get throttled again. If you have a redrive policy that sends messages to a dead-letter queue, it’s possible some will end up there.
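Here is a rough simulation of that loop, again in Python and again a toy model. It assumes throttled batches return to the back of the queue immediately (ignoring the visibility timeout and Lambda's own retries), that the batches which succeed are picked at random, and that a message moves to the dead-letter queue when it would be received more than `max_receive_count` times. Every name in it is hypothetical.

```python
import random
from collections import deque


def simulate_throttling(total_messages: int, pollers: int = 5,
                        batch_size: int = 10, concurrency_limit: int = 3,
                        max_receive_count: int = 3, seed: int = 0) -> dict:
    """Run polling rounds until the queue is empty and count the outcomes."""
    rng = random.Random(seed)
    queue = deque({"receives": 0} for _ in range(total_messages))
    processed = dead_lettered = 0

    while queue:
        batches = []
        for _ in range(pollers):
            batch = []
            while queue and len(batch) < batch_size:
                message = queue.popleft()
                if message["receives"] >= max_receive_count:
                    # Redrive policy: out of receive attempts, goes to the DLQ.
                    dead_lettered += 1
                    continue
                message["receives"] += 1
                batch.append(message)
            if batch:
                batches.append(batch)

        # Only `concurrency_limit` batches get an execution; which ones is luck.
        rng.shuffle(batches)
        for index, batch in enumerate(batches):
            if index < concurrency_limit:
                processed += len(batch)
            else:
                queue.extend(batch)  # throttled: back on the queue

    return {"processed": processed, "dead_lettered": dead_lettered}


print(simulate_throttling(total_messages=500))
```

Playing with `concurrency_limit` and `max_receive_count` shows the pattern: no message in this model ever fails inside the function, yet a low limit combined with a low receive count can still push some of them to the dead-letter queue.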

Since Lambda polls the queue for messages before it knows whether it will successfully invoke your function to process them, this can technically happen at any scale, with any batch size, and with any concurrency limit. It’s just much more significant at the low end of concurrency (around 1 to 30).

What to do about it…

Unfortunately, there’s no silver bullet right now. However, the problem can be mitigated through the following actions recommended by AWS:

  • Set the queue’s visibility timeout to at least 6 times the timeout that you configure on your function.
    The extra time allows for Lambda to retry if your function execution is throttled while your function is processing a previous batch.
  • Set the maxReceiveCount on the queue’s redrive policy to at least 5.
    This will help avoid sending messages to the dead-letter queue due to throttling.
  • Configure the dead-letter queue to retain failed messages long enough that you can move them back later to be reprocessed.

Some months ago we were told 3 times the timeout and at least 3 max receives. The advice changing like that is evidence this is just mitigation and not a fix. It has worked for us, though.
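As a sketch of what those recommendations look like as queue settings, the Python below builds the attribute dictionaries you could hand to SQS. `VisibilityTimeout`, `RedrivePolicy` and `MessageRetentionPeriod` are real SQS queue attributes; the helper names, the ARN and the 14-day retention are my own assumptions for illustration.

```python
import json


def source_queue_attributes(function_timeout_seconds: int, dlq_arn: str,
                            max_receive_count: int = 5) -> dict:
    """Attributes for the source queue, following the advice above."""
    return {
        # At least 6 times the function timeout.
        "VisibilityTimeout": str(6 * function_timeout_seconds),
        # At least 5 receives before a message is dead-lettered.
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": dlq_arn,
            "maxReceiveCount": max_receive_count,
        }),
    }


def dead_letter_queue_attributes(retention_days: int = 14) -> dict:
    """Attributes for the DLQ; 14 days is an assumed, generous retention."""
    return {"MessageRetentionPeriod": str(retention_days * 24 * 60 * 60)}


# Hypothetical ARN and a 30 second function timeout.
attrs = source_queue_attributes(30, "arn:aws:sqs:eu-west-1:123456789012:my-dlq")
print(attrs["VisibilityTimeout"])  # 180
# With boto3 these could be applied via
# sqs.set_queue_attributes(QueueUrl=..., Attributes=attrs)
```

If you manage queues with CloudFormation or Terraform instead, the same three values map onto the equivalent properties there.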

Anything else?

Assuming your goal is rate limiting, there are a couple of other serverless options. Mostly they just involve using (perhaps misusing) AWS services.

For example, if you put your messages into a Kinesis Data Stream and configure the stream as the Lambda trigger, you could use the number of shards and the batch size to control concurrency and the message processing rate.
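A back-of-the-envelope way to size that, sketched in Python. It assumes Lambda runs one batch at a time per shard (the default behaviour as I understand it) and that you know roughly how long your function takes per batch. The function name and the example numbers are made up.

```python
def kinesis_processing_ceiling(shards: int, batch_size: int,
                               seconds_per_batch: float) -> tuple:
    """Return (max concurrent executions, max records per second).

    Assumes one concurrent invocation per shard and full batches.
    """
    max_concurrency = shards
    max_records_per_second = shards * batch_size / seconds_per_batch
    return max_concurrency, max_records_per_second


# 3 shards, batches of 10, each batch taking about 2 seconds to process.
print(kinesis_processing_ceiling(shards=3, batch_size=10, seconds_per_batch=2.0))
# (3, 15.0)
```

In other words, the shard count plays the role that the function concurrency limit was supposed to play with SQS.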

Another service that offers rate limiting is API Gateway. You could have your messages come in through a Lambda-backed API and turn on rate limiting. SNS could send messages to the API, but security would be a pain.

You could pretend the SQS trigger doesn’t exist and use a recursive function to process SQS messages. It’s got far more moving parts and costs more, but at least you can control concurrency properly.
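For a feel of what that could look like, here is one possible shape in Python. It is not a recipe we run in production. The SQS client, the "time remaining" function and the re-invoke function are passed in so the logic stays testable; in a real function they might be a boto3 SQS client, `context.get_remaining_time_in_millis`, and a call to `lambda_client.invoke(..., InvocationType="Event")`. The function name and the safety margin are assumptions.

```python
def drain_queue(sqs, queue_url, process, remaining_ms, reinvoke,
                safety_margin_ms=10_000):
    """Poll the queue directly until it is empty or time is nearly up.

    Returns "queue_empty" or "reinvoked" so the caller can log what happened.
    """
    while remaining_ms() > safety_margin_ms:
        response = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=10,
            WaitTimeSeconds=1,
        )
        messages = response.get("Messages", [])
        if not messages:
            return "queue_empty"
        for message in messages:
            process(message["Body"])
            # Delete only after successful processing.
            sqs.delete_message(
                QueueUrl=queue_url,
                ReceiptHandle=message["ReceiptHandle"],
            )
    # Nearly out of time: hand over to a fresh invocation of the same function.
    reinvoke()
    return "reinvoked"
```

Concurrency is then simply the number of these chains you start, which is the appeal. The moving parts are everything around it: something has to start the chains, restart them when the queue fills up again, and notice when one dies.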

Lastly, you could try to remove the reason you need such a low function concurrency in the first place. Can you scale that weak downstream API or service? Can you provision more capacity on your DynamoDB table, or switch it to on-demand?

This is a challenging issue that is probably best solved by AWS. Other than avoiding the requirement entirely, I don’t really recommend the above solutions (except perhaps Kinesis).

Do you know of any other serverless solutions? Let me know!


For more like this, please follow me on Medium and Twitter.

Written by Zac Charles, Senior Engineer at Just Eat
