AWS SQS: Detect slow processing

Oleksandr Hanhaliuk
Jan 30, 2024

Among the many services AWS offers are Amazon Simple Queue Service (SQS) and AWS Lambda, which can work in tandem to process asynchronous workloads efficiently. However, developers commonly run into slow processing when they use these services together. In this article, we'll explore how to detect slow processing in a Lambda and SQS setup by monitoring the ApproximateAgeOfOldestMessage metric, and discuss practical solutions for the potential causes.

Detecting Slow Processing Through Metrics

Amazon SQS publishes a crucial CloudWatch metric called ApproximateAgeOfOldestMessage, which reports the approximate age, in seconds, of the oldest message in your queue that has not yet been deleted. This metric was introduced in 2016 and has since become one of the most important metrics for SQS monitoring.

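This metric is an excellent indicator of whether your consumers are keeping up with incoming messages. A steadily rising value means messages are waiting longer and longer before they are processed; when you notice such an upward trend, it's time to check your SQS processing speed.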
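Rather than watching the graph by hand, you can let CloudWatch alert you when the metric stays high. Below is a minimal Python sketch, assuming boto3 is available for the final call; the queue name, the threshold, and the alarm naming scheme are hypothetical placeholders to tune for your workload. It builds the arguments for CloudWatch's PutMetricAlarm on ApproximateAgeOfOldestMessage using the Maximum statistic:

```python
"""Sketch: alarm when SQS messages wait too long.

Assumptions: queue name, threshold, and alarm name are hypothetical
placeholders; notification actions (e.g. SNS) are omitted.
"""


def build_age_alarm(queue_name: str, threshold_seconds: int,
                    periods: int = 5, period_seconds: int = 60) -> dict:
    """Return keyword arguments for CloudWatch PutMetricAlarm.

    The alarm fires when the oldest message has been older than
    `threshold_seconds` for `periods` consecutive periods.
    """
    return {
        "AlarmName": f"{queue_name}-oldest-message-age",  # hypothetical naming scheme
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateAgeOfOldestMessage",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Maximum",
        "Period": period_seconds,
        "EvaluationPeriods": periods,
        "Threshold": float(threshold_seconds),
        "ComparisonOperator": "GreaterThanThreshold",
        # An idle queue may stop publishing datapoints; don't treat that as breaching.
        "TreatMissingData": "notBreaching",
    }


def create_age_alarm(queue_name: str, threshold_seconds: int) -> None:
    """Create the alarm in AWS (requires boto3 and credentials)."""
    import boto3  # lazy import so build_age_alarm works without AWS access

    boto3.client("cloudwatch").put_metric_alarm(
        **build_age_alarm(queue_name, threshold_seconds)
    )
```

For example, create_age_alarm("orders-queue", 300) would alarm once the oldest message has been older than five minutes for five consecutive one-minute periods. A sustained-breach alarm like this is only a simple stand-in for "an upward trend": it will not catch slow growth that stays under the threshold.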

Common Causes of Slow SQS Processing

Several factors can contribute to slow processing of SQS messages. Here are some of the typical suspects:

  1. Lambda Concurrency and Throttling: AWS Lambda has concurrency limits; once they are reached, further invocations are throttled. Throttled invocations cannot process messages promptly, so messages stay in the queue and the age of the oldest message increases.
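  2. FIFO SQS and Lambda Processing Time: For FIFO queues, processing time can become a bottleneck. FIFO queues preserve message order within each message group, so Lambda processes the messages of a given group one batch at a time. With only a few message groups, concurrency stays low and slow processing translates directly into delays.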
  3. Increasing Traffic and Lambda Scaling Time: Some traffic spikes are absorbed smoothly, but others increase the message age, especially when Lambda's scaling can't keep up with the surge. Lambda adds concurrent SQS consumers gradually rather than all at once, so a backlog can build up while it scales.
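  4. Errored Messages and DLQ Strategy: A message that fails processing is not deleted. It becomes visible again only after its visibility timeout expires and is then retried, and it stays in the queue until it succeeds or, if a redrive policy is configured, is moved to the dead-letter queue (DLQ) after reaching maxReceiveCount. Repeated retries, each separated by the visibility timeout, can therefore keep the oldest message age high.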
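All four causes come down to the same arithmetic: if consumers process messages more slowly than they arrive, the backlog grows and so does the age of the oldest message. The following Python sketch makes that concrete under simplifying assumptions (steady arrival rate, fixed average time per batch, messages consumed roughly in arrival order); the numbers in the example are illustrative, not measurements.

```python
"""Back-of-the-envelope model of why the oldest message keeps aging.

Assumes a steady arrival rate and a fixed average processing time per
batch; real traffic and Lambda scaling are burstier than this.
"""
from dataclasses import dataclass


@dataclass
class Consumer:
    concurrency: int          # concurrent Lambda executions actually available
    batch_size: int           # messages handled per invocation
    seconds_per_batch: float  # average invocation duration

    @property
    def throughput(self) -> float:
        """Messages processed per second at full concurrency."""
        return self.concurrency * self.batch_size / self.seconds_per_batch


def backlog_growth(arrivals_per_second: float, consumer: Consumer) -> float:
    """Messages added to the backlog per second (0 if consumers keep up)."""
    return max(0.0, arrivals_per_second - consumer.throughput)


def extra_wait_after(seconds: float, arrivals_per_second: float,
                     consumer: Consumer) -> float:
    """Rough extra queueing delay after `seconds` of sustained overload.

    Assumes messages are consumed roughly in arrival order, so a new message
    waits about backlog / throughput seconds.
    """
    backlog = backlog_growth(arrivals_per_second, consumer) * seconds
    return backlog / consumer.throughput
```

With 10 concurrent executions, batches of 10, and 2 seconds per batch, consumers handle 50 messages per second. If 80 messages per second arrive, the backlog grows by 30 per second, and after 10 minutes of that surge a newly arrived message waits roughly 6 extra minutes. Throttling (cause 1) lowers the usable concurrency, a FIFO queue with few message groups (cause 2) caps it at the number of active groups, slow scaling (cause 3) means the full concurrency is not available yet, and retries (cause 4) effectively add to the arrival rate.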

Solutions for Each Challenge

Now, let’s look at the associated solutions:

(I) To address Lambda concurrency and throttling issues, you can:

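  1. Increase the concurrency available to the function, for example by raising its reserved concurrency, raising the maximum concurrency of the SQS event source mapping, or requesting a higher account-level concurrency quota.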
  2. Optimize the batch size for messages processed by the Lambda function.
  3. Refine your message handling code to be more performance-efficient.
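The first two remedies map to Lambda API settings. This Python sketch builds the arguments for PutFunctionConcurrency and UpdateEventSourceMapping (BatchSize, MaximumBatchingWindowInSeconds, and ScalingConfig.MaximumConcurrency) and applies them with boto3. The function name, mapping UUID, and numbers are hypothetical; the check that keeps MaximumConcurrency at or below reserved concurrency is this sketch's own guard against throttling, not an API requirement.

```python
"""Sketch: tune concurrency and batching for an SQS-triggered Lambda.

The function name, mapping UUID, and all numbers are hypothetical placeholders.
"""


def scaling_params(function_name: str, mapping_uuid: str, *, reserved: int,
                   max_concurrency: int, batch_size: int,
                   batching_window_s: int = 0) -> tuple:
    """Return kwargs for PutFunctionConcurrency and UpdateEventSourceMapping."""
    if not 2 <= max_concurrency <= 1000:
        raise ValueError("SQS MaximumConcurrency must be between 2 and 1000")
    if max_concurrency > reserved:
        # This sketch's own guard: the poller would otherwise invoke more
        # copies than the function may run, and the excess gets throttled.
        raise ValueError("keep MaximumConcurrency <= reserved concurrency")
    if batch_size > 10 and batching_window_s < 1:
        raise ValueError("SQS batch sizes above 10 need a batching window >= 1s")
    concurrency = {
        "FunctionName": function_name,
        "ReservedConcurrentExecutions": reserved,
    }
    mapping = {
        "UUID": mapping_uuid,
        "BatchSize": batch_size,
        "MaximumBatchingWindowInSeconds": batching_window_s,
        "ScalingConfig": {"MaximumConcurrency": max_concurrency},
    }
    return concurrency, mapping


def apply_scaling(function_name: str, mapping_uuid: str, **settings) -> None:
    """Apply the settings with boto3 (needs AWS credentials)."""
    import boto3  # lazy import so scaling_params can be used offline

    concurrency, mapping = scaling_params(function_name, mapping_uuid, **settings)
    client = boto3.client("lambda")
    client.put_function_concurrency(**concurrency)
    client.update_event_source_mapping(**mapping)
```

Reserved concurrency both guarantees and caps the function's concurrency, so it is the lever to pull when throttling limits throughput. A larger batch with a short batching window reduces the number of invocations needed per message, at the cost of a little added latency while a batch fills.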

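(II) For FIFO SQS limitations:
- Spread your messages across multiple message group IDs (the MessageGroupId parameter) so Lambda can process more groups in parallel. Keep in mind that ordering is then guaranteed only within each group.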
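Here is a Python sketch of that advice: messages for the same entity (a hypothetical customer ID) share a group, so their order is preserved, while different customers can be processed in parallel. Using the entity ID directly as MessageGroupId also works; hashing it into a fixed number of groups, as below, is one optional variant that bounds the number of groups. The queue URL and deduplication ID are placeholders, and MessageDeduplicationId can be omitted if content-based deduplication is enabled on the queue.

```python
"""Sketch: spread FIFO messages over several message groups.

Assumption: ordering only matters per customer (a hypothetical entity), so
one customer's messages share a group while different customers can be
processed in parallel.
"""
import hashlib
import json


def message_group_id(entity_id: str, shards: int = 32) -> str:
    """Map an entity to one of `shards` stable group IDs.

    hashlib (not the built-in hash()) keeps the mapping stable across processes.
    """
    digest = hashlib.sha256(entity_id.encode("utf-8")).hexdigest()
    return f"group-{int(digest, 16) % shards}"


def fifo_send_kwargs(queue_url: str, customer_id: str, payload: dict,
                     dedup_id: str) -> dict:
    """Build SendMessage kwargs for a FIFO queue."""
    return {
        "QueueUrl": queue_url,
        "MessageBody": json.dumps(payload),
        "MessageGroupId": message_group_id(customer_id),
        "MessageDeduplicationId": dedup_id,
    }
```

You would send with boto3.client("sqs").send_message(**fifo_send_kwargs(...)). The trade-off is explicit: with 32 groups, Lambda can have at most 32 batches from this queue in flight at once, one per group.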

(III) To manage increasing traffic and slow Lambda scaling:

First, find out why ApproximateAgeOfOldestMessage is increasing:

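  1. Utilise CloudWatch Logs Insights to compare the time each message was sent with the time it was processed. This query assumes your handler logs each incoming SQS event as JSON under an "event" key; @ingestionTime is when CloudWatch Logs received that log line, so the difference approximates how long the message waited in the queue: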
```
fields @ingestionTime, @message, @logStream, event.Records[0].attributes.SentTimestamp,
@ingestionTime - event.Records[0].attributes.SentTimestamp as timestampDifference
| sort timestampDifference desc
```

  2. Find and optimize the longest-running Lambda invocations with another CloudWatch Logs Insights query. It matches the REPORT log lines in both Lambda's plain-text and JSON log formats:

```
filter type = "platform.report" or @type = "REPORT"
| fields @timestamp as Timestamp, coalesce(@requestId, record.requestId) as RequestId, @logStream as LogStream, coalesce(@billedDuration, record.metrics.billedDurationMs) as BilledDurationInMS
| sort BilledDurationInMS desc
| limit 9
```
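The first query depends on the whole event being logged. A lighter alternative, sketched below in Python under the assumption of an SQS-triggered function, is to compute the queue lag inside the handler from the SentTimestamp attribute and log it as a structured field. The log field names (queueLagMs, receiveCount) are hypothetical choices; an ApproximateReceiveCount above 1 indicates the message is being retried.

```python
"""Sketch: log how long each SQS message waited before processing.

Assumes an SQS-triggered Lambda; the log field names are hypothetical.
"""
import json
import time


def queue_lag_ms(record: dict, now_ms: int) -> int:
    """Milliseconds between SQS accepting the message and `now_ms`."""
    # SentTimestamp is epoch time in milliseconds, delivered as a string.
    return now_ms - int(record["attributes"]["SentTimestamp"])


def handler(event: dict, context) -> None:
    now_ms = int(time.time() * 1000)
    for record in event["Records"]:
        # One JSON line per message so Logs Insights can discover the fields.
        print(json.dumps({
            "messageId": record["messageId"],
            "queueLagMs": queue_lag_ms(record, now_ms),
            # Values above 1 mean the message is being retried.
            "receiveCount": int(record["attributes"]["ApproximateReceiveCount"]),
        }))
        # ... process the message here (hypothetical business logic) ...
```

Logs Insights discovers fields in JSON log lines, so a query such as filter ispresent(queueLagMs) | stats max(queueLagMs), avg(queueLagMs) by bin(5m) would chart the lag over time. Because the timestamp is taken when processing starts, this isolates waiting time in the queue from the invocation durations that the second query reports.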

(IV) Handling errored messages

Retrying failed messages is expected behaviour, so think twice before changing it. If you do, you can implement a custom retry mechanism and custom error handling; note that AWS will then no longer handle retries and DLQ fallbacks for you.
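Before replacing AWS's retry and DLQ handling altogether, a related built-in option is worth knowing: partial batch responses. With ReportBatchItemFailures enabled on the event source mapping (FunctionResponseTypes=["ReportBatchItemFailures"]), the handler reports only the failed message IDs, so the successful messages in a batch are deleted instead of being retried along with the failed one. The Python sketch below is a minimal example of that AWS feature, not the custom mechanism described above; process_message is a hypothetical stand-in for your business logic.

```python
"""Sketch: SQS partial batch response for a Lambda consumer.

Requires FunctionResponseTypes=["ReportBatchItemFailures"] on the event
source mapping. process_message is a hypothetical placeholder.
"""
import json


def process_message(body: dict) -> None:
    """Hypothetical business logic: raises to signal a failure."""
    if body.get("fail"):
        raise ValueError("cannot process message")


def handler(event: dict, context) -> dict:
    failures = []
    for record in event["Records"]:
        try:
            process_message(json.loads(record["body"]))
        except Exception:
            # Report just this message; Lambda deletes the successful ones,
            # and this one reappears after its visibility timeout.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Failed messages still follow the queue's visibility timeout and redrive policy. For FIFO queues, AWS advises stopping at the first failure and reporting every remaining message in the batch as failed too, so that ordering is preserved; this sketch does not do that.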

Conclusion

While AWS SQS and Lambda offer immense scalability and reliability, maintaining efficiency in message processing requires careful monitoring and fine-tuning. By understanding the reasons behind an increase in the ApproximateAgeOfOldestMessage metric, developers can take targeted actions to ensure their systems remain responsive and robust against various loads and errors.

Take these insights to your AWS environment, analyse the trends in ApproximateAgeOfOldestMessage, and apply the suggested solutions. By doing so, you'll not only improve your SQS and Lambda performance but also enhance the overall user experience of your application. Happy coding!
