Dead-Letter Queues at Cortilia with SQS, Lambda and SAM Templates

Published in Cortilia Team Blog

May 28, 2021

1. Overview

At Cortilia we use a hybrid architecture composed of serverless and microservice modules.

The communication pattern plays an important role in this hybrid architecture.
To achieve reliable, secure and scalable communication, we use topics and queues through the AWS SNS and AWS SQS services.

2. Pub/Sub Messaging

Publish/subscribe (pub/sub) messaging is a form of asynchronous service-to-service communication used in serverless and microservices architectures. In a pub/sub model, any message published to a topic is immediately received by all of the subscribers to the topic. Pub/sub messaging can be used to enable event-driven architectures or to decouple applications in order to increase performance, reliability and scalability.

3. How does pub/sub work?

In modern cloud architectures, applications are decoupled into smaller, independent building blocks that are easier to develop, deploy and maintain. Pub/sub messaging provides instant event notifications for these distributed applications.

The publish/subscribe model allows messages to be transmitted asynchronously to different parts of a system. A sibling of the message queue, a message topic provides a lightweight mechanism to broadcast asynchronous event notifications, and endpoints that allow software components to connect to the topic in order to send and receive those messages. To broadcast a message, a component called a publisher simply pushes a message to the topic. Unlike message queues, which batch messages until they are retrieved, message topics transfer messages with little or no queuing and push them out immediately to all subscribers. All components that subscribe to the topic receive every message that is broadcast, unless the subscriber has set up a message filtering policy.

Subscribers to the message topic often perform different functions, and each can do something different with the message in parallel. The publisher does not need to know who is using the information it is broadcasting, and subscribers do not need to know where the message comes from.
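To make the fan-out and filtering behaviour concrete, here is a minimal in-memory sketch in Python. It is not how SNS is implemented: delivery here is synchronous and in-process, and the filter policy only supports exact string matches (every attribute named in the policy must match one of its accepted values), a small subset of what SNS subscription filter policies offer. The `Topic`, `Subscription` and `Message` names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Message:
    body: str
    # String attributes that a subscriber's filter policy can match on.
    attributes: dict[str, str] = field(default_factory=dict)


Handler = Callable[[Message], None]


@dataclass
class Subscription:
    handler: Handler
    # Attribute name -> accepted values; None means "deliver everything".
    filter_policy: Optional[dict[str, list[str]]] = None

    def accepts(self, message: Message) -> bool:
        if self.filter_policy is None:
            return True
        # Every attribute in the policy must match one of its accepted values.
        return all(
            message.attributes.get(name) in accepted
            for name, accepted in self.filter_policy.items()
        )


class Topic:
    """Fan-out: each published message goes to every subscriber whose filter accepts it."""

    def __init__(self) -> None:
        self._subscriptions: list[Subscription] = []

    def subscribe(self, handler: Handler,
                  filter_policy: Optional[dict[str, list[str]]] = None) -> None:
        self._subscriptions.append(Subscription(handler, filter_policy))

    def publish(self, message: Message) -> int:
        """Deliver the message and return how many subscribers received it."""
        # The publisher never learns who the subscribers are.
        delivered = 0
        for subscription in self._subscriptions:
            if subscription.accepts(message):
                subscription.handler(message)
                delivered += 1
        return delivered


# Usage: "billing" receives everything, "shipping" only paid orders.
topic = Topic()
billing: list[Message] = []
shipping: list[Message] = []
topic.subscribe(billing.append)
topic.subscribe(shipping.append, {"event": ["order_paid"]})
topic.publish(Message("order 42 created", {"event": "order_created"}))  # delivered to 1
topic.publish(Message("order 42 paid", {"event": "order_paid"}))        # delivered to 2
```

Neither subscriber knows about the other, and the publisher only sees a delivery count, which mirrors the decoupling described above.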

4. Consumer application errors

Sometimes a message that has been sent is not processed successfully. The causes vary:

a) The consumer application fails to parse the message

b) The consumer application fails for network reasons

c) The consumer application fails because there has been an unexpected change of state that causes a problem with the application code

d) and so on.

5. Dead-letter queue

Messages that still cannot be processed after a configured number of receive attempts (the source queue's maxReceiveCount) are moved to a special queue called a dead-letter queue.

Dead-letter queues are useful for debugging your application or messaging system because they let you isolate problematic messages to determine why their processing doesn’t succeed.
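In SQS, the move to the dead-letter queue is driven by the source queue's redrive policy: each time a consumer receives a message without deleting it, the message's receive count grows, and once it exceeds `maxReceiveCount` SQS moves it to the dead-letter queue. The following Python simulation sketches that behaviour in memory. It ignores visibility timeouts and real ordering, and `drain_with_redrive` is a hypothetical helper, not an AWS API.

```python
from collections import deque
from typing import Callable


def drain_with_redrive(
    messages: list[str],
    process: Callable[[str], bool],
    max_receive_count: int,
) -> tuple[list[str], list[str]]:
    """Simulate an SQS queue with a redrive policy.

    process(body) returns True on success. A failed message stays on the queue
    and is received again; after max_receive_count unsuccessful receives it is
    moved to the dead-letter queue. Returns (processed, dead_lettered).
    """
    queue = deque((body, 0) for body in messages)  # (body, receive count)
    processed: list[str] = []
    dead_letter: list[str] = []
    while queue:
        body, receive_count = queue.popleft()
        receive_count += 1
        if process(body):
            processed.append(body)                 # consumer deletes the message
        elif receive_count >= max_receive_count:
            dead_letter.append(body)               # redrive to the dead-letter queue
        else:
            queue.append((body, receive_count))    # visible again later
    return processed, dead_letter
```

A message that fails every time ends up in the dead-letter list after exactly `max_receive_count` attempts, while a message that only fails transiently is processed on a later receive.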

6. Can we try to process a dead-letter message?

For some types of failure, such as a message that cannot be parsed, reprocessing would only produce the same error again.

In other cases, however, such as network errors, there is a good chance that reprocessing the message at a later time will let the consumer process it correctly.
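One way to act on this distinction, sketched here in Python and not necessarily what Cortilia's reprocessing does, is to classify failures before deciding whether a replay is worthwhile. The exception classes and `parse_order` are hypothetical; only `json.loads`, `json.JSONDecodeError` and the built-in `ConnectionError`/`TimeoutError` come from the standard library.

```python
import json


class PermanentError(Exception):
    """Hypothetical: a failure that will repeat on every attempt (e.g. an unparsable body)."""


class TransientError(Exception):
    """Hypothetical: a failure that may go away later (e.g. a downstream timeout)."""


def parse_order(body: str) -> dict:
    """Hypothetical consumer step: parse the message body as JSON."""
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        # Replaying the same bytes would fail in exactly the same way.
        raise PermanentError(f"cannot parse message: {exc}") from exc


def should_replay(error: Exception) -> bool:
    """Return True only for failures that a later attempt might fix."""
    if isinstance(error, PermanentError):
        return False
    # Network-level problems are worth another try; anything unknown is not.
    return isinstance(error, (TransientError, ConnectionError, TimeoutError))
```

Treating unknown errors as non-replayable is a conservative choice; a team could just as well default to replaying and rely on the replay limit to stop the loop.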

7. How can we reprocess dead-letter messages fully automatically?

The solution we adopted follows standard practice and keeps things simple: move messages from the dead-letter queue back to the main processing queue.

a) The producer sends a message to the main SQS queue.

b) The consumer application fails to process the message.

c) The message is moved from the main SQS queue to the dead-letter queue.

d) A Lambda function receives the message.

e) If the replay limit has not been reached, the Lambda moves the message back to the main queue.

f) If the replay limit has been reached, the Lambda moves the message to a second dead-letter queue.
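The decision the Lambda makes in the last two steps can be sketched as a small pure function. In this Python sketch, `replay_number` counts how many times the message has already been sent back, `max_replays` is the predetermined limit, and `run_flow` simulates the whole loop with a fake consumer; all of these names are illustrative assumptions.

```python
from enum import Enum
from typing import Callable


class Destination(Enum):
    MAIN_QUEUE = "main"
    SECOND_DEAD_LETTER = "second-dead-letter"


def route(replay_number: int, max_replays: int) -> Destination:
    """Where a message read from the dead-letter queue should go next."""
    if replay_number < max_replays:
        return Destination.MAIN_QUEUE          # limit not reached: replay it
    return Destination.SECOND_DEAD_LETTER      # limit reached: park it


def run_flow(consumer: Callable[[int], bool], max_replays: int) -> tuple[str, int]:
    """Simulate one message going round the main queue / dead-letter loop.

    consumer(replay_number) returns True when the main-queue consumer succeeds.
    Returns the final outcome and how many replays it took.
    """
    replay_number = 0
    while True:
        if consumer(replay_number):
            return "processed", replay_number
        # The consumer failed, so the message lands in the dead-letter queue
        # and the Lambda decides where it goes next.
        if route(replay_number, max_replays) is Destination.SECOND_DEAD_LETTER:
            return Destination.SECOND_DEAD_LETTER.value, replay_number
        replay_number += 1
```

With `max_replays=3`, a message that always fails gets its original attempt plus three replays before being parked in the second dead-letter queue.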

8. When to reprocess the messages and how many times?

Messages are moved back to the main queue when they arrive in the dead-letter queue, and they are reprocessed up to a predetermined maximum number of times.

For each message in the dead-letter queue, the Lambda sets a delay, which postpones processing in the main queue, and a replay-number attribute, which is incremented on every replay until the predetermined maximum is reached.
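The attribute bookkeeping can be sketched as follows. This Python sketch assumes the attribute is named `replay-number` with the SQS `Number` data type; it reads the attribute from a record as delivered in a Lambda SQS event (keys such as `stringValue` and `dataType`) and builds keyword arguments for boto3's `send_message`, which expects `StringValue` and `DataType` instead. The delay is clamped to 900 seconds, the largest `DelaySeconds` SQS accepts. The helper names are hypothetical, and binary attributes are not carried over.

```python
MAX_DELAY_SECONDS = 900  # largest DelaySeconds value SQS accepts (15 minutes)


def current_replay_number(record: dict) -> int:
    """Read replay-number from an SQS record as delivered in a Lambda event (0 if absent)."""
    attribute = record.get("messageAttributes", {}).get("replay-number")
    return int(attribute["stringValue"]) if attribute else 0


def next_replay_params(record: dict, delay_seconds: int) -> dict:
    """Build send_message keyword arguments (all except QueueUrl) for the next replay."""
    # Lambda events use stringValue/dataType; send_message expects StringValue/DataType.
    # Only String and Number attributes are copied in this sketch.
    attributes = {
        name: {"DataType": value["dataType"], "StringValue": value["stringValue"]}
        for name, value in record.get("messageAttributes", {}).items()
        if value.get("stringValue") is not None
    }
    # Overwrite the counter with the incremented value.
    attributes["replay-number"] = {
        "DataType": "Number",
        "StringValue": str(current_replay_number(record) + 1),
    }
    return {
        "MessageBody": record["body"],
        "DelaySeconds": max(0, min(delay_seconds, MAX_DELAY_SECONDS)),
        "MessageAttributes": attributes,
    }
```

With a boto3 SQS client, the result would be passed as `sqs.send_message(QueueUrl=main_queue_url, **params)`, where `main_queue_url` is the main queue's URL.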

9. Who processes the messages?

We use an AWS Lambda function to process messages in an Amazon Simple Queue Service (Amazon SQS) queue.

Lambda polls the queue and invokes our Lambda function synchronously with an event that contains queue messages. Lambda reads messages in batches and invokes our function once for each batch. When our function successfully processes a batch, Lambda deletes its messages from the queue.
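These batch semantics can be sketched with a small handler factory in Python; `make_handler` and the `process` callback are illustrative names. The handler walks `event["Records"]` and passes each `body` to `process`. If every call succeeds, the handler returns normally and Lambda deletes the batch; if any call raises, the exception escapes the handler, the invocation counts as failed, and the batch's messages become visible on the queue again after the visibility timeout.

```python
from typing import Any, Callable


def make_handler(process: Callable[[str], None]) -> Callable[[dict, Any], None]:
    """Build a Lambda handler for an SQS event source."""

    def handler(event: dict, context: Any) -> None:
        # One invocation receives a whole batch of SQS records.
        for record in event["Records"]:
            # Any exception propagates: the invocation fails and the batch is
            # not deleted, so its messages will be delivered again.
            process(record["body"])

    return handler
```

In a deployed function this would be exposed as `lambda_handler = make_handler(process_message)`, with `process_message` being your own logic. Because one failing record sends the whole batch back to the queue, the behaviour is all-or-nothing per batch; Lambda's partial batch responses (`ReportBatchItemFailures`) are the standard way to avoid that when it matters.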

10. Define queues with SAM Template
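As a hedged sketch of what the queue definitions might look like (the logical IDs `MainQueue`, `DeadLetterQueue`, `SecondDeadLetterQueue` and the `maxReceiveCount` of 3 are assumptions, not Cortilia's actual template), the following Python builds the resources as a dict and serializes them to JSON, a format SAM and CloudFormation accept alongside YAML.

```python
import json


def queue_resources(max_receive_count: int = 3) -> dict:
    """CloudFormation/SAM resources for the main queue and two dead-letter queues."""
    return {
        "MainQueue": {
            "Type": "AWS::SQS::Queue",
            "Properties": {
                # After max_receive_count unsuccessful receives, SQS moves the
                # message to DeadLetterQueue.
                "RedrivePolicy": {
                    "deadLetterTargetArn": {"Fn::GetAtt": ["DeadLetterQueue", "Arn"]},
                    "maxReceiveCount": max_receive_count,
                },
            },
        },
        # Source queue for the reprocessing Lambda.
        "DeadLetterQueue": {"Type": "AWS::SQS::Queue"},
        # Final destination once the replay limit has been reached.
        "SecondDeadLetterQueue": {"Type": "AWS::SQS::Queue"},
    }


template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Transform": "AWS::Serverless-2016-10-31",
    "Resources": queue_resources(),
}
TEMPLATE_JSON = json.dumps(template, indent=2)
```

Only the main queue carries a `RedrivePolicy`; the second dead-letter queue has none, so messages parked there stay until someone inspects them or the queue's retention period expires.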

11. Define the AWS Lambda function that processes messages with a SAM Template
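Along the same lines, a hypothetical `AWS::Serverless::Function` resource for the reprocessing Lambda could look like the sketch below. The handler path, `CodeUri`, runtime, environment variable names, batch size and the logical IDs it references are assumptions; the `SQS` event type, the `SQSSendMessagePolicy` policy template, `Ref` (which returns a queue's URL) and `Fn::GetAtt` on `Arn`/`QueueName` are standard SAM and CloudFormation features.

```python
def reprocess_function(batch_size: int = 10, max_replays: int = 3) -> dict:
    """Hypothetical SAM function triggered by the dead-letter queue."""
    return {
        "Type": "AWS::Serverless::Function",
        "Properties": {
            "Handler": "app.lambda_handler",   # hypothetical module.function
            "Runtime": "python3.12",
            "CodeUri": "reprocess/",           # hypothetical source folder
            "Environment": {
                "Variables": {
                    # Ref on an AWS::SQS::Queue returns the queue URL.
                    "MAIN_QUEUE_URL": {"Ref": "MainQueue"},
                    "SECOND_DLQ_URL": {"Ref": "SecondDeadLetterQueue"},
                    # Environment variable values must be strings.
                    "MAX_REPLAYS": str(max_replays),
                }
            },
            "Policies": [
                # Allow the function to send to the two queues it writes to.
                {"SQSSendMessagePolicy": {"QueueName": {"Fn::GetAtt": ["MainQueue", "QueueName"]}}},
                {"SQSSendMessagePolicy": {"QueueName": {"Fn::GetAtt": ["SecondDeadLetterQueue", "QueueName"]}}},
            ],
            "Events": {
                "DeadLetterEvent": {
                    "Type": "SQS",
                    "Properties": {
                        # The function is invoked with batches read from this queue.
                        "Queue": {"Fn::GetAtt": ["DeadLetterQueue", "Arn"]},
                        "BatchSize": batch_size,
                    },
                }
            },
        },
    }
```

The dict would sit under `Resources` next to the queue definitions, for example as `"ReprocessFunction": reprocess_function()`. For an `SQS` event, SAM adds the permissions needed to poll the source queue automatically, so the explicit policies only need to cover sending.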

12. How the reprocess function works
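A minimal sketch of such a reprocessing function, under these assumptions: the replay counter travels in a `replay-number` message attribute, the queue URLs and the limit arrive through the hypothetical `MAIN_QUEUE_URL`, `SECOND_DLQ_URL` and `MAX_REPLAYS` environment variables, and the delay is a fixed 60 seconds. The core logic takes the SQS client as a parameter so it can be exercised with a fake; `send_message` with `QueueUrl`, `MessageBody`, `DelaySeconds` and `MessageAttributes` is the real boto3 SQS client method.

```python
import os
from typing import Any

MAX_DELAY_SECONDS = 900  # largest DelaySeconds value SQS accepts


def replay_number_of(record: dict) -> int:
    """Current replay-number of a Lambda SQS record (0 if the attribute is absent)."""
    attribute = record.get("messageAttributes", {}).get("replay-number")
    return int(attribute["stringValue"]) if attribute else 0


def reprocess(event: dict, sqs: Any, main_queue_url: str, second_dlq_url: str,
              max_replays: int, delay_seconds: int = 60) -> None:
    """Send each dead-letter record back to the main queue, or park it at the limit.

    `sqs` is expected to behave like a boto3 SQS client (only send_message is used).
    """
    for record in event["Records"]:
        replays = replay_number_of(record)
        if replays < max_replays:
            # Limit not reached: replay with a delay and an incremented counter.
            sqs.send_message(
                QueueUrl=main_queue_url,
                MessageBody=record["body"],
                DelaySeconds=min(delay_seconds, MAX_DELAY_SECONDS),
                MessageAttributes={
                    "replay-number": {"DataType": "Number", "StringValue": str(replays + 1)},
                },
            )
        else:
            # Limit reached: move the message to the second dead-letter queue.
            sqs.send_message(QueueUrl=second_dlq_url, MessageBody=record["body"])


def lambda_handler(event: dict, context: Any) -> None:
    # Imported here so the logic above can be tested without boto3 installed;
    # boto3 is available in the AWS Lambda Python runtime.
    import boto3

    reprocess(
        event,
        boto3.client("sqs"),
        os.environ["MAIN_QUEUE_URL"],
        os.environ["SECOND_DLQ_URL"],
        int(os.environ.get("MAX_REPLAYS", "3")),
    )
```

If `send_message` raises partway through a batch, the invocation fails and the whole batch is delivered to the function again, so a message may occasionally be sent twice; consumers of the main queue should tolerate duplicates.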

13. Conclusion

This model is quick to implement on AWS, since all the necessary tools are already available.

There are many ways to calculate the delay seconds, and they are beyond the scope of this article; for those wishing to learn more, I recommend looking into the exponential backoff algorithm.
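As one example of such a strategy, capped exponential backoff with "full jitter" picks the delay for replay n uniformly from [0, min(cap, base * 2^n)]. The Python sketch below uses a hypothetical 10-second base and caps the delay at 900 seconds, because that is the largest `DelaySeconds` SQS accepts; `backoff_delay` is an illustrative name.

```python
import random
from typing import Optional

MAX_DELAY_SECONDS = 900  # SQS caps DelaySeconds at 15 minutes


def backoff_delay(replay_number: int, base_seconds: int = 10,
                  cap_seconds: int = MAX_DELAY_SECONDS,
                  rng: Optional[random.Random] = None) -> int:
    """Capped exponential backoff with full jitter.

    The ceiling doubles with each replay (base, 2*base, 4*base, ...) until it
    reaches cap_seconds; the delay is drawn uniformly from [0, ceiling] so that
    messages that failed together do not all return at the same moment.
    """
    if rng is None:
        rng = random.Random()
    ceiling = min(cap_seconds, base_seconds * 2 ** replay_number)
    return rng.randint(0, ceiling)  # randint includes both endpoints
```

Passing a seeded `random.Random` makes the delays reproducible in tests; in the Lambda the default generator is enough.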

Good “Dead-letter” everyone! :)

Thank you!