Dead Letter Queue Reprocessing

A method for handling input data that can’t be processed is a key feature of modern microservices. In the world of microservices, where each service is independent but needs to communicate with others, the situation where data cannot be processed may arise either due to the lack of access to some component part of the system, or simply because the input data is invalid, corrupt or otherwise unusable.

There are, indeed, a variety of ways to workaround such a situation. In this article we will focus on a deployment where the input data is passed to our component in the form of a message from a queue. The solution to the problem (inability to process the input) is to move the message to a dead-letter queue. However, since it is necessary to re-process this message later, we must also support the re-processing of all failed messages in the dead-letter queue. Our solution is designed in the way that it not only limits the number of re-processing attempts, but also postpones the re-processing of failed messages so as to allow other (potentially not accessible) parts of the system to become available.

dead-letter queue + parking-lot queue

Now, in order to implement the dead-letter queue it is necessary to create, yes that’s right, another queue! This additional queue is called the parking-lot queue. This solution reads the messages from the dead-letter queue periodically but with a defined delay (not immediately after it is received in the queue). By introducing a delay between receiving the message and reading them, we provide time for those (potentially) inaccessible systems to become available. With the solution that we have implemented, we not only delay the reading of messages, but we also implemented a pre-scan check. With each attempt to read a message from the dead-letter queue, we also check how many attempts were already made to process the message. This information is then stored in the message headers. If the number of attempts has not yet reached our pre-defined maximum, the message is simply moved back to the main queue to be re-processed. If the number of attempts has reached our pre-defined maximum, the message is instead sent to the parking-lot queue. By implementing this pre-check and by limiting the number of reprocessing attempts made, we ensure that any message which ends up in the parking-lot queue is properly checked. The application supervisor/technician can then verify the underlying reason leading to the message being rejected. Anything ending up in the parking-lot queue can be checked to determine if the file is malformed, or if it is valid and was not processed because of some other issue in the application or due to the inaccessibility of another component part of the system.

To make our own lives easier, and that of other administrators, it makes sense to configure some sort of monitoring and alerting system for the parking-lot queue, because (in general) each message present in this queue indicates some unexpected behavior of the system (long term unavailability of particular services, presence of invalid input messages, etc.)

The image below provides a general overview of the process:

Implementation using Spring Integration

Since our Java applications are based on the Spring Boot Library and we use the Spring Integration in combination with the RabbitMQ, we choose the same combination of components to implement the solution described above.

dead-letter queue definition

First of all we added the dead-letter exchange and routing-key arguments to the main queue definition:

It ensures that any rejected message will automatically be forwarded to our dead-letter queue.

Setting the rejection of error message

Next, we configured the component so that any exception during message processing will be evaluated by the Spring Integration as a reason to reject the message. We implemented a simple process by setting our own exceptionStrategy (implemented as “reject everything”) to ConditionalRejectingErrorHandler, which we will use as error-handler in the main queue inbound-channel-adapter definition:

dead-letter queue processing

As we intend to process the dead-letter queue with some delay, we need to implement some novel mechanism. A delay can be implemented by using the PollableAmqpChannel instance instead of the standard int-amqp:inbound-channel-adapter:

With this implementation the reading of the queue will be performed periodically by setting a pooler in our dead-letter queue processing chain. Message headers will be processed here as well. We store the number of attempts in the header with the key x-retries. If this header does not yet exist, it will be created and initialized, in all cases its value will be incremented by 1. Next, the system checks to determine if the value is lower than the maximum number of processing attempts defined previously. Based on this check, either the message is forwarded to the main queue for another attempt at processing, or it is forwarded to the parking-lot queue, where it will wait for some supervisor to check it. During dead-letter queue message processing, the header x-death is removed, which indicates that this message has already been rejected. By default this will cause the final rejection of the message by the Spring Integration and the message will not be re-processed at all:

If, for some reason, it is desirable to keep the x-death header (eg. troubleshooting of messages in the parking lot queue), it can be removed when the message is being forwarded to the main queue.

Final words

The implementation of dead-letter queue reprocessing, as demonstrated in this article, proves that robust reprocessing can provide for automated recovery from errors commonly encountered during standard microservice operation. Such a queue, in combination with enhanced monitoring of the queue state, provides: enhanced stability of the application, lower maintenance overhead, lower cost and reduced risk of failure.

An example of this solution is provided in the following github project: