AWS FIFO Queues with Message Groups for Atomic Processing at Scale

How to process multiple objects in parallel with the guarantee that for each unique object, only a single instance is processed simultaneously

Nick De Cooman
SMG Real Estate Engineering Blog
5 min readDec 22, 2021

--

In a distributed, event-driven architecture, many challenges arise when processing large sets of data. One of them is the need for scalability when handling event streams with a substantial throughput. The challenge becomes even more complicated when processing these events involves atomic operations. In this case, we cannot simply process multiple events in parallel.

As an example, let’s say that we are building a service that ingests listings into our platform. The ingestion process involves several steps and requires multiple API operations. As a result, the whole ingestion can take up to several seconds per listing to complete.

Now, here is where it gets tricky: a constraint of our service is that, while we are processing a listing, we cannot process other events for the same listing. The ingestion should be considered an atomic operation, and hence, only a single execution per listing can be performed simultaneously.

At Homegate, we run most of our infrastructure on AWS in a serverless fashion. At the core of this, Lambda functions execute application logic. A common pattern is to use SNS topics to fan-out messages, and SQS for acting as a decoupling buffer between an SNS topic and some Lambda function.

For our ingester service, a logical architecture could look like this:

Listing events are published to a dedicated SNS topic. An SQS queue subscribes to this topic and consumes the events. Here, the events are buffered until a Lambda instance removes them from the queue and processes the corresponding listing.

Under normal circumstances — in which the processing would not involve an atomic operation — this architecture would be a good fit. It would scale automatically relative to the number of Lambda instances that consume events from the queue, without much additional configuration.

However, in the case where we cannot process two events for a single listing at the same time, things become trickier. If we just allow multiple Lambda instances to consume events from our queue, a single listing might get processed twice exactly at the same time.

A simple yet effective solution could be to limit the number of Lambda instances that can consume events from the queue. If we only allow a single Lambda instance to remove events from the queue, only one listing can be processed at a time. Solving concurrency by simply avoiding it. 🤷🏻‍♂️

Compelling as it is, this strategy comes with a considerable performance cost. Since only one single listing can be processed at a time, it is easy to see that this creates a bottleneck. We need a better solution for services requiring a high throughput, like our ingester.

Introducing AWS SQS FIFO queues

AWS SQS FIFO queues are like regular SQS queues, but — as the name suggests — they also guarantee a first-in, first-out order. Additionally, they ensure that events are received exactly once (without duplicates), which is not the case with regular queues.

Perhaps a less known feature of FIFO queues is the ability to assign a message group id to each of the incoming events. Events with the same message group id belong to the same message group. Within this group, events are always processed one by one in a strict order relative to the message group.

Back to our ingester service, where we can take full advantage of message groups: by replacing the regular SQS queue with a FIFO queue, we can assign the listing id as the message group id for each incoming event. This ensures we process only a single event per listing simultaneously.

Given this guarantee, it is now safe to scale up the number of Lambda instances consuming the queue. As long as a Lambda instance is busy processing an event from a specific message group, no other Lambda instance can consume an event for the same message group. Since a message group represents a listing, this behavior is exactly what we need to make sure that only a single event per listing is processed at a time.

With more Lambda instances now active, we solved the earlier bottleneck and scaled our throughput relative to the number of active Lambda instances.

Show me the code

Transitioning from a regular SQS queue to a FIFO queue is straightforward. With CDK, all it takes is adding .fifo as a suffix to the name of the queue and setting fifo: true while initializing the queue.

const fifoQueue = new Queue(this, 'IngestQueue.fifo', {  fifo: true})

To assign a message group id to an event, SQS provides a MessageGroupId property when sending a message to the queue. In the example below, we use the listing id as the message group id.

import AWS from 'aws-sdk';const sqs = new AWS.SQS({ apiVersion: '2012-11-05' });
const listing = { id: '784c6d44', ... }const sqsParams = { MessageBody: JSON.stringify(listing), MessageGroupId: listing.id, QueueUrl: fifoQueueUrl};
await sqs.sendMessage(sqsParams).promise()

In conclusion, AWS FIFO queues are an excellent addition to the AWS toolset. They enhance messaging between applications in case the order of events is critical, or when duplicates can not be tolerated. In our example, we greatly benefited from the message group feature provided by FIFO queues. By using the listing id as the message group, we managed to process multiple listings in parallel, while at the same time having the guarantee that only a single Lambda instance per listing is running simultaneously.

References

--

--