Asynchronous, durable, push-based architecture with AWS SNS, SQS and Lambdas using Serverless framework

Dušan Zahoranský · Published in rezdy-engineering · Jun 6, 2019 · 8 min read

This article describes an architecture for asynchronous, push-based communication with retries and guaranteed delivery using AWS services including SNS, SQS and Lambdas. We will discuss how we declare the cloud services using Infrastructure as Code with the Serverless framework. We will also look at how we manage different environments, logical references to other cloud resources and resources from other microservices stacks.

The goal of this architecture

Rezdy, as a channel manager, is required to handle integrations with many third party Online Travel Agent systems. Rezdy products inventory is shared with these systems so consumers can check availability and book activities through third party booking software. One of the major challenges we had to overcome was to ensure synchronisation of products, product availability and bookings across many third-party systems when updates occurred directly within the Rezdy platform.

It is essential to ensure that synchronisation occurs without impacting the performance or status of the original transaction regardless of the availability of the target system. To achieve this we need to propagate the update asynchronously and guarantee the change events delivery to each of the consumer systems.

The chosen AWS services

There are multiple options for asynchronous communication across currently available AWS services. Probably the most widely used, lightweight and easiest approach (from an infrastructure setup, configuration and management perspective) is the Amazon SNS pub/sub messaging service with AWS Lambda subscriptions.

“Amazon Simple Notification Service (SNS) is a highly available, durable, secure, fully managed pub/sub messaging service that enables you to decouple microservices, distributed systems, and serverless applications.”

AWS Lambda can be configured to subscribe to an SNS topic to consume published messages. This straightforward approach can be used if message delivery does not need to be guaranteed, perhaps for some non-critical notifications. However, to ensure persistence, to implement a retry policy or to provide a dead-letter queue, we need to include Amazon SQS: “Use Amazon SQS to transmit any volume of data, at any level of throughput, without losing messages or requiring other services to be available.”

The infrastructure described above is illustrated in the diagram below. It shows an example in which an order, created by an Online Travel Agent (OTA1) system integrated with Rezdy, is updated through the Rezdy Booking Software. This update event needs to be propagated to OTA1’s system to synchronise the order status.

Processing a message can be described as follows:

  1. Rezdy Booking Software publishes an event to a topic after an order status update transaction is completed
  2. An SNS subscription filter policy decides, based on the message attributes, whether the message will be published to the SQS queue for OTA1
  3. SQS guarantees delivery of the message, or persists it to a dead-letter queue if delivery keeps failing for a configured period or number of delivery attempts
  4. A lambda function, polling messages from the SQS queue, executes the business logic to transform the model and to dispatch the update.

Life beyond Hello world examples

AWS services including Lambda, SNS and SQS provide a good basis for getting a concept like this one running without spending too much time on infrastructure setup and configuration. Creating a standalone Lambda in the AWS console is quite simple and relatively straightforward: it is a trivial task to create, edit and immediately test a Lambda directly in the console. It’s also possible to subscribe the Lambda to an SNS topic or an SQS queue.

As we move on from basic hello world scenarios and start using Lambdas more heavily in our services while introducing complex changes, things become more complicated and we need to consider additional aspects of the development process:

  • How to manage Lambda CI using automated build pipelines?
  • How to manage different environment variables and secrets?
  • How to deploy the Lambdas, topics, queues and other resources into different environments using different regions, roles or profiles?
  • How to handle subscription or trigger references to other AWS resources from different stacks defined in different repositories (different microservices)?
  • How to configure Lambda triggers from SQS, SNS, etc. and keep this stack configuration versioned with the source code?

AWS Cloudformation

An AWS service you may be familiar with is AWS CloudFormation:

  • “AWS CloudFormation enables you to create and provision AWS infrastructure deployments predictably and repeatedly.”
  • “AWS CloudFormation enables you to use a template file to create and delete a collection of resources together as a single unit (a stack).”

CloudFormation seems to be a reasonable solution for all the additional considerations mentioned above. For the most part, it is … that is, until we actually try to create a more complex template file. The CloudFormation syntax is very verbose; it forces us to define many attributes that could be generated more effectively by conventions or by sensible defaults derived from the context of the defined resources.

Serverless

This is where Serverless comes in useful, as one of several existing solutions to help developers to address template boilerplate code. Serverless is focused on lambdas and their triggers, but we can define any AWS resource by including raw chunks of CloudFormation templates. Serverless also provides a CLI to manage packaging and deployments of a generated CloudFormation stack, and can be used to handle stack updates when we change a serverless template. Let’s walk through the definition of the services for the proposed solution using Serverless.

Defining a Lambda function using Serverless

A service definition, which can contain one or more Lambda functions, is declared in a serverless.yml file.

The first part defines the environment, AWS profiles, Lambda configuration and the rest of the essential configuration necessary for deployment. Note the opt: variables, which are command-line arguments passed to the Serverless CLI to manage deployments for different environments.
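As a sketch, this first part of the serverless.yml might look like the following; the service name, runtime and default values are illustrative, not taken from the actual Rezdy stack:

```yaml
# serverless.yml - service-wide deployment configuration (illustrative names)
service: order-sync-ota1

provider:
  name: aws
  runtime: nodejs10.x
  # opt: variables are populated from the CLI arguments of `sls deploy`,
  # with a fallback default after the comma
  stage: ${opt:stage, 'dev'}
  region: ${opt:region, 'ap-southeast-2'}
  profile: ${opt:profile, 'default'}
```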

A deployment using the Serverless CLI to different environments and regions looks like the following:

sls deploy --stage test --profile test --region ap-southeast-2

This is followed by an environment section for service-wide environment variables (function-scoped variables are nested under the function definition). The environment variables are injected into the Lambdas at deployment time. A nice solution for handling secrets and configuration variables is the AWS SSM Parameter Store: if we don’t need to reload values dynamically at runtime, we can easily load and decrypt them at build time using Serverless (provided our CI and access to the deployed Lambda are well secured).

Note that we can create SSM parameters for various environments by dynamically building their path based on the stage parameter passed from the command line.
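A minimal sketch of such an environment section, assuming hypothetical SSM parameter paths:

```yaml
provider:
  environment:
    # the ${opt:stage} segment selects the parameter for the target environment
    OTA1_API_URL: ${ssm:/${opt:stage}/ota1/api-url}
    # the ~true suffix asks Serverless to decrypt a SecureString at build time
    OTA1_API_KEY: ${ssm:/${opt:stage}/ota1/api-key~true}
```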

Let’s move on to the definition of the Lambda function itself, including a trigger by an SQS event. The SQS queue resource will be declared later; for now we can resolve its identifier (ARN) from its logical name SqsOta1OrderStatusUpdated using the Fn::GetAtt function.
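A sketch of such a function definition, where the handler path and batch size are assumptions:

```yaml
functions:
  ota1OrderStatusUpdated:
    handler: src/handlers/orderStatusUpdated.handle
    events:
      - sqs:
          # resolve the queue ARN from the logical name declared in this stack
          arn:
            Fn::GetAtt: [SqsOta1OrderStatusUpdated, Arn]
          # process one message per invocation (illustrative choice)
          batchSize: 1
```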

Referencing resources from other stacks

In the microservice world, we often end up subscribing to a topic, or in general, referencing other resources which belong to a CloudFormation stack defined by another microservice. In such cases, we need to read the output parameters of another stack to determine the ARNs of the resources we are dependent on.
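One way to do this with Serverless is the ${cf:…} variable syntax, which reads an output of another CloudFormation stack at deployment time; the stack and output names below are hypothetical:

```yaml
custom:
  # orders-service-${opt:stage} is an assumed stack name exporting OrderTopicArn
  orderTopicArn: ${cf:orders-service-${opt:stage}.OrderTopicArn}
```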

Adding an SQS queue

As Serverless is primarily focused on Lambda functions and the events which trigger them, any other AWS resources have to be written in raw CloudFormation syntax within a resources section. Let’s define the SQS queue.

To change the queue retry delivery settings or to enable a dead-letter queue, see Configuring an Amazon SQS Dead-Letter Queue. The example serverless file in the summary of the article contains a full declaration of the queue with a retry policy and a dead-letter queue.
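A sketch of the queue declaration with a redrive policy, using illustrative names and limits:

```yaml
resources:
  Resources:
    SqsOta1OrderStatusUpdated:
      Type: AWS::SQS::Queue
      Properties:
        QueueName: ota1-order-status-updated-${opt:stage}
        VisibilityTimeout: 60
        RedrivePolicy:
          # after 5 failed receives the message moves to the dead-letter queue
          maxReceiveCount: 5
          deadLetterTargetArn:
            Fn::GetAtt: [SqsOta1OrderStatusUpdatedDlq, Arn]
    SqsOta1OrderStatusUpdatedDlq:
      Type: AWS::SQS::Queue
      Properties:
        QueueName: ota1-order-status-updated-dlq-${opt:stage}
        MessageRetentionPeriod: 1209600  # 14 days, the SQS maximum
```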

The SNS orderTopic needs permission to send messages to the queue, so we need to define a queue policy allowing the sqs:SendMessage action when the source ARN equals our SNS topic’s ARN.
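A sketch of such a queue policy; the ${self:custom.orderTopicArn} variable is assumed to hold the topic ARN and to be defined elsewhere in the template:

```yaml
resources:
  Resources:
    SqsOta1OrderStatusUpdatedPolicy:
      Type: AWS::SQS::QueuePolicy
      Properties:
        Queues:
          - Ref: SqsOta1OrderStatusUpdated
        PolicyDocument:
          Statement:
            - Effect: Allow
              Principal: '*'
              Action: sqs:SendMessage
              Resource:
                Fn::GetAtt: [SqsOta1OrderStatusUpdated, Arn]
              Condition:
                ArnEquals:
                  # only our SNS topic may send messages to this queue
                  aws:SourceArn: ${self:custom.orderTopicArn}
```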

SNS Subscription with filter

The purpose of the SNS topic is to enable multiple subscribers to subscribe to the topic events. To avoid sending messages all the way through SQS and invoking a Lambda just to find out that a message cannot be processed by the target consumer, we can define subscription filters. In this case, we will subscribe to all order update events but filter for only those events where an order status changes and the reseller is OTA1. Only the required notifications will be propagated to the OTA1 consumer.

Attributes for the filter have to be sent as message attributes and we need to define a subscription filter policy. See Amazon SNS Message Filtering for details.
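A sketch of the subscription with a filter policy; the attribute names eventType and reseller, and the topic ARN variable, are illustrative assumptions:

```yaml
resources:
  Resources:
    SnsOta1OrderStatusSubscription:
      Type: AWS::SNS::Subscription
      Properties:
        TopicArn: ${self:custom.orderTopicArn}
        Protocol: sqs
        Endpoint:
          Fn::GetAtt: [SqsOta1OrderStatusUpdated, Arn]
        # deliver only order-status-change events intended for OTA1
        FilterPolicy:
          eventType:
            - ORDER_STATUS_UPDATED
          reseller:
            - OTA1
```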

Using Serverless at scale

A large Serverless YAML file can be split and loaded from multiple files. Unfortunately, this is only partially helpful, as there is no way to pass parameters to an included file, which prevents the creation of parameterised fragments reusable across microservices. To achieve this, we need to look for extensions of the YAML language or some kind of templating engine to preprocess the files beforehand.
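For reference, splitting looks like the following with the ${file(…)} variable (the paths are illustrative); each included file must be a complete, non-parameterised fragment:

```yaml
resources:
  # each file contributes its own Resources: section; no parameters can be passed
  - ${file(resources/sqs-queues.yml)}
  - ${file(resources/sns-subscriptions.yml)}
```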

If microservices become less micro and grow in scale, there are some limitations to consider. A couple of references for further reading on this topic can be found below:

Conclusion

The asynchronous, push-based architecture described here has proven to be a robust and versatile solution capable of delivering asynchronous real-time updates to third-party systems without exposing any risk to the original transactions in our production environment.

Also, we have adopted Serverless as a framework to define CloudFormation stacks for Lambda-based microservices, as this enables us to:

  • easily read SSM configuration parameters
  • declare environment variables and custom variables and easily reference resources from other CloudFormation stacks
  • use logical references to other resources within a template
  • deploy to a number of different environments
  • avoid boilerplate code for Lambdas and their triggers (our example Serverless template has 131 lines versus 318 lines in the final CloudFormation template)

A nice feature to have with Serverless would be a mechanism for building and including reusable resources or parts of templates. We had to build a custom templating engine to achieve this.

To see the whole Serverless template referenced in this article follow the serverless.yml link.
