Building a serverless email delivery service on AWS
Using AWS Lambda and other AWS services to build a cost-efficient, highly available email formatting and delivery service with event tracking.
Hand-written HTML email templates were just one of the issues we wanted to fix when rebuilding the email delivery service in one of our teams at Schibsted. The previous app ran on AWS EC2, scaled to 3 instances for availability, used an expensive third-party email sending service and ran on a severely outdated version of Node.js. In addition, the service used queues in a funky way, and the event pipeline was overly complex for various reasons.
The service sends about 4M emails per month using roughly 20 different templates, and maintaining it was a complete mess. We wanted a change!
Lambda lets you run code without any servers to manage and scales automatically as needed, while SES delivers your HTML to mailboxes everywhere and notifies you of delivery, bounce and open events.
First off, gather requirements. This is a non-exhaustive list, stemming from the needs of the application, the way we work and how we wish to maintain the application going forward.
The requirements were
- Cost efficiency: pay only for what you use, preferably with no idle cost
- Easy email templating, without manually writing HTML
- A simple API that accepts JSON and creates HTML
- Support for translation and variable substitution
- Infrastructure as Code
- High availability and automatic horizontal scaling
- AWS SES instead of another third-party service
- Email events collected and sent to the analytics and log service
Given the requirements, we decided to use AWS API Gateway as the front of the email pipeline. It puts emails to be sent directly on an AWS Simple Queue Service (SQS) queue, where they are picked up by a Lambda function that transforms each message into HTML and sends it to the user via SES. SES is then configured to send email delivery events to an AWS Simple Notification Service (SNS) topic, which places the events on a second SQS queue to be handled by a Lambda function that logs them and updates the analytics service.
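It would look something like this:
API Gateway -> SQS -> Lambda (format and send) -> SES -> SNS -> SQS -> Lambda (process events) -> analytics and log service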
Tell me, why all these queues?
- All 200s: the API responds as soon as the message is queued, so no message is lost to a failing Lambda
- If the Lambda function fails, the message returns to the queue and is retried automatically
- A Lambda concurrency limit can be set to avoid exceeding the SES rate limit
- Messages can be inspected in the AWS Console
- Engineers love queues
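The concurrency point in the list above comes down to a little arithmetic. Here is a rough sketch, assuming each invocation sends one email at a time; the function name and the numbers (14 sends per second, 200 ms per send) are illustrative assumptions, not figures from our account.

```typescript
// Rough upper bound on Lambda concurrency that keeps sending under the SES rate limit.
// Assumption: each invocation sends one email at a time, so total throughput is
// roughly concurrency / averageSendSeconds.
function maxConcurrencyForRateLimit(
  sesMaxSendsPerSecond: number,
  averageSendMillis: number
): number {
  if (sesMaxSendsPerSecond <= 0 || averageSendMillis <= 0) {
    throw new Error("rate limit and send duration must be positive");
  }
  const bound = Math.floor((sesMaxSendsPerSecond * averageSendMillis) / 1000);
  // A limit of 0 would stop all sending, so 1 is the floor. With very fast sends
  // even a single invocation can exceed the rate limit; in that case the
  // function itself has to pace its calls.
  return Math.max(1, bound);
}

// Illustrative numbers only: 14 sends per second allowed, 200 ms per send.
const exampleLimit = maxConcurrencyForRateLimit(14, 200);
```

The result would be used as the concurrency limit on the sending Lambda function. It is a starting point to be tuned against real send durations, not a guarantee.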
Picking the tools
Which tools are best for the job can be discussed till the cows come home. There are several ways of getting your code to the cloud, and you generally combine them with cloud orchestration to set up the necessary infrastructure. Some tools to manage Lambdas and cloud orchestration are
- AWS Serverless Application Model (SAM)
- Serverless Framework
- AWS Cloud Development Kit (CDK)
Both Serverless Framework and SAM had been used before in the team, but SAM was picked due to its close ties to CloudFormation and easy IAM role management. Terraform is also extensively used in the team, but not to package and upload Lambda functions.
SAM stands for Serverless Application Model and consists of a template specification and a CLI tool. The template looks like CloudFormation in YAML, and contains regular CloudFormation resources as well as SAM-specific resources like
- AWS::Serverless::Function, definition of a Lambda function
- AWS::Serverless::Api, definition of API Gateway
Another choice made in the process was the email framework. As previously stated, making HTML templates for emails by hand is hell on earth. The two alternatives we looked at were MJML and Foundation for Emails 2. MJML was picked as it is just awesome and it Just Works™, even in Outlook. There is also a great plugin for Visual Studio Code that compiles and previews the template as you build it.
Let’s build it!
To start with, if you have not already, head over to the AWS SAM installation instructions to install SAM.
API Gateway to SQS
The API Gateway can be configured to integrate directly with a number of AWS services. The most common use case is calling a Lambda function, but it is also possible to publish the request straight to SQS or SNS, and even to transform the payload to talk to AWS DynamoDB without any Lambda function. One great example is Eric Johnson’s article Building a serverless URL shortener app without AWS Lambda.
Placing a queue between the API Gateway and the Lambda function that executes the request has its upsides and downsides. The request is handled asynchronously, so the API Gateway has to respond to the requester before the Lambda function has run. This means basic validation has to happen in the API Gateway, and if anything else goes wrong while processing the request, the requester will only find out by looking in a Dead Letter Queue (DLQ) or similar. On the upside, API Gateway can do basic request validation itself, which keeps that code out of the Lambda function. The rate at which the queue is consumed can also be throttled through the executing Lambda function, avoiding downstream issues such as hitting the SES rate limit. In addition, if there is a temporary issue with the Lambda function, the messages stay in the queue to be processed later.
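To give an idea of what request validation in the API Gateway involves: models are written as JSON Schema (draft-04). The sketch below writes such a schema as a TypeScript constant. The field names (template, to, language, variables) are assumptions made for illustration, not the actual API of our service.

```typescript
// Hypothetical request model; the field names are assumptions for illustration.
// API Gateway models are written as JSON Schema (draft-04).
const emailRequestSchema = {
  $schema: "http://json-schema.org/draft-04/schema#",
  title: "EmailRequest",
  type: "object",
  required: ["template", "to", "language"],
  properties: {
    template: { type: "string", minLength: 1 },
    to: { type: "string", minLength: 3 },
    language: { type: "string", minLength: 2 },
    variables: { type: "object" },
  },
};

// Local illustration of the "required" rule only. API Gateway applies the whole
// schema itself, so no such code is needed in the Lambda function.
function missingRequiredFields(body: Record<string, unknown>): string[] {
  return emailRequestSchema.required.filter((field) => !(field in body));
}
```

A request that fails the model is rejected by the API Gateway before anything is put on the queue, which is the only feedback the requester gets synchronously.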
Setting up API Gateway to send directly to SQS was not as straightforward with SAM as one would have hoped. There is a proposal in this PR that suggests a rather neat way but as of yet it has not been implemented.
The resulting SAM template with authorizer, DLQ and request body validation in addition to the API Gateway and SQS looks like this
If you are wondering how the authorizer works, you can check out this post.
SQS to Lambda to SES
Setting SQS as the event source for the Lambda function is easy with SAM. This is also where the IAM role templates come in handy. The snippet below configures the Lambda function to be triggered by SQS, and sets up the SES event destination to SNS via a custom Lambda.
SES requires emails to be sent from verified identities, and due to the complexity of setting this up with CloudFormation, it was done manually outside of the code for this app. The Lambda function receives SenderIdentity via an environment variable, which contains the sender's email address.
The Lambda function is triggered by new messages being pushed to the SQS queue (in fact, AWS secretly polls the queue behind the scenes).
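A handler for such a trigger could look roughly like the sketch below. The types are trimmed-down local versions of the SQS event, and processMessage is a hypothetical callback standing in for rendering and sending one email. The return value assumes the event source mapping has partial batch responses (ReportBatchItemFailures) enabled; without that, throwing an error makes the whole batch retry.

```typescript
// Minimal local types for the parts of the SQS event used here.
interface SqsRecord {
  messageId: string;
  body: string;
}
interface SqsEvent {
  Records: SqsRecord[];
}
interface BatchResponse {
  batchItemFailures: { itemIdentifier: string }[];
}

// processMessage is a hypothetical callback: render and send one email.
async function handleSqsEvent(
  event: SqsEvent,
  processMessage: (payload: unknown) => Promise<void>
): Promise<BatchResponse> {
  const batchItemFailures: { itemIdentifier: string }[] = [];
  for (const record of event.Records) {
    try {
      // A malformed body throws here too and is reported as a failure.
      await processMessage(JSON.parse(record.body));
    } catch {
      // Only failed messages are reported, so only those are retried
      // and eventually moved to the DLQ.
      batchItemFailures.push({ itemIdentifier: record.messageId });
    }
  }
  return { batchItemFailures };
}
```

Processing the records one by one keeps the sketch simple and makes it easy to reason about the send rate.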
The code is divided into two parts. The first transforms the JSON from the request into HTML via MJML, with additional processing with Mustache and translation with i18n. The second sends the email via SES. The call to sendEmail takes the mail object, which contains an array named Tags. Tags are a way to pass information from the sending step to the event pipeline, as they are included in the SES events triggered when the email is delivered, opened and so on.
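As a sketch of the second part, this is roughly how the mail object with Tags could be put together. The field names follow the SES SendEmail request, while the helper names, the tag names (template, language) and the configuration set name are assumptions for illustration. Rendering with MJML and Mustache is left out, so html is taken as already rendered.

```typescript
// Shapes mirror the SES SendEmail request; only the fields used here are typed.
interface MessageTag {
  Name: string;
  Value: string;
}
interface SendEmailParams {
  Source: string;
  Destination: { ToAddresses: string[] };
  Message: {
    Subject: { Data: string; Charset: string };
    Body: { Html: { Data: string; Charset: string } };
  };
  ConfigurationSetName: string;
  Tags: MessageTag[];
}

// SES only accepts ASCII letters, digits, "_" and "-" in tag names and values
// (at most 256 characters), so anything else is replaced with "_".
function toMessageTags(values: Record<string, string>): MessageTag[] {
  const clean = (text: string): string =>
    text.replace(/[^A-Za-z0-9_-]/g, "_").slice(0, 256);
  return Object.keys(values).map((name) => ({
    Name: clean(name),
    Value: clean(values[name] ?? ""),
  }));
}

function buildSendEmailParams(input: {
  sender: string; // e.g. the SenderIdentity environment variable
  recipient: string;
  subject: string;
  html: string; // already rendered HTML
  configurationSet: string; // hypothetical configuration set name
  tags: Record<string, string>;
}): SendEmailParams {
  return {
    Source: input.sender,
    Destination: { ToAddresses: [input.recipient] },
    Message: {
      Subject: { Data: input.subject, Charset: "UTF-8" },
      Body: { Html: { Data: input.html, Charset: "UTF-8" } },
    },
    ConfigurationSetName: input.configurationSet,
    Tags: toMessageTags(input.tags),
  };
}
```

The resulting object is what would be handed to sendEmail. Whatever ends up in Tags comes back in every event for that email, which is what makes it possible to tie an open or bounce to a template.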
SES event pipeline
SES provides event destinations for gathering insight into what has happened to a sent email. Events are emitted when emails are sent, delivered, bounced, opened and so on. SES can publish the events to AWS CloudWatch, AWS Kinesis Data Firehose and SNS. We picked SNS because, unlike Kinesis, it comes without any idle cost. The snippet below configures the SNS topic and its SQS subscription, the SQS queue itself and the policy that allows the SNS topic to write to the SQS queue. It also configures a Lambda function to be triggered by the SQS queue with a batch size of 10 messages.
The Lambda function is triggered by SES events being pushed as messages on SQS, and transforms the event JSON into something understandable by the analytics and log service. The AWS docs provide examples of how these SES events look when pushed to SNS.
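The transformation could be sketched as follows. The AnalyticsEvent shape is a made-up target format, since what the analytics and log service expects is specific to us. The sketch also assumes raw message delivery is not enabled on the SNS subscription, so each SQS body is the SNS envelope with the SES event as a JSON string in its Message field.

```typescript
interface SqsRecord {
  body: string;
}

// Hypothetical target format for the analytics and log service.
interface AnalyticsEvent {
  type: string;
  messageId: string;
  mailTimestamp: string;
  tags: Record<string, string>;
}

function toAnalyticsEvent(record: SqsRecord): AnalyticsEvent {
  // Unwrap the SNS envelope, then the SES event inside it.
  const envelope = JSON.parse(record.body) as { Message: string };
  const sesEvent = JSON.parse(envelope.Message) as {
    eventType: string; // e.g. "Send", "Delivery", "Bounce", "Open"
    mail: {
      messageId: string;
      timestamp: string; // when the email was sent
      tags?: Record<string, string[]>;
    };
  };

  const tags: Record<string, string> = {};
  const rawTags = sesEvent.mail.tags ?? {};
  for (const name of Object.keys(rawTags)) {
    // SES delivers every tag value as an array; keep the first entry.
    const first = (rawTags[name] ?? [])[0];
    if (first !== undefined) tags[name] = first;
  }

  return {
    type: sesEvent.eventType,
    messageId: sesEvent.mail.messageId,
    mailTimestamp: sesEvent.mail.timestamp,
    tags,
  };
}
```

With a batch size of 10, the handler would map this function over the records of each invocation and forward the results in one go.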
There is always a cost focus when building new solutions. How can we build a cost efficient solution? Cost optimization is the fifth pillar in The 5 Pillars of the AWS Well-Architected Framework, alongside Operational Excellence, Security, Reliability and Performance Efficiency. AWS defines the fifth pillar as
The Cost Optimization pillar includes the ability to run systems to deliver business value at the lowest price point.
We try to always consider the tradeoffs between speed and cost, trying to deliver value at the lowest price point, by making conscious decisions about instance sizes and aiming to reduce wasted resources. One goal in this project was to reduce idling and upfront costs, and only pay for what you actually use. This affected decisions like using SNS instead of Kinesis for the event pipeline as Kinesis comes with idle cost per shard.
After tallying up the cost we ended up with the following cost per million emails sent
API Gateway = $3.50
Authorizer, cache policy = $0.15
SQS = $0.80
Lambda, Send mail = $2.70
SES = $103.60
SNS, free to SQS = $0
SQS, batch 10 = $0.60
Lambda, Process event = $0.06
Total = $111.41
The largest cost by far is SES, which charges $0.10 for every 1,000 emails sent. The Lambda execution costs are calculated with the older 100 ms billing granularity, not the newer 1 ms billing.
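For anyone who wants to redo the sums with their own numbers, the tally can be written down in a few lines. The amounts are copied from the list above and kept in US cents to avoid floating point rounding; the variable names are made up, and the monthly figure simply assumes the roughly 4 million emails per month mentioned earlier.

```typescript
// Cost per million emails in US cents, copied from the list above.
const costCentsPerMillion: Record<string, number> = {
  "API Gateway": 350,
  "Authorizer": 15,
  "SQS (send queue)": 80,
  "Lambda (send mail)": 270,
  "SES": 10360,
  "SNS": 0,
  "SQS (event queue)": 60,
  "Lambda (process event)": 6,
};

// Sum of all line items, still in cents.
const totalCents = Object.keys(costCentsPerMillion).reduce(
  (sum, name) => sum + (costCentsPerMillion[name] ?? 0),
  0
);

// Fraction of the total that goes to SES.
const sesShare = (costCentsPerMillion["SES"] ?? 0) / totalCents;

// Rough monthly cost in dollars, assuming about 4 million emails per month.
const monthlyDollars = (totalCents * 4) / 100;
```

The total matches the $111.41 above. SES accounts for about 93% of it, and at our volume the whole pipeline lands at roughly $446 per month.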
Hopefully this article can inspire you to have a closer look at transforming old legacy apps into modern, asynchronous flows using serverless components.
It took about 2 weeks to stitch it together, and then lots of time migrating old email templates to newly designed and much better looking templates in MJML. We are extremely happy with the results and it removed a complex legacy setup that was getting impossible to maintain.
PS: We’re hiring and have exciting positions in all our locations across the Nordics and Poland. Check out our open positions at https://schibsted.com/career/.