Enabling live debugging of email pipelines with AWS SES and Google Groups

Jacob A. Hudson
Published in VorTECHsa
8 min read · Mar 21, 2022

#aws #devops #sre #python #googlegroups

Intro

One of Vortexa’s goals is to add transparency to the energy markets through the use of data and visualizations in order to bring clarity to the end-to-end energy flows from source to destination. This is done through the retrieval and manipulation of data for a variety of use cases. While many of our providers have APIs Vortexa can access, some of our more critical providers operate through other means that are easier for them (namely attachments in emails). This edition of VorTECHsa dives into how we hardened, modernized, and added internal transparency into our email receiving pipeline, known as ExtEmData (External Email Data).

Existing Tech Stack

To begin, Vortexa is an AWS-native company, meaning we rely heavily on AWS tools to operate the business. One of these is AWS Simple Email Service (SES). While SES has a large and complex feature set, the principal point of interest in our pipeline is email receiving. The way SES receives email is pretty straightforward. After registering a domain that one owns, any emails sent to a set recipient list (either a complete email address or a domain) are passed to a series of actions (saving to S3, invoking Lambda functions, etc). In the case of the emails pipeline, we have chosen to accept any email where the domain is extemdata.com, e.g. anyuser@extemdata.com is completely valid. After an email is received by SES, it is encrypted within SES (client-side) with a KMS key, then forwarded to S3. From there, a Lambda takes the encrypted email, extracts attachments, and stores them under an S3 key for further processing outside the pipeline.

In the original email pipeline flow, the following steps happen:

  1. Email is sent to an MX Route53 Domain
  2. Email is received by SES and stored in an S3 bucket
  3. A Lambda is invoked to ingest emails and extract attachments
  4. Attachments are saved within a bucket for further processing outside of the pipeline

More info about SES and detailed instructions on how to set it up can be found here: https://docs.aws.amazon.com/ses/latest/dg/receiving-email-setting-up.html

The Problem

One major issue should become apparent immediately: how can one actually inspect incoming data? Sure, we could direct people to a series of attachments, or show them how to download the emails, decrypt them as text files, and then parse them. While this is possible, it is far from intuitive and fairly involved, in addition to being reactive (meaning that resolving issues with interpreting and extracting data, or finding optimizations, is very difficult). One example that AWS provides is covered in the next section, so let’s try it out!

The Solution

One option is to forward the emails to a secure inbox in real-time, which would allow for the emails to be auditable for debugging and verification purposes. How can this be done? Fortunately, AWS has a blog! A post that seems like an exact fit is this: https://aws.amazon.com/blogs/messaging-and-targeting/forward-incoming-email-to-an-external-destination/. It is a wonderful start: it introduces the concepts of the AWS SES SendMail API and provides a working solution. However, it is still missing a few critical components, mainly that the forwarded email has to be actually viewable in the destination application (this solution just saves the source file as a decrypted attachment, meaning the end user still has to download the file and then re-open it in a supported application; we also now have decrypted emails with sensitive information on end devices). After reading through the API documentation, I discovered that the SendMail API calls can cover a variety of use cases (HTML, text, attachments, etc), but managing all of these cases can prove a huge investment for a return which is not guaranteed. For instance, what if we have three incoming emails:

  • Email A: A simple text notification with a spreadsheet attached
  • Email B: A complex HTML table
  • Email C: A forward chain with HTML, text, multiple attachments included, and so on

Another core issue with the SendMail API is that we receive hundreds of emails a day, all different, with a variety of content (and growth can happen at any time). So what can be done to ensure the most original form is available for analysis, review, and debugging? Enter SMTP.

SMTP

In addition to API calls, AWS SES also supports Simple Mail Transfer Protocol (SMTP). SMTP is the standard protocol by which email is relayed between providers, and the same process is very easy to accomplish when the source email is a file within S3. Let’s take a look at how that works:

  1. The email is read from a file
  2. The email is sent over SMTP
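The two steps above can be sketched in Python with only the standard library. This is a minimal sketch, not the pipeline's actual code: the function names are hypothetical, and it assumes the email has already been fetched from S3 and decrypted into raw bytes. The SMTP host passed in would follow the SES endpoint pattern email-smtp.<region>.amazonaws.com, with STARTTLS on port 587.

```python
import smtplib
from email import message_from_bytes
from email.message import Message


def load_email(raw: bytes) -> Message:
    """Step 1: parse the raw bytes of a stored email into a message object."""
    return message_from_bytes(raw)


def send_over_smtp(msg: Message, sender: str, recipient: str,
                   host: str, username: str, password: str,
                   port: int = 587) -> None:
    """Step 2: relay the message over an authenticated SMTP session."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()                 # upgrade to TLS before authenticating
        smtp.login(username, password)  # SMTP credentials, supplied by the caller
        # from_addr/to_addrs set the SMTP envelope; the headers are sent as-is.
        smtp.send_message(msg, from_addr=sender, to_addrs=[recipient])
```

A caller would read the stored file in binary mode, pass the bytes to `load_email`, and hand the result to `send_over_smtp` together with a sender address on a verified domain and the address of the destination inbox.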

Simple! Or not? The first major catch is that since we are now sending emails, we have to own the source email address. How can we change this? Let’s look at the example below:

Return-Path: <origin@example.com>
Received: from mail-ed1-f54.google.com (mail-ed1-f54.google.com [1.2.3.4])
by inbound-smtp.eu-west-1.amazonaws.com with SMTP id ...
for target@acme.com;
Fri, 02 Nov 2018 10:25:11 +0000 (UTC)
X-SES-Spam-Verdict: PASS
X-SES-Virus-Verdict: PASS
Received-SPF: none (spfCheck: 1.2.3.4 is neither permitted nor denied by domain of example.com) client-ip=1.2.3.4; envelope-from=origin@example.com; helo=mail-ed1-f54.google.com;
Authentication-Results: amazonses.com;
spf=none (spfCheck: 1.2.3.4 is neither permitted nor denied by domain of example.com) client-ip=1.2.3.4; envelope-from=origin@example.com; helo=mail-ed1-f54.google.com;
dkim=pass header.i=@example-com.20150623.gappssmtp.com;
dmarc=none header.from=example.com;
X-SES-RECEIPT:... ; c=relaxed/simple; s=...; d=amazonses.com; t=1541154311; v=1; bh=...; h=From:To:Cc:Bcc:Subject:Date:Message-ID:MIME-Version:Content-Type:X-SES-RECEIPT;
Received: by mail-ed1-f54.google.com with SMTP id u12-v6so1445426eds.4
for <target@acme.com>; Fri, 02 Nov 2018 03:25:10 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=example-com.20150623.gappssmtp.com; s=20150623;
h=mime-version:from:date:message-id:subject:to;
bh=...;
b=...
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=1e100.net; s=20161025;
h=x-gm-message-state:mime-version:from:date:message-id:subject:to;
bh=...;
b=...
X-Gm-Message-State: ...
X-Google-Smtp-Source: ...
X-Received: by 2002:a17:906:6c9a:: with SMTP id s26-v6mr6344779ejr.239.1541154310145;
Fri, 02 Nov 2018 03:25:10 -0700 (PDT)
MIME-Version: 1.0
From: Original Sender <origin@example.com>
Date: Fri, 2 Nov 2018 11:24:59 +0100
Message-ID: <...@mail.gmail.com>
Subject: Hello World
To: target@acme.com
Content-Type: multipart/alternative; boundary="0000000000003362570579abf3c8"

--0000000000003362570579abf3c8
Content-Type: text/plain; charset="UTF-8"

I used to be an encrypted email

--0000000000003362570579abf3c8
Content-Type: text/html; charset="UTF-8"

<div dir="ltr">I used to be an encrypted email</div>

--0000000000003362570579abf3c8--

If we only update the From field, the email will still fail to send. Why? For SES to accept the message over SMTP, it has to pass several checks, including that every source value belongs to a verified domain. This includes the following fields:

  • From (From: Original Sender <origin@example.com>)
  • Return-Path (Return-Path: <origin@example.com>)

Optional fields that may have to be changed include:

  • Smtpfrom
  • Envelope-from

If any of these are not set to a domain validated to this SES instance, then the request to send will fail. Additionally, there are some other gotchas which will be covered in the next section, but for lots more info on SMTP in AWS SES, check here: https://docs.aws.amazon.com/ses/latest/dg/send-email-smtp.html
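As a rough illustration of the two required changes, the sketch below rewrites From and Return-Path so that both point at an address on a verified domain. The function name and the addresses are hypothetical, and keeping the original sender inside the display name is an assumption made here for auditability, not something the pipeline is known to do. The optional fields listed above are not handled.

```python
from email.message import Message
from email.utils import formataddr, parseaddr


def rewrite_sender(msg: Message, verified_address: str) -> Message:
    """Point From and Return-Path at an address on a verified domain."""
    display_name, original_address = parseaddr(msg.get("From", ""))
    # Assumption: keep the original sender visible in the display name.
    if display_name:
        label = f"{display_name} ({original_address})"
    else:
        label = original_address

    # Deleting a header removes every occurrence; absent headers are ignored.
    del msg["From"]
    del msg["Return-Path"]
    msg["From"] = formataddr((label, verified_address))
    msg["Return-Path"] = f"<{verified_address}>"
    return msg
```

Applied to the example email above with a hypothetical verified address such as forwarder@extemdata.com, the From header would carry that address while still naming origin@example.com in its display name.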

DKIM

NOTE: This section is the result of a lot of trial and error and might not work given a different tech stack

DomainKeys Identified Mail (DKIM) is a framework for ensuring emails have not been tampered with, among other things. Trying to send an email over SMTP with multiple DKIM signatures will result in errors, so why not remove them all? This is the result:

This email looks very readable, doesn’t it?

Through much trial and error, I found that removing DKIM signatures based on the following tags yields the best results:

  • d (domain): typically not needed, so only the first signature found is kept and the rest are deleted
  • q: a less common tag that is not added to every signature, so every signature containing it is deleted

While not perfect, deleting any DKIM signature with the above two criteria has over a 99% success rate. NOTE: Different evaluations may have to be done if you’re not using SES for sending emails or Google Groups for receiving them.
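One possible reading of the two criteria above, sketched in Python: drop every DKIM-Signature header that carries a q tag, and of those that carry a d tag keep only the first. This is an interpretation, not the pipeline's confirmed logic, and the function names are hypothetical. Only headers named exactly DKIM-Signature are considered, so X-Google-DKIM-Signature is left alone.

```python
from email.message import Message


def parse_dkim_tags(value: str) -> dict[str, str]:
    """Split 'v=1; a=rsa-sha256; d=example.com' into a tag -> value mapping."""
    tags = {}
    for part in value.split(";"):
        name, sep, val = part.partition("=")
        if sep:
            # Folded header values may contain whitespace; strip it all.
            tags[name.strip().lower()] = "".join(val.split())
    return tags


def prune_dkim_signatures(msg: Message) -> Message:
    """Filter DKIM-Signature headers while preserving overall header order."""
    kept = []
    kept_first_domain_signature = False
    for name, value in msg.items():
        if name.lower() == "dkim-signature":
            tags = parse_dkim_tags(str(value))
            if "q" in tags:
                continue  # delete every signature that has a q tag
            if "d" in tags:
                if kept_first_domain_signature:
                    continue  # delete all but the first signature with a d tag
                kept_first_domain_signature = True
        kept.append((name, value))

    # Rebuild the header list from the surviving entries, in original order.
    for name in set(msg.keys()):
        del msg[name]
    for name, value in kept:
        msg[name] = value
    return msg
```

The message is rebuilt header by header rather than deleting and re-appending signatures, so that the surviving signature stays in its original position relative to the other headers.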

Final Result

The following steps happen within the current Email Pipeline:

  1. Email is sent to an MX Route53 Domain
  2. Email is received by SES and stored in an S3 bucket
  3. A Lambda is invoked to ingest emails and extract attachments, then saves attachments for further processing
  4. An additional Lambda is invoked to make the modifications necessary for forwarding to work and to forward the email onward to Google Groups

Codification

Vortexa’s primary codification tool for shared resources is HashiCorp Terraform. As with most things, Terraform has awesome support for SES, but I have noticed a few caveats. Note the order in which resources have to be provisioned:

  1. IAM Users/Roles
  2. S3 Buckets
  3. Lambda Functions
  4. SES Infrastructure, within SES
  5. Route 53 Record Verification
  6. SES Receipt Rules
  7. Marking an SES Receipt Ruleset as active

It may help to keep everything within one module while also having everything explicitly sequenced with the `depends_on` field.

Some helpful links on codifying SES resources with Terraform are below:

Moving to Kubernetes

While a complete lift and shift of all processes to containerized workloads is not possible, there are some benefits to moving compute workloads to Kubernetes. Principal benefits include:

  • Identical CI/CD processes with other applications
  • More scalability
  • Fewer bespoke processes

In order to see if moving to Kubernetes was even feasible, a POC was conducted, with the results shown below:

Other than moving compute resources to Kubernetes, the only other major difference is the addition of a VPC-bound Router Lambda. All this function does is convert Lambda invocation requests into an API request for each of the endpoints (ingestion and forwarder).
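A router of this kind could look roughly like the sketch below, which turns one invocation payload into one HTTP POST per service. The endpoint URLs, the JSON body format, and the function names are all assumptions made for illustration; the article does not describe the real API.

```python
import json
import urllib.request
from typing import Any

# Hypothetical in-cluster endpoints; the real URLs are not given in the article.
ENDPOINTS = {
    "ingestion": "http://ingestion.internal.example/emails",
    "forwarder": "http://forwarder.internal.example/emails",
}


def build_requests(event: dict[str, Any]) -> list[urllib.request.Request]:
    """Turn one Lambda invocation payload into one POST request per endpoint."""
    body = json.dumps(event).encode("utf-8")
    return [
        urllib.request.Request(
            url,
            data=body,
            method="POST",
            headers={"Content-Type": "application/json"},
        )
        for url in ENDPOINTS.values()
    ]


def handler(event: dict[str, Any], context: Any) -> dict[str, int]:
    """Lambda entry point: relay the event and report each HTTP status code."""
    statuses = {}
    for name, request in zip(ENDPOINTS, build_requests(event)):
        with urllib.request.urlopen(request, timeout=10) as response:
            statuses[name] = response.status
    return statuses
```

Keeping request construction separate from the network call makes the routing logic easy to test without a cluster; retries and error handling are deliberately left out.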

Conclusion

This is how Vortexa allows for rapid, proactive debugging and optimization of its email pipelines. If this, or any other content from VorTECHsa, has piqued your interest, consider checking our open roles at: https://www.vortexa.com/careers

Additional Notes and References

These references may come in handy during your AWS journey
