Throttle Lambda Invocations with SNS

I recently came across an interesting challenge. A customer has an application that periodically dumps a (relatively large) number of objects into S3. A Lambda function is triggered for each PutObject call, in which the function adds data about the object to a CloudSearch index. Unfortunately, due to the high volume of objects being added all at once, Lambda was overwhelming CloudSearch. Since the source application placing objects into S3 can’t be tweaked, we needed to somehow throttle the rate of Lambda invocations so it wouldn’t impact the upstream services.

One way to accomplish this is by inserting an SNS topic between the S3 object puts and the Lambda function. I created a proof of concept that does the following:

  1. User uploads some number of objects into an S3 bucket
  2. The S3 bucket is configured to send notifications to an SNS topic. The SNS topic has a Delivery Policy associated with it that is set to send 1 message per second. Thus if 10 objects are added to the S3 bucket, the SNS topic will send 10 messages (one for each object) over 10 seconds to the API Gateway endpoint.
  3. The API Gateway endpoint is a simple Lambda proxy. I needed to put this in front of the Lambda because the delivery policy only works with HTTP / HTTPS endpoints.
  4. The Lambda function behind the API Gateway serves two purposes:
  • Before the Lambda function can receive messages from the SNS topic, it needs to confirm the subscription request. This function checks if the SNS topic added the endpoint as a subscriber (which is done in the CloudFormation template), and if so, makes an HTTP GET on the confirmation URL. This should only happen once.
  • Otherwise, this function looks at the event from S3, pulls out the object’s key, then adds an item with the key and timestamp into a DynamoDB table.
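Steps 3 and 4 can be sketched roughly like this. The function and variable names are my own, and the DynamoDB write is stubbed out; this is just a sketch of the shape of the handler, not the code from the repository:

```python
import json
import urllib.request

def extract_s3_key(sns_body):
    """Pull the object key out of the S3 event embedded in an SNS Notification."""
    s3_event = json.loads(sns_body["Message"])
    return s3_event["Records"][0]["s3"]["object"]["key"]

def handler(event, context):
    # With Lambda proxy integration, API Gateway passes the raw SNS
    # message as a JSON string in event["body"].
    body = json.loads(event["body"])

    if body.get("Type") == "SubscriptionConfirmation":
        # One-time handshake: GET the SubscribeURL so SNS will start
        # delivering notifications to this endpoint.
        urllib.request.urlopen(body["SubscribeURL"])
    elif body.get("Type") == "Notification":
        key = extract_s3_key(body)
        # The real function writes {key, timestamp} into a DynamoDB table
        # here (omitted to keep this sketch self-contained).
        print(f"object added: {key}")

    return {"statusCode": 200}
```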

To verify it’s working as expected, check the items in the DynamoDB table and their respective timestamps. Using the example from above, if you set the max messages per second to 2, you should see all 10 objects written within a 5-second window (assuming the files were added into S3 relatively quickly).

The Code

If you’d like to run this proof of concept yourself, jump right into the GitHub repository. I wrote up installation instructions in the README, including how to package and deploy the template with CloudFormation. Below, I’ll highlight some of the things I learned along the way.

SNS Delivery Policy

The SNS Delivery Policy is not a supported resource in CloudFormation, so this was my first time building a custom resource backed by Lambda. Doing so was pretty straightforward, but truth be told it took me quite some time to get right.
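At its core, all the custom resource has to do is set the topic’s DeliveryPolicy attribute. A minimal sketch of that piece, assuming the topic-level policy format with a throttle on HTTP deliveries (the helper names and the way MaxPerSecond is wired through are mine):

```python
import json

def delivery_policy(max_per_second):
    """Build the topic-level delivery policy JSON that throttles HTTP/S deliveries."""
    return json.dumps({
        "http": {
            "defaultThrottlePolicy": {
                "maxReceivesPerSecond": max_per_second
            }
        }
    })

def apply_policy(topic_arn, max_per_second):
    # boto3 is imported here so the policy helper above stays
    # usable without any AWS dependencies.
    import boto3
    sns = boto3.client("sns")
    sns.set_topic_attributes(
        TopicArn=topic_arn,
        AttributeName="DeliveryPolicy",
        AttributeValue=delivery_policy(max_per_second),
    )
```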

Take a good hard look at your Lambda function and ensure it’s sending the appropriate responses back to CloudFormation — otherwise your stack may get stuck for a few hours until it times out!

I used the Python requests module for ease of use when responding to the CloudFormation endpoint so the stack can continue. Note that the response is an HTTP PUT and not a POST!
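A stripped-down version of that response logic, using the standard fields CloudFormation sends to a custom resource (the function names and default values here are mine):

```python
import json

def build_response(event, status, reason, physical_id):
    """Assemble the JSON body CloudFormation expects back from a custom resource."""
    return {
        "Status": status,                 # "SUCCESS" or "FAILED"
        "Reason": reason,
        "PhysicalResourceId": physical_id,
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
    }

def send_response(event, status, reason="OK", physical_id="delivery-policy"):
    body = build_response(event, status, reason, physical_id)
    # requests is third-party, bundled with the deployment package;
    # imported here so build_response stays dependency-free.
    import requests
    # Crucially, the pre-signed ResponseURL expects a PUT, not a POST;
    # get this wrong and the stack hangs until it times out.
    requests.put(event["ResponseURL"], data=json.dumps(body))
```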

SAM Limitations

I’m becoming a bigger fan of the Serverless Application Model the more I use it. I did run into a few limitations with this POC that I think are noteworthy.

  1. You can’t !Ref an event you define within an AWS::Serverless::Function resource definition. In my template, the only reason I had to create an AWS::Serverless::Api resource and have my event reference it was that I couldn’t directly reference the API embedded in the event. Defining a separate AWS::Serverless::Api resource forces you to create a Swagger file to describe your endpoint, which leads me to my second beef.

  2. In the Swagger file, you can’t use stage variables to replace the region and AWS account ID when defining the integration URI. To me this makes the template a bit less modular and dynamic. Notice the <<region>> and <<account>> placeholders in the URI definition: not ideal, especially since you can pass in things like the Lambda function name. Why not these values too?!

The Result

I created a bunch of zero byte files to test with:

$ for i in {1..20}; do touch file$i ; done

Then I uploaded all of these files into my S3 bucket:

$ aws s3 sync . s3://<my bucket>/

I monitored my Lambda function’s CloudWatch Log Group, since I had added some print statements for easy debugging. Finally, I checked my DynamoDB table to ensure things worked as expected:

Since I set the max messages per second to 1, you’ll notice there’s about a 1-second difference between each entry.
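If you’d rather not eyeball the console, the spacing is easy to compute from the stored timestamps. A sketch, assuming the table stores an epoch-seconds `timestamp` attribute (the table name and attribute name are assumptions):

```python
def gaps(timestamps):
    """Seconds between consecutive entries, oldest first."""
    ts = sorted(timestamps)
    return [round(b - a, 2) for a, b in zip(ts, ts[1:])]

def table_gaps(table_name):
    # boto3 imported here so gaps() stays testable without AWS.
    import boto3
    table = boto3.resource("dynamodb").Table(table_name)
    items = table.scan()["Items"]
    # DynamoDB numbers come back as Decimal; float() normalizes them.
    return gaps(float(i["timestamp"]) for i in items)
```

With 20 files and MaxPerSecond set to 1, every value returned by `gaps` should hover around 1.0.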

I made this value configurable by setting it as a parameter in the CloudFormation template. Want to speed things up? Increase the MaxPerSecond parameter to something that works for you.