Skip Lambda, Save Data to DynamoDB Directly Using API Gateway; Process Later With Streams

Chris Bailey
Nov 23, 2020 · 9 min read

AWS’ API Gateway can directly connect to, and act as a proxy for, many other AWS services. This article discusses doing this with DynamoDB, as a way to create an API that adds data to DynamoDB without needing a Lambda function. There are existing AWS docs on using API Gateway as a proxy for DynamoDB; however, as usual, those only cover how to do it in the AWS console. In particular, I’ll show how I set this up using the Serverless Framework (or CloudFormation, as the bulk is really just CloudFormation code), and how to transform the web request’s JSON so it can be PUT directly into DynamoDB. Finally, I’ll talk about how to do post-processing of the data via DynamoDB Streams.


But Why?

The use case I have is an authenticated web API that takes in a potentially significant volume of events from mobile devices. This data is stored in DynamoDB. As an additional constraint, the mobile app sends regular HTTP calls and doesn’t have the ability to use GraphQL (i.e. AppSync isn’t a possibility for this case). Finally, I want this particular API to be simple and very fast, with all the time-consuming processing of the data done async. Thus, we can simply have the data come in via API Gateway and get injected directly into DynamoDB (with some basic data transformation, plus integration of the user’s ID).

This eliminates the need for a Lambda, and avoids its cost. Not that Lambda is that expensive, but if this does wind up scaling to, say, millions (or hundreds of millions) of events per day, that becomes a meaningful savings. It’s also a simpler, more maintainable architecture, as it’s one less component to build and maintain.

Update 4 Jan 2021

In the full gist (see link in the first sentence of “Show Me Already” below), I added a CloudFormation resource for AWS::ApiGateway::Deployment, as I hadn’t had that in there, and without it your API won’t actually get deployed!

Update 25 Nov 2020

A quick update since I originally published this story. Ben Duong pointed out a Serverless Framework plugin, Serverless Apigateway Service Proxy. However, it doesn’t support DynamoDB’s batch writes, so it can’t be used in this case. I’m also not sure how it handles auth. If you are only taking a single event/record into your API, though, it should cover you.

Show Me Already

A full serverless.yml config file for this can be found in this gist. Ultimately, the bulk of this is CloudFormation within Serverless Framework config (if there’s a plugin I missed, or some more direct Serverless way to do it, let me know!). I refer to line numbers from this gist below. The key parts are:

  • Cognito user pool (optional/may not be needed for your case) and IAM policies
  • DynamoDB table configuration
  • API Gateway API configuration
  • API Gateway VTL mapping template

Input Data

For this example, the JSON body in the POST request to this API looks like the following:
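The embedded gist doesn’t render here, so here’s a representative body. The sensor_name field and the ISO-format timestamp are referenced later in the article; the events wrapper, the time_utc name, and the value field are my assumptions for illustration:

    {
      "events": [
        {
          "time_utc": "2020-11-23T18:25:43.511Z",
          "sensor_name": "accelerometer",
          "value": 0.82
        },
        {
          "time_utc": "2020-11-23T18:25:44.112Z",
          "sensor_name": "gyroscope",
          "value": 1.07
        }
      ]
    }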

The Interesting Parts

To me, the interesting parts of this whole thing really come down to how to write the VTL mapping template (i.e. take an incoming HTTP request’s payload and transform it into what DynamoDB needs to do an insert), and how to get the Cognito user ID and include it in the data (since all the authentication happens “automatically” for you via API Gateway’s Cognito integration). Well, and of course how to do all this in code/Serverless instead of via the AWS console.

A Note About DynamoDB Batches

A key thing to note is that we use batch writes for Dynamo. These are limited to 25 items at a time, so our mobile clients are limited to sending events in batches of 25. The key is that it’s always a batch, even if it’s a batch of just one event. You’ll see more on this below, with the VTL template iterating over the incoming events.

Cognito/Auth

First up is Cognito (line 51). If you don’t need authentication on your API, you can skip this. There is a fair bit of setup at the beginning of the resources section to configure a user pool and the policies it needs.
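For reference, a minimal sketch of that user pool setup (the resource names and properties here are illustrative, not the gist’s exact ones):

    YourProductUserPool:
      Type: AWS::Cognito::UserPool
      Properties:
        UserPoolName: your-product-users
        AutoVerifiedAttributes:
          - email

    YourProductUserPoolClient:
      Type: AWS::Cognito::UserPoolClient
      Properties:
        ClientName: your-product-mobile
        UserPoolId: !Ref YourProductUserPool
        GenerateSecret: false    # mobile clients typically use no client secret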

Next, if you look in the “API Gateway + VTL template to put events into above DynamoDB table” section (line 193), you’ll see a YourProductAPIAuthorizer resource. This sets up Cognito user authentication for the API Gateway API.

DynamoDB Table

This is standard, and you can find plenty of docs in Serverless or CloudFormation for creating a DynamoDB table (line 170). I recommend checking out the Serverless DynamoDB Local plugin as well, which makes it easy to use a local DynamoDB for testing. You’ll see the table creation under the “DynamoDB events table” comment. This is a very simple table with just a single PK (UserID) and SK (TimeUTC), but sufficient for this example. Note that it’s configured in fully serverless, on-demand mode via the BillingMode: PAY_PER_REQUEST line.
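As a condensed sketch (the table name is a placeholder; the keys and billing mode match what’s described above):

    EventsTable:
      Type: AWS::DynamoDB::Table
      Properties:
        TableName: your-product-events
        BillingMode: PAY_PER_REQUEST        # on-demand, no capacity planning
        AttributeDefinitions:
          - AttributeName: UserID           # partition key
            AttributeType: S
          - AttributeName: TimeUTC          # sort key, ISO-8601 string
            AttributeType: S
        KeySchema:
          - AttributeName: UserID
            KeyType: HASH
          - AttributeName: TimeUTC
            KeyType: RANGE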

API Gateway

The meat of things :) This is under the comment “API Gateway + VTL template to put events into above DynamoDB table” in the resources section (line 193).

It starts off with an IAM role defining what actions API Gateway is allowed to perform on DynamoDB, scoped to just the EventsTable. In this case it allows five actions, the most important being BatchWriteItem, as that’s what will actually do the insert (of multiple events, in this case).
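The shape of that role looks roughly like this; I’ve trimmed the action list to the two write actions (the gist has the full five):

    APIGatewayDynamoDBRole:
      Type: AWS::IAM::Role
      Properties:
        AssumeRolePolicyDocument:
          Version: "2012-10-17"
          Statement:
            - Effect: Allow
              Principal:
                Service: apigateway.amazonaws.com   # let API Gateway assume this role
              Action: sts:AssumeRole
        Policies:
          - PolicyName: events-table-write
            PolicyDocument:
              Version: "2012-10-17"
              Statement:
                - Effect: Allow
                  Action:
                    - dynamodb:BatchWriteItem       # the action doing the actual insert
                    - dynamodb:PutItem
                  Resource: !GetAtt EventsTable.Arn # scoped to just this table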

Next you’ll see the Authorizer (line 230). Note the IdentitySource: method.request.header.Authorization, which means the API reads the user’s token from the Authorization header. More details can be found in the CloudFormation docs for the API Gateway Authorizer.
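That resource looks roughly like this (the RestApi logical name is my placeholder):

    YourProductAPIAuthorizer:
      Type: AWS::ApiGateway::Authorizer
      Properties:
        Name: your-product-authorizer
        Type: COGNITO_USER_POOLS
        IdentitySource: method.request.header.Authorization   # token comes from this header
        RestApiId: !Ref YourProductAPI
        ProviderARNs:
          - !GetAtt YourProductUserPool.Arn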

Then comes the EventsResource item (line 253), which defines the URL path of the API, events in this case. Thus, the API’s path is /events.
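This one is short; a sketch:

    EventsResource:
      Type: AWS::ApiGateway::Resource
      Properties:
        RestApiId: !Ref YourProductAPI
        ParentId: !GetAtt YourProductAPI.RootResourceId   # attach directly under /
        PathPart: events                                  # yields the /events path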

Following that is the real meat of the API, the EventsAPI resource (line 257). This defines the Authorizer to use, the HTTP method (POST, line 265), and then the really interesting part: the VTL template, under RequestTemplates, that maps the incoming JSON to a DynamoDB request.
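Since the gist embed doesn’t render here, below is a condensed sketch of that method. The events array, time_utc field, and table name are carried over from the sample body above (i.e. my assumptions); the claims.sub lookup, the sensor_name-to-Sensor mapping, and the hasNext comma trick are as described in the notes that follow:

    EventsAPI:
      Type: AWS::ApiGateway::Method
      Properties:
        RestApiId: !Ref YourProductAPI
        ResourceId: !Ref EventsResource
        HttpMethod: POST
        AuthorizationType: COGNITO_USER_POOLS
        AuthorizerId: !Ref YourProductAPIAuthorizer
        Integration:
          Type: AWS                         # AWS service proxy (no Lambda)
          IntegrationHttpMethod: POST
          Uri: !Sub arn:aws:apigateway:${AWS::Region}:dynamodb:action/BatchWriteItem
          Credentials: !GetAtt APIGatewayDynamoDBRole.Arn
          RequestTemplates:
            application/json: |
              {
                "RequestItems": {
                  "your-product-events": [
                    #foreach($event in $input.path('$.events'))
                    {
                      "PutRequest": {
                        "Item": {
                          "UserID":  { "S": "$context.authorizer.claims.sub" },
                          "TimeUTC": { "S": "$event.time_utc" },
                          "Sensor":  { "S": "$event.sensor_name" }
                        }
                      }
                    }#if($foreach.hasNext),#end
                    #end
                  ]
                }
              }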

After that come the IntegrationResponses and MethodResponses. These were a little confusing to set up at first. The IntegrationResponses handles the proxy request to Dynamo and maps its responses for API Gateway, which then get mapped to the MethodResponses, which API Gateway uses for the actual HTTP response.
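Inside the Method resource the two sit side by side; a trimmed sketch (the response body here is just a placeholder):

        Integration:
          # ...Uri, Credentials, RequestTemplates as above...
          IntegrationResponses:
            - StatusCode: 200               # map Dynamo's success back out
              ResponseTemplates:
                application/json: '{ "status": "ok" }'
        MethodResponses:
          - StatusCode: 200                 # the HTTP response API Gateway returns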

A few key notes:

  • The RequestItems (line 277) is the root element of a DynamoDB BatchWriteItem operation. As mentioned above, you can have at most 25 individual requests within it (and these can differ, e.g. you can mix Put and Delete, although here we’re obviously only doing PutRequest items). We’re not enforcing this batch-size limit here, which is one downside: you’re relying on the clients calling this to behave properly. If you send more than 25, the Dynamo request will fail, and with it the API Gateway call, which returns an error. This is something to consider when doing these proxy-style APIs, as you clearly get less control over how you handle errors and how you might want to respond in such a case. I believe there is likely a way with the VTL template to map it differently, or maybe immediately return an error if the item count is higher, but I haven’t explored that yet.
  • A VTL foreach loop (line 279) is used to iterate over the incoming list of events, and map each one to a PutRequest. Note that the incoming events are just a simple JSON array/list with a single level of attributes, but if they had nested elements, you’d use this same dot syntax to traverse deeper as needed.
  • The user’s Cognito ID can be extracted from the $context.authorizer.claims.sub element (line 283). As you can see, this inserts additional data into DynamoDB that wasn’t part of the original HTTP request’s JSON, as well as showing how to get at the Cognito data.
  • The TimeUTC element (line 284) is a string (in DynamoDB), and the incoming JSON already has it in standard ISO format, so it can be set directly like this. It’s used as the Sort Key in this table, and having it in ISO format makes it properly sortable.
  • The rest of the elements are a straight mapping from the incoming JSON to the corresponding DynamoDB attribute. Note that you can of course use different names for the DynamoDB attributes vs. the JSON attributes; for example, sensor_name gets stored as Sensor in DynamoDB.
  • Lastly, a subtle one. Note the code #if($foreach.hasNext),#end (line 293). That’s how the comma gets added after each item in the batch for the DynamoDB request. Dynamo is particular, though, and does not allow a comma after the last item, which is why this is wrapped in the conditional (i.e. only add the comma if there are more items after it). Without this, DynamoDB will fail your request.

Post-Processing via DynamoDB Streams

While not required, as mentioned early on, I am doing asynchronous post-processing of these incoming events. This is handled via DynamoDB Streams. The setup involves a Lambda function that listens to the DynamoDB stream, which delivers all events from Dynamo (insert, update, delete, etc.). For this post-processing, then, you need to filter to just INSERT events.
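In serverless.yml, the wiring might look like the sketch below (the function name and handler path are placeholders, and the table needs a StreamSpecification for the stream to exist at all):

    functions:
      processEvents:
        handler: handler.processEvents      # handler should skip records where eventName != "INSERT"
        events:
          - stream:
              type: dynamodb
              # requires StreamSpecification (e.g. StreamViewType: NEW_IMAGE) on EventsTable
              arn: !GetAtt EventsTable.StreamArn
              batchSize: 1000               # up to 1,000 records per invocation
              startingPosition: LATEST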

The post-processing we do takes longer and is fairly involved, so I wouldn’t want it done synchronously on receipt of each of these events (never mind on a batch of 25). This architecture therefore creates a very simple API that only worries about storing the raw data. Clients either get the format of that data right or they don’t, which is about the only error they can get from the API. Then, later, we process these events (which is more time consuming).

You may be thinking: wait, you said we eliminate the need for a Lambda, but now you have one doing the post-processing. True! But you may or may not need that step, and the key here is that you avoid doing potentially time-consuming processing during the API call (which would create a slow response time for your API). Furthermore, with the Streams API, you can fetch up to 1,000 records in a single Lambda invocation (vs. the limit of 25 on the incoming batch write). You could therefore have 40x fewer Lambda invocations (if you can process all 1,000 records within the 15-minute Lambda time limit). That said, the real key for me was not doing our heavy processing during the API call, keeping the API itself very fast with the fewest possible error scenarios.

An interesting note as well: the way DynamoDB streams work, they are sharded by the partition key, so you are guaranteed to receive events in order, at least per partition key. In my case, that means I’m sure to be processing events for a given user in the order they occurred. That’s a nice benefit, and one to consider when looking at alternatives such as queues, SNS, etc. See the AWS blog article “How to perform ordered data replication between applications by using Amazon DynamoDB Streams.”

Pros and Cons

Obviously not all your APIs can or should be built this way. But, it’s definitely an interesting ability that AWS has provided. Combining this with DynamoDB streams to post-process these is a great option as well.

The primary cons, in my mind, are the limited error checking and data manipulation of VTL templates vs. a full code solution. If this is a public API where you have no control over the clients making calls, that error checking alone may be worth inserting a Lambda for. The other con, to me, is the use of VTL templates in general, and testing them. I’ll be the first to admit this is more difficult to test. That said, I’ve found this is one place the AWS console is handy, as it has a way to directly test the VTL in this setup. I’d rather have a unit test in my code, but at least there’s something.

The pros for me are about the overall architecture and fast API responses for this particular use case. Due to the heavy processing involved for these events, I would have to do that async regardless. This architecture simply leverages the abilities of the AWS platform, and makes the API itself very simple.

If you have better or different ways to orchestrate this in Serverless, or other suggestions, let me know!
