Serverless: Lessons Learned using DynamoDB x MongoDB x Aurora

Cleriston Bernardes
Oct 16 · 6 min read
Image from Amazon documentation

A few weeks ago I wrote an article highlighting some issues I had while implementing the AWS Cognito User Pool with Serverless. Today I want to describe some challenges I faced while choosing and using a database in a Serverless architecture, and also cover some important Lambda concepts that can influence the decisions made in a serverless project.

DynamoDB

If you have considered using Serverless and have read some of AWS’s articles or documentation, you have very likely come across a suggestion to use their DynamoDB service to persist data. Even though this database has several benefits, it may not be the best fit for every type of application. Let’s consider some important points for an average-sized API:

From the content-organization perspective, DynamoDB tables are not grouped into databases the way they are in most common databases. Each table is a completely independent resource, so all the tables in an AWS account and region are shown in a single mixed list. Imagine having two projects that both have a table called “message”. How should we name the tables? prj1-message-prod and prj2-message-prod? It quickly becomes confusing.

AWS DynamoDB table list
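A minimal sketch of the prefix convention, assuming a hypothetical `<project>-<entity>-<stage>` naming scheme and project names without hyphens. Because there is no database level, grouping tables by project has to be reconstructed from the names alone, for example from the `TableNames` array that a `ListTables` call returns.

```javascript
// Hypothetical naming convention: <project>-<entity>-<stage>, e.g. "prj1-message-prod".
// Assumes project names contain no "-" characters.
function tableName(project, entity, stage) {
  return `${project}-${entity}-${stage}`;
}

// DynamoDB has no database/schema level, so the only way to group tables by
// project is to parse the names from the flat list (e.g. ListTables' TableNames).
function groupByProject(tableNames) {
  const groups = {};
  for (const name of tableNames) {
    const project = name.split("-")[0];
    if (!groups[project]) groups[project] = [];
    groups[project].push(name);
  }
  return groups;
}

const tables = [
  tableName("prj1", "message", "prod"),
  tableName("prj2", "message", "prod"),
  tableName("prj1", "user", "prod"),
];
const byProject = groupByProject(tables);
// byProject.prj1 -> ["prj1-message-prod", "prj1-user-prod"]
// byProject.prj2 -> ["prj2-message-prod"]
```

The convention works, but it lives only in people's heads: nothing in DynamoDB enforces or understands it.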

Moreover, because tables are self-contained, there is no support for join operations, not even something like Mongoose’s “populate”. Consider a simple query to list all messages of a given user. To do so, we would need either to embed this information in the user’s table schema or to run two queries. Now imagine user-related data (comments, transactions, logs, orders, etc.) growing as the project grows: how big would that structure and table become?
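To make the two-query approach concrete, here is a sketch assuming hypothetical `users` (key `id`) and `messages` (partition key `userId`) tables. The request objects follow the DocumentClient parameter shape (`TableName`, `Key`, `KeyConditionExpression`, `ExpressionAttributeValues`), while `db` is an injected adapter, stubbed in memory here, whose `get` and `query` resolve to `{ Item }` and `{ Items }` like the DocumentClient responses.

```javascript
// `db` is a hypothetical adapter; with the AWS SDK it would wrap DocumentClient
// get/query calls. The "join" happens in application code.
async function getUserWithMessages(db, userId) {
  // Query 1: fetch the user item.
  const { Item: user } = await db.get({
    TableName: "users",
    Key: { id: userId },
  });
  if (!user) return null;

  // Query 2: fetch the user's messages in a separate request (no joins).
  const { Items: messages } = await db.query({
    TableName: "messages",
    KeyConditionExpression: "userId = :uid",
    ExpressionAttributeValues: { ":uid": userId },
  });

  // Combine both results by hand.
  return { ...user, messages };
}

// In-memory stand-in for the adapter, for illustration only.
const fakeDb = {
  users: [{ id: "u1", name: "Ana" }],
  messages: [
    { userId: "u1", createdAt: 1, text: "hi" },
    { userId: "u2", createdAt: 2, text: "other user" },
  ],
  async get({ Key }) {
    return { Item: this.users.find((u) => u.id === Key.id) };
  },
  async query({ ExpressionAttributeValues }) {
    const uid = ExpressionAttributeValues[":uid"];
    return { Items: this.messages.filter((m) => m.userId === uid) };
  },
};
```

Every additional relation (comments, orders, logs) adds another request and another hand-written merge step.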

From the query perspective, the syntax does not help much either. It is not intuitive and is laborious to write (possibly because of our lack of experience), and the available ORMs are not very supportive.
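As an illustration of the verbosity, updating two fields of an item requires an update expression plus two placeholder maps. The helper below is hypothetical (`buildUpdateParams` is not part of any SDK) and builds DocumentClient-style update parameters from a plain object; the roughly equivalent MongoDB update would simply be `{ $set: { text: "edited", status: "read" } }`.

```javascript
// Hypothetical helper (not part of the AWS SDK): builds DocumentClient-style
// update params from a plain object of changes. Every attribute name goes
// through an ExpressionAttributeNames placeholder, because words such as
// "name" or "status" are reserved in DynamoDB expressions.
function buildUpdateParams(tableName, key, changes) {
  const fields = Object.keys(changes);
  if (fields.length === 0) {
    throw new Error("buildUpdateParams: no fields to update");
  }
  const names = {};
  const values = {};
  const assignments = fields.map((field, i) => {
    names[`#f${i}`] = field;
    values[`:v${i}`] = changes[field];
    return `#f${i} = :v${i}`;
  });
  return {
    TableName: tableName,
    Key: key,
    UpdateExpression: `SET ${assignments.join(", ")}`,
    ExpressionAttributeNames: names,
    ExpressionAttributeValues: values,
  };
}

const params = buildUpdateParams("messages", { id: "m1" }, { text: "edited", status: "read" });
// params.UpdateExpression -> "SET #f0 = :v0, #f1 = :v1"
```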

Consequently, due to the factors above, we decided to use MongoDB as the database for our serverless project. Still, for other reasons, DynamoDB could be a good fit for your project, so I would suggest reading the Ultimate Guide of Serverless and DynamoDB for a deeper understanding.

MongoDB and Lambda: The paradox

As we moved forward, all the issues above were easily solved (I guess thanks to our previous experience). The problem, though, is that MongoDB is a connection-driven database. Why is this a problem? A Lambda function is a self-contained unit of code that runs every time it is invoked, so unless something is reused between invocations, it needs a new database connection each time. Connecting to and disconnecting from the database on every invocation increases execution time and decreases performance. To understand it better, let’s break the problem into smaller concepts:

A cold start is the time a Lambda function takes, on its first request (or whenever AWS creates a new container for it), to prepare everything needed to execute its code.

The problem here is that, on top of the cold start (launching a container, loading modules, configuration, etc.), the database connection also delays code execution. Bear in mind that the DB connection is not part of the cold start, since it happens during the Lambda’s code execution. However, it has a big impact on getting our main method ready to run and, in practice, every time the Lambda has a cold start it has to connect as well, so the two are correlated.

After understanding that, why didn’t we have this problem when using DynamoDB? Because DynamoDB uses API requests to write and read items to and from a table, whereas MongoDB relies on a DB connection. In other words, with DynamoDB there is no persistent database connection to set up, even when the AWS SDK or AWS CLI is used: each operation is an independent HTTP request. As a result, to improve performance with MongoDB, we need to reuse the same DB connection every time the Lambda is re-invoked.
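For contrast, this is the pattern to avoid: a handler that opens and closes a connection on every invocation. `connect` is a hypothetical stand-in for a driver call such as `MongoClient.connect(uri)`; in this sketch it only counts how often it is called, so the repeated cost is visible.

```javascript
// Anti-pattern sketch: a new connection is opened and closed on every invocation.
// `connect` is a hypothetical stand-in for a real driver call; it only counts calls.
let connectCalls = 0;

async function connect() {
  connectCalls += 1;
  return {
    async findMessages(userId) {
      return [{ userId, text: "hello" }];
    },
    async close() {},
  };
}

async function naiveHandler(event) {
  const client = await connect(); // connection cost paid on every invocation
  try {
    const messages = await client.findMessages(event.userId);
    return { statusCode: 200, body: JSON.stringify(messages) };
  } finally {
    await client.close(); // ...and torn down again right after
  }
}
```

Three invocations mean three connections, even when they all land on the same warm container.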


The event loop is what allows Node.js to perform non-blocking I/O operations, despite the fact that JavaScript is single-threaded, by offloading operations to the system kernel whenever possible.

Why is this information important? Because the DB connection is a pool of open connections that keeps the event loop busy, and by default the Lambda does not return the callback’s result until the event loop is empty. Therefore, the Lambda will fail with a timeout if the connection is not closed.
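A small Node.js illustration of why an open pool blocks the response: any referenced handle, such as a socket or a timer, keeps the event loop from emptying. In this sketch a `setInterval` timer stands in for a pool’s open sockets; `FakePool` is hypothetical, not a driver class, and `timeout.hasRef()` requires Node.js 11 or later.

```javascript
// Illustration only: FakePool is hypothetical, not a driver class. Its
// setInterval timer stands in for the open sockets of a real connection pool:
// while the timer is referenced, Node's event loop is not empty.
class FakePool {
  constructor() {
    this.closed = false;
    this.keepAlive = setInterval(() => {}, 1000);
  }

  // True while the pool holds a referenced handle that keeps the loop alive.
  keepsEventLoopAlive() {
    return !this.closed && this.keepAlive.hasRef();
  }

  // Closing releases the handle, so the event loop can drain.
  close() {
    clearInterval(this.keepAlive);
    this.closed = true;
  }
}

const pool = new FakePool();
console.log(pool.keepsEventLoopAlive()); // true
pool.close();
console.log(pool.keepsEventLoopAlive()); // false
```

A script that creates a `FakePool` and never calls `close()` never exits, which mirrors a Lambda handler waiting for an empty event loop while a real pool is still open.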

If you have followed along so far, this is what we have: to avoid performance issues when accessing the database, we need to keep the connection open, but to have the Lambda response returned through the callback, the connection needs to be closed. We have a paradox.

How is it solved?

By configuring the Lambda not to wait for the event loop to be empty, which is done by setting an option on the context object:

context.callbackWaitsForEmptyEventLoop = false;

“By default, the callback waits until the runtime event loop is empty before freezing the process and returning the results to the caller. Setting this property to false requests that AWS Lambda freeze the process soon after the callback is invoked, even if there are events in the event loop. AWS Lambda will freeze the process, any state data, and the events in the event loop. Any remaining events in the event loop are processed when the Lambda function is next invoked, if AWS Lambda chooses to use the frozen process.” (Quoted in MongoDB’s best practices for connecting from AWS Lambda.)

Now that we understand how things work from the Lambda and database-connection perspectives, three things have to be considered to implement the solution:

  1. The connection must be created outside the Lambda handler, so that it is part of the state that gets frozen and may be reused between invocations.
  2. Check whether the connection is already open before trying to connect to the DB again.
  3. Inside the Lambda handler, set “callbackWaitsForEmptyEventLoop” to false.
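A minimal sketch of these three rules, assuming an async handler and a hypothetical `connectToDatabase()` stand-in that only counts calls. With the official Node.js driver it would be something like `MongoClient.connect(process.env.MONGODB_URI)`, where `MONGODB_URI` is an assumed environment variable.

```javascript
// Hypothetical stand-in for a real driver connection; counts how often it runs.
let connectCount = 0;
async function connectToDatabase() {
  connectCount += 1;
  return {
    async findMessages(userId) {
      return [{ userId, text: "hello" }];
    },
  };
}

// Rule 1: the cached connection lives outside the handler, in module scope,
// so it survives between invocations that reuse the same frozen container.
let cachedDb = null;

function getDb() {
  // Rule 2: only connect when nothing is cached yet.
  if (!cachedDb) {
    cachedDb = connectToDatabase().catch((err) => {
      cachedDb = null; // let the next invocation retry after a failed connect
      throw err;
    });
  }
  return cachedDb;
}

async function handler(event, context) {
  // Rule 3: respond without waiting for the open connection's handles
  // to leave the event loop.
  context.callbackWaitsForEmptyEventLoop = false;

  const db = await getDb();
  const messages = await db.findMessages(event.userId);
  return { statusCode: 200, body: JSON.stringify(messages) };
}
```

Caching the promise rather than the resolved client also means that overlapping calls within the same process share a single connection attempt.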


The overall code is not complicated, but it makes more sense once you understand why these configurations are necessary. If you have been working with MySQL, there is a very good example on Jeremy Daly’s blog.

The unsolved problem

Now that we have solved the connection problem for each Lambda in isolation, what happens if the application has 100+ Lambdas? Because each Lambda is independent, every one of them (more precisely, every concurrent container of every one of them) holds its own DB connection, which can cause MaxListenersExceededWarning and other issues, such as reaching the database’s connection limit. Although AWS automatically cleans up unused Lambda environments (AWS does not document when or how this happens), there are a few approaches that could help mitigate the problem:

  1. Have more than one endpoint method reference the same handler, which results in a single Lambda serving many methods. Example: imagine the endpoints GET cms/messages/{id}, PUT cms/messages/{id} and DELETE cms/messages/{id}. Since they share the same path, they could invoke the same handler, which would then route each request to a different database operation based on its HTTP method. The downsides are that the Lambda needs more resources, management is less granular, and the code or architecture may become more complex.
  2. Change the DB configuration, try to avoid opening too many connections, and handle the situation when it does happen. Jeremy has a great article on his blog with suggestions on how to manage connections.
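The first approach can be sketched as a single handler that dispatches on the HTTP method. The event fields (`httpMethod`, `pathParameters`, `body`) follow the API Gateway REST API Lambda proxy integration format; `createMessagesHandler` and `repo` are hypothetical names, and in practice the repository would reuse a single cached DB connection.

```javascript
// Sketch of one Lambda serving GET, PUT and DELETE on cms/messages/{id}.
// `repo` is a hypothetical data-access object.
function createMessagesHandler(repo) {
  const routes = new Map([
    ["GET", (id) => repo.get(id)],
    ["PUT", (id, body) => repo.update(id, body)],
    ["DELETE", (id) => repo.remove(id)],
  ]);

  return async function handler(event, context) {
    context.callbackWaitsForEmptyEventLoop = false;

    const route = routes.get(event.httpMethod);
    if (!route) {
      return { statusCode: 405, body: JSON.stringify({ error: "Method not allowed" }) };
    }
    const id = event.pathParameters ? event.pathParameters.id : undefined;
    const body = event.body ? JSON.parse(event.body) : undefined;
    const result = await route(id, body);
    return { statusCode: 200, body: JSON.stringify(result) };
  };
}

// In-memory repository, for illustration only.
const store = new Map([["1", { id: "1", text: "hello" }]]);
const repo = {
  async get(id) {
    return store.get(id) || null;
  },
  async update(id, changes) {
    const updated = { ...store.get(id), ...changes, id };
    store.set(id, updated);
    return updated;
  },
  async remove(id) {
    store.delete(id);
    return { deleted: id };
  },
};

const handler = createMessagesHandler(repo);
```

Three endpoints now share one function and therefore one connection per container, at the cost of the routing code living inside the handler.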

Maybe AWS should provide some sort of shared Lambda resource for this particular issue. It would defeat the purpose of Lambda’s independence, but it is also something very important for API development.

The good news is that last year (09 Aug 2018) Amazon launched the Aurora Serverless service and, according to Jeremy’s article, it seems to be a good choice. Had I started the project months ago, I would have given it a chance. Moreover, this announcement shows that providers are continuously working on improving their services.

To sum up

It is really hard to pick the best tools or the right architecture when implementing Serverless projects; however, if the concepts are well understood and taken into consideration, it becomes easier to make the right decision.

Credit

I would like to give extra credit to Jeremy’s blog, where I found good references and use cases for the topics I covered. It is a vast source of information related to serverless.

Resources

https://serverless.com/dynamodb
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SQLtoNoSQL.Accessing.html
https://nodejs.org/en/docs/guides/event-loop-timers-and-nexttick
https://www.jeremydaly.com/reuse-database-connections-aws-lambda
https://docs.atlas.mongodb.com/best-practices-connecting-to-aws-lambda
https://mongoosejs.com/docs/lambda.html
https://github.com/awslabs/aws-serverless-express/issues/242
https://www.jeremydaly.com/manage-rds-connections-aws-lambda
https://www.jeremydaly.com/aurora-serverless-the-good-the-bad-and-the-scalable/

JavaScript in Plain English

Learn the web's most important programming language.

Written by Cleriston Bernardes, a technologist trying to make the puzzle of Cloud, Server, Back End, Front End and Tests easier to understand and connect.
