At its best, serverless computing can make your life as a developer significantly easier. You have no servers to maintain, monitor or scale, meaning you can focus on your application and not the infrastructure it’s running on. But you also face the reality that serverless presents its own limitations and challenges because of its unique architecture. Nowhere is this more apparent than with databases, many of which integrate rather awkwardly with serverless architectures as a result of their decades-old design, made for a different world.
Most databases, designed for a world where servers were long-running and always-on, expect you to set up a persistent connection to the database server before you can run queries. With the older pre-serverless way of doing things, this was never an issue. You would set up such a persistent connection once (per instance of your application) and it would be reused for subsequent requests.
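As a rough sketch of that connect-once pattern, the snippet below opens a connection on first use and reuses it for every later request in the same process. It uses Python's built-in `sqlite3` purely as a stand-in for a networked database, and `get_connection` and `handle_request` are hypothetical names chosen for illustration.

```python
import sqlite3
from typing import Optional

# Process-wide connection, created lazily and then shared by every request.
_connection: Optional[sqlite3.Connection] = None
connections_opened = 0


def get_connection() -> sqlite3.Connection:
    """Open the connection on first use, then hand back the same object."""
    global _connection, connections_opened
    if _connection is None:
        # sqlite3 is an in-process stand-in for a real networked database.
        _connection = sqlite3.connect(":memory:")
        connections_opened += 1
    return _connection


def handle_request(x: int) -> int:
    """Each request reuses the existing connection instead of reconnecting."""
    row = get_connection().execute("SELECT ? + 1", (x,)).fetchone()
    return row[0]


results = [handle_request(n) for n in range(3)]
print(results, connections_opened)  # [1, 2, 3] 1
```

Three requests are served, but only one connection is ever opened. That saving is exactly what a long-running server gets for free.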
With serverless, though, every request is its own ephemeral compute instance. For each instance, spending time setting up a persistent database connection is wasteful, especially when you’re paying for the time your function runs. Furthermore, because every request needs its own database connection, the database itself comes under significant strain. If 500 requests are in flight at the same moment, that’s 500 simultaneous connections to your database.
Databases that require these persistent connections were never designed to handle one database connection per request. In fact, most databases actually have a limit to the number of connections they can handle at any given time. The actual limit depends on a variety of factors and although you may assume that a high enough limit should be fine for your expected traffic, all it would take is a simple spike in traffic to exhaust this limit.
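To make the arithmetic concrete, here is a toy model of a server with a fixed connection cap. It is not a real database driver: the `ToyDatabase` class, the cap of 100 and the spike of 500 requests are all made-up values for illustration, and real limits vary by engine and instance size.

```python
class ConnectionLimitExceeded(Exception):
    """Raised when the toy database has no free connection slots."""


class ToyDatabase:
    """Hypothetical stand-in for a database with a hard connection cap."""

    def __init__(self, max_connections: int) -> None:
        self.max_connections = max_connections
        self.open_connections = 0

    def connect(self) -> None:
        if self.open_connections >= self.max_connections:
            raise ConnectionLimitExceeded(
                f"limit of {self.max_connections} connections reached"
            )
        self.open_connections += 1


def count_rejected(db: ToyDatabase, concurrent_requests: int) -> int:
    """Open one connection per concurrent request; return how many were refused."""
    rejected = 0
    for _ in range(concurrent_requests):
        try:
            db.connect()
        except ConnectionLimitExceeded:
            rejected += 1
    return rejected


# Assumed numbers: a cap of 100 connections and a spike of 500 concurrent requests.
rejected = count_rejected(ToyDatabase(max_connections=100), concurrent_requests=500)
print(rejected)  # 400
```

With one connection per request, everything beyond the cap fails outright, no matter how briefly the spike lasts.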
When using AWS RDS databases with AWS Lambda, another issue arises. Best practices dictate that your database should sit inside a Virtual Private Cloud (VPC) to shield it from unwanted public traffic. To reach the database, your AWS Lambda functions need to run inside the same VPC. The issue is that Lambda functions inside a VPC can have cold starts of up to 10 seconds, making them unsuitable for most user-facing applications. AWS is currently working to mitigate this. For the time being, however, it remains a significant problem.
If you’re committing to a modern cloud-native architecture with serverless, it may also be time to look at modern cloud-native database technologies that were actually designed with cloud and serverless architectures in mind. The primary thing to look for is a database that doesn’t require persistent connections. Instead, these databases usually expose stateless REST APIs for communication, meaning there are no connection limits and no issues with having to communicate from behind a VPC.
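To show what "stateless" means in practice, the sketch below builds a single HTTPS request that carries both the credentials and the query, so no earlier handshake or session is needed. The endpoint URL, the bearer-token scheme and the JSON body shape are assumptions for illustration and do not belong to any particular product.

```python
import json
import urllib.request


def build_query_request(endpoint: str, api_key: str, query: str) -> urllib.request.Request:
    """Build a self-contained HTTPS request: credentials and query travel together."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical endpoint and key. Nothing is sent over the network here;
# passing the request to urllib.request.urlopen() would do that.
request = build_query_request(
    "https://db.example.com/v1/query", "my-api-key", "SELECT 1"
)
print(request.get_method(), request.full_url)
```

Because each call is independent, a thousand concurrent function instances simply produce a thousand independent HTTPS requests, with no shared connection pool to exhaust.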
Whether you want to still use a traditional RDBMS or are open to trying newer NoSQL solutions, there are excellent options to consider from nearly every major cloud provider.
Amazon Aurora Serverless (with Data API)
Amazon Aurora Serverless is a variant of Amazon Web Services’ RDS offering that provides relational databases with a serverless computing model. Aurora Serverless automatically scales to meet demand, automatically grows its storage as needed and, as a managed service, automatically takes care of routine maintenance tasks.
Aurora Serverless alone suffers from many of the same constraints as RDS. As with other relational database solutions, there is a connection limit constraint that can easily be exhausted given enough concurrent AWS Lambda executions. Aurora Serverless only operates within a VPC, meaning you must also put your Lambda behind the same VPC to connect to your database. This means you face the dreaded 10-second VPC “cold start”.
However, a new feature launched in preview last year changes everything. The feature, called Data API, exposes a secure HTTPS API so you can interact with your database in a stateless manner without having to connect beforehand or maintain any persistent connections. This gives you cloud-native flexibility while retaining your ability to use a tried-and-tested SQL database such as MySQL.
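As a minimal sketch of what calling the Data API might look like from Python, the helpers below assemble and run an `ExecuteStatement` call through boto3's `rds-data` client. The `users` table, its columns and the ARN placeholders are hypothetical. The client is passed in as an argument so the snippet does not need AWS credentials to load.

```python
from typing import Any, Dict, List


def build_statement(
    cluster_arn: str, secret_arn: str, database: str, user_id: int
) -> Dict[str, Any]:
    """Assemble keyword arguments for the rds-data ExecuteStatement call."""
    return {
        "resourceArn": cluster_arn,
        "secretArn": secret_arn,
        "database": database,
        # Hypothetical table and columns; :id is a named parameter.
        "sql": "SELECT id, name FROM users WHERE id = :id",
        "parameters": [{"name": "id", "value": {"longValue": user_id}}],
    }


def fetch_user(client: Any, statement: Dict[str, Any]) -> List[Any]:
    """Run the statement over HTTPS; no connection is opened or pooled beforehand."""
    response = client.execute_statement(**statement)
    return response.get("records", [])


# Usage (needs boto3 and AWS credentials, so it is left commented out):
#   import boto3
#   client = boto3.client("rds-data")
#   rows = fetch_user(client, build_statement(CLUSTER_ARN, SECRET_ARN, "mydb", 42))
```

Note that the database credentials are referenced through a Secrets Manager ARN rather than being held in an open connection.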
Amazon DynamoDB
DynamoDB is AWS’s fully managed cloud-native NoSQL database. On paper, it’s the perfect companion to serverless backends on AWS Lambda. You use the AWS SDK to communicate with DynamoDB, which uses an HTTPS API behind the scenes, meaning you have no persistent connections to worry about and no issues with VPCs.
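Here is a small sketch of reading and writing a single item through the SDK's table resource. The `users` table and its `user_id` key are hypothetical, and the table object is passed in so the functions can be exercised without AWS access.

```python
from typing import Any, Dict, Optional


def save_user(table: Any, user_id: str, name: str) -> None:
    """Write one item; the SDK sends it as an HTTPS request."""
    table.put_item(Item={"user_id": user_id, "name": name})


def load_user(table: Any, user_id: str) -> Optional[Dict[str, Any]]:
    """Read one item by primary key; returns None if it does not exist."""
    response = table.get_item(Key={"user_id": user_id})
    return response.get("Item")


# Usage (needs boto3 and AWS credentials, so it is left commented out):
#   import boto3
#   table = boto3.resource("dynamodb").Table("users")
#   save_user(table, "u1", "Ada")
#   print(load_user(table, "u1"))
```

There is no connect or disconnect step anywhere. Each call stands alone.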
However, DynamoDB has some well-documented problems. Generally speaking, designing your data to work properly and efficiently with DynamoDB while supporting your access patterns isn’t straightforward. In addition, if your data access patterns change over time, you’ll have to go through the headache of migrating all your data to entirely new tables to support the new patterns. Complex queries you might be used to from SQL databases can be tricky or nearly impossible to design properly.
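The toy model below tries to show why access patterns matter so much. It is an in-memory simplification, not DynamoDB itself: `ToyTable`, the `customer_id` partition key and the order data are all invented for illustration. Looking up by the key the table was designed around touches only the matching items, while filtering on any other attribute has to read everything.

```python
from collections import defaultdict
from typing import Dict, List

Item = Dict[str, str]


class ToyTable:
    """In-memory stand-in for a table partitioned by customer_id."""

    def __init__(self) -> None:
        self.partitions: Dict[str, List[Item]] = defaultdict(list)
        self.items_read = 0  # how many items the access patterns have touched

    def put(self, item: Item) -> None:
        self.partitions[item["customer_id"]].append(item)

    def query_by_customer(self, customer_id: str) -> List[Item]:
        """Planned pattern: jump straight to one partition."""
        matches = self.partitions.get(customer_id, [])
        self.items_read += len(matches)
        return list(matches)

    def scan_by_status(self, status: str) -> List[Item]:
        """Unplanned pattern: every item must be read and filtered."""
        matches = []
        for items in self.partitions.values():
            for item in items:
                self.items_read += 1
                if item["status"] == status:
                    matches.append(item)
        return matches


table = ToyTable()
for n in range(1000):
    status = "shipped" if n % 50 == 0 else "open"
    table.put({"customer_id": f"c{n % 100}", "order_id": f"o{n}", "status": status})

table.query_by_customer("c7")
print(table.items_read)  # 10

table.items_read = 0
table.scan_by_status("shipped")
print(table.items_read)  # 1000
```

Supporting the second query efficiently means restructuring or re-indexing the data around `status`, which is the kind of rework the paragraph above warns about.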
The following links do a good job of explaining these issues. I am well aware that these are just a few people’s experiences with DynamoDB. There are no doubt many people who find the platform incredibly powerful for their use cases. However, I think it’s still useful to consider the points brought up in these articles.
You probably shouldn’t use DynamoDB (Ravelin syslog)
11 Things You Wish You Knew Before Starting with DynamoDB (The Distributed SQL Blog)
Ultimately, DynamoDB is better-suited for very specific use cases, rather than for general-use applications. That being said, it is an option to consider if your data and access patterns fit well with DynamoDB’s architecture and design.
Azure Cosmos DB
Cosmos is Microsoft’s cloud-native multi-model database and it’s one of the most ambitious solutions on this list. Because it is a multi-model database, you can interact with it using a variety of APIs including SQL, Cassandra and MongoDB, among other options. There are multiple data models including key-value, column, document and graph. There are multiple consistency levels, which let you choose the exact trade-off you need between performance and consistency. Unlike other cloud providers, who have multiple database platforms to handle different scenarios and use cases, Cosmos is designed to be one database that can handle nearly everything.
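As a sketch of the SQL API, the function below runs a parameterized query against a container using the `azure-cosmos` Python package. The `city` field, the database and container names and the choice of the `Session` consistency level in the usage comment are assumptions for illustration.

```python
from typing import Any, Dict, List


def find_users_by_city(container: Any, city: str) -> List[Dict[str, Any]]:
    """Run a parameterized SQL query and collect the matching documents."""
    items = container.query_items(
        query="SELECT * FROM c WHERE c.city = @city",
        parameters=[{"name": "@city", "value": city}],
        # Assumes city is not the partition key, so all partitions are searched.
        enable_cross_partition_query=True,
    )
    return list(items)


# Usage (needs the azure-cosmos package and account credentials):
#   from azure.cosmos import CosmosClient
#   client = CosmosClient(ACCOUNT_URL, credential=ACCOUNT_KEY, consistency_level="Session")
#   container = client.get_database_client("appdb").get_container_client("users")
#   print(find_users_by_city(container, "Oslo"))
```

The consistency level is picked when the client is created, which is where the performance-versus-consistency trade-off gets decided.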
Cosmos seems to live up to its lofty ambitions, at least from my personal experiences with it. It’s a fantastic option to consider, especially if you’re already using Azure for Azure Functions.
Google Cloud Firestore
Cloud Firestore is Google’s Firebase-inspired NoSQL database. Cloud Firestore sits in an interesting place at Google, serving a double role within GCP (Google Cloud Platform) and Firebase. On the GCP side, Firestore replaces GCP’s older NoSQL solution known as Cloud Datastore. On the Firebase side, Firestore replaces Firebase’s older Realtime Database. The result is a fully-managed serverless database that is especially well-designed for web and mobile apps while being equally well suited for general purpose server-side use.
As expected from a Firebase product, clients can connect directly to Firestore without a backend in the middle, using its built-in security mechanisms to control access to data. Firestore also keeps data in sync across clients in realtime and offers offline support. Beyond mobile SDKs, Firestore offers a full breadth of server SDKs including Node.js, Java, Python and Go.
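On the server side, a minimal sketch with the Python SDK might look like the following. The `users` collection and the profile fields are hypothetical, and the client is passed in so the helpers can be tried without GCP credentials.

```python
from typing import Any, Dict, Optional


def save_profile(db: Any, user_id: str, profile: Dict[str, Any]) -> None:
    """Create or overwrite the document users/<user_id>."""
    db.collection("users").document(user_id).set(profile)


def load_profile(db: Any, user_id: str) -> Optional[Dict[str, Any]]:
    """Fetch one document; returns None when it does not exist."""
    snapshot = db.collection("users").document(user_id).get()
    return snapshot.to_dict() if snapshot.exists else None


# Usage (needs the google-cloud-firestore package and GCP credentials):
#   from google.cloud import firestore
#   db = firestore.Client()
#   save_profile(db, "alice", {"city": "Oslo"})
#   print(load_profile(db, "alice"))
```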
Firestore is incredibly powerful in offering a lot of useful functionality for web and mobile apps out-of-the-box, without you having to configure your own backend to support such features. However, Firestore should still work quite well for architectures where you want your own backend.
FaunaDB
Developed by former technical leaders at Twitter, FaunaDB is the only database on this list that is not part of an existing major cloud provider. FaunaDB describes itself as being a relational NoSQL database. It might sound unusual to hear at first because most NoSQL solutions are non-relational by nature, but dig a little deeper and you’ll see that FaunaDB brilliantly bridges the gap between these two paradigms. It gives you the scalability and flexibility of NoSQL solutions while retaining the power of being able to model and query relational data.
The result is that FaunaDB feels like a database that’s actually designed for modern cloud-based applications, while retaining relational functionality (which is rare for a NoSQL database). It supports multi-tenancy by allowing you to create separate child databases under any database, allowing you to easily create secure isolated environments for each of your tenants. Built-in role-based access control allows you to offload authorization logic to the database, making your serverless backend even leaner. This also allows web and mobile clients to connect directly to the database if you want to avoid a backend entirely.
Similar to Azure Cosmos, there are multiple APIs you can use to access your data, including FaunaDB’s native query language (FQL), GraphQL and, soon, SQL. A great online management dashboard and a CLI tool also make for an excellent developer experience.
FaunaDB may not have the prominence associated with the products from larger cloud providers, but it’s an incredibly well-engineered product that seems to address the exact pain points that serverless applications (and by extension, cloud applications in general) face.
This list is by no means exhaustive, but it covers some of the best options I’ve seen for databases that fit well with serverless architectures. If you’re taking the step to build a modern scalable serverless backend, I highly recommend (where possible) doing the same with your database so you can fully take advantage of the power of a serverless architecture.