Using Relational Databases With Serverless Functions

Originally published at www.stackery.io.


I love relational databases. Gimme some of that SQL love.

Why?

Well, the fact is they do a ton more than most non-relational databases. I don’t mean to sound silly, but the fad of using NoSQL databases for every use case is getting really old. Relational databases could use a hug, because no one seems to love them anymore, and yet they can do so much for us.

But explaining the power of relational databases is for another blog post. This post is about using SQL databases in serverless functions.

Why?

It turns out that SQL databases really want to love you back long term. They work best in a committed relationship in the form of a long-lived network connection. Unfortunately, this can cause scalability and performance problems when combined with serverless functions.

Connection Limits

SQL databases have a limit to how many connections can be open at a time. For example, an AWS db.t2.micro MySQL database instance is limited to 65 concurrent connections by default. While this limit can be configured, at a certain point you will start to use up too many resources either in CPU time spent or memory consumed per connection.

This isn’t a problem if you have a pool of servers accessing the database. They generally keep one connection open per thread and reuse the connection for many requests. In the worst case if you have N threads you will have N open connections. It’s a metric that is easily tracked and managed.

However, in the serverless world you can have pools of previously used functions lying around while new functions spin up to handle requests. A powerful aspect of serverless functions is that they can maintain the state of a network connection across multiple invocations of the same function instance. But as soon as you go over 65 warm function instances you will max out your connections to your db.t2.micro instance, even if at any given time there are only a few active function invocations.

Fortunately there are things we can do about this problem. Unfortunately, because they go against traditional “serverful” conventions, these solutions aren’t always easy to implement.

Connection Pooling

First, you may be using a connection library that supports connection pooling. Connection pooling allows a set of connections to be used and reused across multiple threads of execution. Since we only want one connection open during an invocation of a function, this feature is unhelpful. Turn pooling off, or at least decrease the maximum number of connections to one connection at a time.

Second, if your library can only operate in a pooling mode, like the knex.js library we use at Stackery, you will need to find a way to terminate database connections at the end of every function invocation. This is easier said than done. In general, you have to ensure that whenever your function can be reused (i.e. it didn’t throw an uncaught exception) it must destroy the connection before finishing. This can lead to code that looks like:

module.exports = function handler(message, output) { return db.select('*').from('users') .then((data) => { console.dir(data) // Everything was fine, we're done, so destroy the connection return db.destroy() }) .catch((err) => { console.error(`Something bad happened: ${err.stack}`) // We've handled the error, make sure we destroy the connection still return db.destroy() }) }

(The above would be an excellent use case for native Javascript Promise .finally() support, but c’est la vie.)

Having to manually manage database connections isn’t ideal, but with these two tricks you can enjoy two of the finest pleasures in the world: serverless functions and relational databases!

P.S.: A Word About Connection Latency

The main benefit of connection pooling is that you don’t have to wait for a connection to be set up every time you need one. Tearing down a database connection after every function invocation breaks this optimization. That said, if your database is close by on the same network as your serverless resources, then connection setup latency is not a big deal for most use cases. Serverless functions are not ideal for low latency requirements in general, so if you are ok with the latency of your lambda functions with connection pooling you should be just fine without connection pooling.