Scaling RDBMS for GraphQL backends on serverless
Serverless provides a managed runtime for applications that are triggered by events, for instance, an HTTP request. The benefits of serverless are beyond the scope of this post, but we can summarise it as a simple way to deploy a web application.
Now, let’s consider a GraphQL backend which you may want to deploy on a serverless platform. Simple as it may sound, there is a fundamental reason why this is not easy: serverless backends don’t handle database connections very well.
The problem with RDBMS and serverless
Serverless backends are launched per request and do not share any state between different requests. This means that if our application depends on some shared state, like a pool of database connections, we need to recreate that state every time the application is invoked. In the case of GraphQL, we need to create database connections on each request (assuming your GraphQL schema talks to a database). Creating and destroying database connections is very costly, both in performance and in resource consumption. A typical database can hold only a few hundred connections at most. Hence, a serverless backend that connects directly to the database will start returning 503s under a high volume of requests, which is exactly where serverless is supposed to shine.
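To make the failure mode concrete, here is a minimal toy model in Python. It is only a sketch: ToyDatabase and simulate_burst are hypothetical names invented for this illustration, no real database is involved, and the limit of 100 connections is an assumed number rather than a measured one. It shows what happens when every invocation in a burst opens its own connection before any other invocation has finished.

```python
class ToyDatabase:
    """Toy stand-in for a database server with a hard connection limit."""

    def __init__(self, max_connections):
        self.max_connections = max_connections
        self.open_connections = 0

    def connect(self):
        # A real server would reject the client once its limit is reached.
        if self.open_connections >= self.max_connections:
            raise ConnectionError("too many connections")
        self.open_connections += 1

    def disconnect(self):
        self.open_connections -= 1


def simulate_burst(db, concurrent_invocations):
    """Each invocation opens its own connection; none has finished yet.

    Returns the number of invocations that failed to connect.
    """
    opened = 0
    failures = 0
    for _ in range(concurrent_invocations):
        try:
            db.connect()
            opened += 1
        except ConnectionError:
            failures += 1
    # The burst is over: every successful invocation closes its connection.
    for _ in range(opened):
        db.disconnect()
    return failures


db = ToyDatabase(max_connections=100)  # assumed limit, for illustration only
print(simulate_burst(db, 50))    # 0
print(simulate_burst(db, 1000))  # 900
```

In this model every invocation beyond the connection limit fails, which is the behaviour that surfaces as 503s in a real deployment.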
So how do we solve this problem? Fundamentally, we need something outside of the serverless infrastructure to manage the database connections for us. What we need is a standalone connection pooler for our database. For Postgres or MySQL, for instance, we can deploy something like pgBouncer or proxySQL respectively. On a free tier EC2 instance, pgBouncer can handle more than 10k connections at a time! So instead of connecting to the database directly, you should connect to the connection pooler.
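From the application's side, switching to the pooler usually amounts to pointing the connection string at a different host and port. The Python sketch below shows one way to do that rewrite. The helper name pooled_dsn and the host names are hypothetical, and the port 6432 is an assumption based on the port pgBouncer conventionally listens on; use whatever your own pooler is configured with.

```python
from urllib.parse import urlsplit, urlunsplit


def pooled_dsn(database_url, pooler_host, pooler_port=6432):
    """Return database_url with its host and port swapped for the pooler's.

    Credentials, database name and query parameters are left untouched.
    The default port 6432 is an assumption, not a requirement.
    """
    parts = urlsplit(database_url)
    # Everything before the last "@" in the netloc is "user:password".
    userinfo = parts.netloc.rpartition("@")[0]
    netloc = f"{pooler_host}:{pooler_port}"
    if userinfo:
        netloc = f"{userinfo}@{netloc}"
    return urlunsplit(parts._replace(netloc=netloc))


# Hypothetical URLs, for illustration only.
direct = "postgres://user:secret@db.example.com:5432/app"
print(pooled_dsn(direct, "pooler.internal"))
# postgres://user:secret@pooler.internal:6432/app
```

The serverless function then opens its connection against the rewritten URL, and the pooler decides how many real connections to hold open against the database.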
With pgBouncer as the connection pooler for our serverless GraphQL backend, we measured the error rate vs. the invocation rate of incoming requests on AWS Lambda and a free tier RDS instance. Note that X req/s is the invocation rate and not the throughput.
The results are phenomenal:
In case you want to get started quickly with a GraphQL backend on serverless, you can check out this repository which has many boilerplates with local dev and deployment instructions: https://github.com/hasura/graphql-engine/tree/master/community/boilerplates/graphql-servers . Contributions are welcome for other languages, frameworks, and platforms.
Using with Hasura GraphQL Engine
Hasura GraphQL Engine gives you a wide range of GraphQL APIs on Postgres instantly. You can merge your serverless GraphQL schema with Hasura so that you only write custom business logic and use Hasura for standard CRUD APIs. You can do this very easily by adding your GraphQL endpoint as a Hasura Remote Schema. Follow the guide here for details.
The results we get by adding a connection pooler show that it is indeed feasible to host GraphQL backends which talk to RDBMSs like Postgres on serverless. Some of you may object that we have added a standalone long-running instance to a serverless infrastructure. Yes, we have, but on a free tier instance it comes at no cost at all. Even if we replace the EC2 instance with, say, a managed container instance, we get all the benefits of serverless except for per-request pricing. That is a small price to pay for letting the rest of your application leverage serverless to the full.