Scaling RDBMS for GraphQL backends on serverless
Serverless provides a managed runtime for applications that can be triggered by events, for instance an HTTP request. The benefits of serverless are beyond the scope of this post, but we can summarise them as a simple way to deploy a web application.
Now, let’s consider a GraphQL backend that you may want to deploy on a serverless platform. Simple as it may sound, there is a fundamental reason why this is not easy: serverless backends don’t handle database connections very well.
The problem with RDBMS and serverless
Serverless backends are launched per request and do not share any state between different requests. This means that if our application depends on some shared state, like a connection pool to a database, we need to recreate that state every time the application is invoked. In the case of GraphQL, we need to create database connections on each request (assuming your GraphQL schema talks to a database). Creating and destroying database connections is very costly in terms of both performance and resource consumption. A typical database can only hold a few hundred connections at most. Hence, connecting directly to a database from serverless functions will start returning 503s under a high volume of requests, which is exactly where serverless is supposed to shine.
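To make the failure mode concrete, here is a toy simulation (not real infrastructure): a database that accepts at most a fixed number of concurrent connections, hit by a burst of serverless-style requests that each open their own fresh connection. The limit of 100 connections and the burst of 500 requests are illustrative assumptions, not measurements.

```python
MAX_CONNECTIONS = 100  # a typical RDBMS default is a few hundred

class Database:
    """Accepts at most max_connections concurrent connections."""
    def __init__(self, max_connections):
        self.max_connections = max_connections
        self.open_connections = 0

    def connect(self):
        if self.open_connections >= self.max_connections:
            # Postgres would report "too many clients already";
            # the backend surfaces this to callers as a 503.
            raise RuntimeError("too many connections")
        self.open_connections += 1

db = Database(MAX_CONNECTIONS)
statuses = []

# Worst case: 500 invocations in flight at once, each holding its own
# connection because there is no shared pool across invocations.
for _ in range(500):
    try:
        db.connect()
        statuses.append(200)
    except RuntimeError:
        statuses.append(503)

print(statuses.count(200), statuses.count(503))  # 100 400
```

Only the first 100 requests get a connection; the remaining 400 fail, even though the database itself is barely doing any work.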
Connection Pooling
So how do we solve this problem? Fundamentally, we need something outside the serverless infrastructure managing the database connections for us. What we need is a standalone connection pooler for our database. For instance, for Postgres or MySQL we can deploy something like PgBouncer or ProxySQL respectively. On a free-tier EC2 instance, PgBouncer can handle more than 10k client connections at a time! So instead of connecting to the database directly, you should connect to the connection pooler:
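A minimal PgBouncer configuration might look like the sketch below. The hostname, database name, and pool sizes are placeholders for illustration; the option names are standard `pgbouncer.ini` settings.

```ini
;; pgbouncer.ini -- minimal sketch; host/dbname/credentials are placeholders
[databases]
mydb = host=my-rds-instance.example.us-east-1.rds.amazonaws.com port=5432 dbname=mydb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction      ; reuse server connections per transaction
max_client_conn = 10000      ; clients (serverless invocations) accepted
default_pool_size = 20       ; actual Postgres connections per db/user pair
```

The serverless functions then point their connection string at the pooler on port 6432 instead of Postgres on 5432; PgBouncer multiplexes the thousands of short-lived client connections onto a small, fixed pool of real database connections.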
With pgBouncer as the connection pooler for our serverless GraphQL backend, we measured the error rate vs. the invocation rate of incoming requests on AWS Lambda and a free tier RDS instance. Note that X req/s is the invocation rate and not the throughput.
The results are phenomenal:
Getting Started
In case you want to get started quickly with a GraphQL backend on serverless, you can check out this repository which has many boilerplates with local dev and deployment instructions: https://github.com/hasura/graphql-engine/tree/master/community/boilerplates/remote-schemas . Contributions are welcome for other languages, frameworks, and platforms.
Using with Hasura GraphQL Engine
Hasura GraphQL Engine gives you instant GraphQL APIs on Postgres. You can merge your serverless GraphQL service with Hasura to extend it with your own schema and custom logic. You can do this very easily by adding your GraphQL endpoint as a Remote Schema. Follow the guide here for details.
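As a sketch of what gets merged, your serverless backend might serve a small custom schema like the one below (the types and field names here are hypothetical examples, not part of Hasura itself). Once the endpoint is registered as a Remote Schema, these fields appear alongside Hasura's auto-generated Postgres queries under a single GraphQL endpoint.

```graphql
# Hypothetical custom schema served by the serverless GraphQL backend
type Query {
  validateCoupon(code: String!): CouponStatus!
}

type CouponStatus {
  valid: Boolean!
  discountPercent: Int
}
```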
Conclusion
Given the results we see after adding a connection pooler, it is quite feasible to host GraphQL backends on serverless that talk to RDBMSs like Postgres. Some of you may argue that by adding a standalone long-running instance to a serverless infrastructure, we have made the entire infrastructure non-serverless. This is mostly false: serverless adoption is a spectrum. If the addition of a small “server-full” component lets the rest of your infrastructure leverage serverless to the maximum, then it is well worth the cost.