Performance concerns are one of the reasons developers avoid adopting GraphQL APIs. The first Google results for “GraphQL performance” mostly cover the disadvantages of this technology. Are GraphQL APIs less performant than their RESTful counterparts? Is performance really a big problem for GraphQL APIs, or is it more of a myth built by early adopters?
In this blog post, I’m going to cover some performance facts and myths around GraphQL, along with solutions that developers can employ on both the GraphQL client and the server.
NOTE: In this blog post we focus on the Node.js ecosystem. The problems described can be mapped to any language and reference implementation.
GraphQL performance defined
To talk about performance properly, we need to split a typical server-side GraphQL application into the following layers:
- Application general setup
Memory and CPU limits, file descriptor limits available on the system, etc.
- Query framework overhead
How much the API layer affects the speed of queries. It can be compared against a baseline such as plain Express.js.
- Resolvers
User code that fetches data and follows the GraphQL resolver specification.
- Data source layer
The source of our data (API, database, etc.).
GraphQL Query Framework overhead
The performance overhead of the API layer mostly depends on the framework or middleware we use. In this chapter, we look at the most popular GraphQL frameworks and compare them with a baseline: an Express.js API accepting the same payload. Please note that response time depends on many different factors, some of them outside the GraphQL layer. Despite that, for complex GraphQL queries we can notice a significant overhead that grows with the query size and the number of concurrent requests.
To understand why this happens, we need to dissect what a GraphQL implementation actually does. Every GraphQL query goes through three phases: parsing, validation and execution.
- Parse: The query string is parsed into an abstract syntax tree (AST). Syntax errors are reported here.
- Validate: The AST is validated against the schema. This checks that the requested fields exist and that arguments and types are used correctly.
- Execute: The runtime walks through the AST, starting from the root of the tree, invokes resolvers, collects the results, and emits JSON.
All of these phases add cost to query execution on top of the API framework being used (Express, Koa, Hapi, etc.).
When performing a query or mutation with a large GraphQL document, even one with no nested queries, we can measure significant performance degradation compared with a RESTful layer.
Why is this happening?
Parsing a GraphQL query and computing an execution plan for it are both memory- and CPU-intensive tasks, and their cost grows with the query size.
Reducing overhead of GraphQL layer
GraphQL queries are usually delivered over HTTP. The difference between REST requests and GraphQL requests is that a GraphQL payload contains both the data and the metadata required to make the query. The metadata tends to be larger than the data itself. The server needs to understand the provided metadata (called the document) by parsing it. Parsing the document into a GraphQL AST can be expensive, but it can be avoided when developers care about performance and use a limited set of GraphQL queries in their applications.
There are numerous packages that offer a document cache on top of GraphQL-express/GraphQL-JS. For example, GraphQL-JIT allows precompiling GraphQL documents, reducing the time required to process them when the query is made.
GraphQL-JIT offers a low-level alternative to graphql-js with precompiled queries and mutations. Using GraphQL-JIT reduces the performance overhead of the GraphQL API; however, it comes with extra work for developers.
For simple use cases, developers can rely on an out-of-the-box cache layer for GraphQL documents. Fastify GQL is one of the packages that uses a cache to store all documents passed by clients.
Fastify barebone GraphQL adapter. Features: Caching of query parsing and validation. Automatic loader integration to…
The Fastify GQL plugin provides a document cache on top of the Fastify framework (an alternative to Express).
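To make the document cache idea more concrete, here is a minimal sketch of how such a cache could work. It is not how Fastify GQL or GraphQL-JIT are implemented. `createDocumentCache`, `compile` and `maxEntries` are names I made up for illustration; in a real server `compile` would wrap the parse and validate steps of your GraphQL library.

```javascript
// Minimal document cache sketch. `compile` is a hypothetical stand-in for the
// expensive parse + validate step; it is injected so the sketch has no dependencies.
function createDocumentCache(compile, maxEntries = 1000) {
  // A Map keeps insertion order, which is enough for a simple LRU policy.
  const cache = new Map();

  return function getDocument(queryText) {
    if (cache.has(queryText)) {
      const cached = cache.get(queryText);
      // Re-insert the entry to mark it as most recently used.
      cache.delete(queryText);
      cache.set(queryText, cached);
      return cached;
    }
    const compiled = compile(queryText); // only paid on a cache miss
    cache.set(queryText, compiled);
    if (cache.size > maxEntries) {
      // Evict the least recently used entry (the first key in insertion order).
      cache.delete(cache.keys().next().value);
    }
    return compiled;
  };
}

// Usage with a fake compiler that counts how often it runs.
let compileCalls = 0;
const fakeCompile = (text) => {
  compileCalls += 1;
  return { source: text };
};
const getDocument = createDocumentCache(fakeCompile, 2);
getDocument('{ me { username } }');
getDocument('{ me { username } }'); // served from the cache
console.log(compileCalls); // prints 1
```

The cache is keyed by the raw query text, so it only helps when clients send the same documents repeatedly. The size limit matters on public APIs, where arbitrary query strings could otherwise grow the cache without bound.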
Precompiling documents into a cache is not a silver bullet and cannot be used in every case. Caching or saving queries is only practical when clients execute a small number of distinct queries. When an application serves a wide variety of GraphQL queries (for example, a public API), the performance overhead needs to be accepted.
Over-fetching in resolver layer?
GraphQL allows developers to decide which fields they want on both the server and the client. In most cases, developers focus on the client side while ignoring server-side over-fetching.
GraphQL solves the over-fetching problem on the client, but resolvers may still query all the data on the server.
When building a public GraphQL API, external developers can often misuse the queries. For example, getProfile, a query that does an expensive fetch from different data sources, may be used only to fetch the username for the home page. To prevent over-fetching, we can extract information about the required fields from the resolver info object and avoid expensive queries.
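As a rough sketch of what reading the info object can look like, the helper below collects the field names requested directly under the current resolver. It relies on the `fieldNodes` and `fragments` properties that graphql-js passes in the resolver info argument. The `requestedFields` helper, the `activity` field and the `loadUsername`/`loadActivity` context functions are hypothetical names used only for illustration, and the sketch ignores `@skip`/`@include` directives and nested selections.

```javascript
// Collects the names of the fields requested directly under the current
// resolver, following inline fragments and named fragment spreads.
function requestedFields(info) {
  const names = new Set();
  const visit = (selectionSet) => {
    if (!selectionSet) return;
    for (const selection of selectionSet.selections) {
      if (selection.kind === 'Field') {
        names.add(selection.name.value);
      } else if (selection.kind === 'InlineFragment') {
        visit(selection.selectionSet);
      } else if (selection.kind === 'FragmentSpread') {
        const fragment = info.fragments[selection.name.value];
        if (fragment) visit(fragment.selectionSet);
      }
    }
  };
  for (const node of info.fieldNodes) visit(node.selectionSet);
  return [...names];
}

// Hypothetical resolver: the expensive data source is only hit when the
// client actually asked for the field that needs it.
const resolvers = {
  Query: {
    getProfile(parent, args, context, info) {
      const fields = requestedFields(info);
      const profile = { username: context.loadUsername(args.id) };
      if (fields.includes('activity')) {
        profile.activity = context.loadActivity(args.id); // expensive call
      }
      return profile;
    },
  },
};
```

With a query like `{ getProfile(id: 7) { username } }`, this resolver never calls `loadActivity`. The libraries mentioned in this section cover the harder cases, such as nested selections and mapping fields to database columns.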
Parsing the info object can be challenging, which is why the community came up with multiple libraries that simplify this process:
GraphQL Query Mapper will help you to build GraphQL API without overfetching data on the server. GraphQL Query Mapper…
Libraries like GraphQL-Query-Mapper prevent server-side database over-fetching by providing a list of the fields that were requested in the client-side query. Developers can use that list to perform targeted queries against their database and REST endpoints.
Myth — Every GraphQL server requires a data loader!
The data loading problem is not strictly related to GraphQL. In fact, every RESTful API that uses an ORM will have similar problems. That is why many developers prefer to write complex data queries (yes, involving SQL joins or MapReduce) for REST rather than relying on the ORM layer. ORM layers usually come with a cache layer that can be configured to resolve this problem.
A GraphQL implementation builds a query execution plan that triggers different resolvers, each of which can perform its own data queries.
Multiple data queries for the same resource can be avoided, which is why the DataLoader library was built:
DataLoader is a generic utility to be used as part of your application's data fetching layer to provide a consistent…
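To show the idea behind batching, here is a heavily simplified loader. It is not the DataLoader library and does not reproduce its API or its scheduling; `createBatchLoader` and `batchFn` are names I made up. The sketch assumes that `batchFn` receives an array of keys and resolves to an array of values in the same order.

```javascript
// Simplified batching loader: NOT the DataLoader library, only the core idea.
// Keys requested before the scheduled microtask runs are sent as one batch.
function createBatchLoader(batchFn) {
  let pending = []; // { key, resolve, reject } entries waiting for dispatch
  const cache = new Map(); // key -> Promise, so repeated keys are fetched once

  function dispatch() {
    const batch = pending;
    pending = [];
    const keys = batch.map((entry) => entry.key);
    Promise.resolve()
      .then(() => batchFn(keys))
      .then((values) => {
        batch.forEach((entry, index) => entry.resolve(values[index]));
      })
      .catch((error) => {
        batch.forEach((entry) => entry.reject(error));
      });
  }

  return function load(key) {
    if (cache.has(key)) return cache.get(key);
    const promise = new Promise((resolve, reject) => {
      // The first key of a new batch schedules the dispatch.
      if (pending.length === 0) queueMicrotask(dispatch);
      pending.push({ key, resolve, reject });
    });
    cache.set(key, promise);
    return promise;
  };
}

// Usage: three loads, two distinct keys, one call to the batch function.
const loadUser = createBatchLoader(async (ids) => {
  console.log('fetching users', ids);
  return ids.map((id) => ({ id, name: `user-${id}` }));
});
Promise.all([loadUser(1), loadUser(2), loadUser(1)]).then((users) => {
  console.log(users.map((user) => user.name));
});
```

In a GraphQL server a loader like this would typically be created per request, so the cache does not leak data between users.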
Developers are not forced to use the GraphQL execution plan (and DataLoader). To simulate a REST-like experience, developers can rely on a top-level resolver for individual queries and mutations, without calling nested field or relationship resolvers. Returning all required data from the first resolver, using SQL joins or ORM-like solutions, can be a good alternative to the DataLoader library.
Top-level resolvers can fetch all the data required from relationships and deliver it much faster than a classic execution plan that needs to traverse the entire graph. This approach can be used in conjunction with parsing the info object in resolvers. Top-level resolver fetching does not require a DataLoader batching layer, since all queries and data are controlled from the root. However, it leaves the responsibility for fetching all relationships to the developer, who also needs to make sure that the returned object contains all the data the user requested.
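The difference between the two approaches can be sketched with a tiny in-memory example. Everything here is hypothetical: the `db` object, its `find*` methods and the posts/authors data stand in for a real database, and `findPostsWithAuthors` plays the role of a single SQL join. The sketch calls the resolvers by hand instead of running a GraphQL executor.

```javascript
// Hypothetical in-memory "database" that counts how many queries were issued.
const db = {
  queries: 0,
  authors: [{ id: 1, name: 'Ada' }, { id: 2, name: 'Linus' }],
  posts: [
    { id: 10, title: 'Hello', authorId: 1 },
    { id: 11, title: 'World', authorId: 2 },
    { id: 12, title: 'Again', authorId: 1 },
  ],
  findPosts() {
    this.queries += 1;
    return this.posts;
  },
  findAuthor(id) {
    this.queries += 1;
    return this.authors.find((author) => author.id === id);
  },
  // Stand-in for a single SQL JOIN between posts and authors.
  findPostsWithAuthors() {
    this.queries += 1;
    return this.posts.map((post) => ({
      ...post,
      author: this.authors.find((author) => author.id === post.authorId),
    }));
  },
};

// Classic execution plan: one query for the list plus one per nested author.
const nestedResolvers = {
  Query: { posts: () => db.findPosts() },
  Post: { author: (post) => db.findAuthor(post.authorId) },
};

// Top-level resolver: the root returns complete objects, so no Post.author
// resolver is needed (the default resolver reads the `author` property).
const topLevelResolvers = {
  Query: { posts: () => db.findPostsWithAuthors() },
};

// Simulate the classic plan: the executor calls Post.author once per post.
db.queries = 0;
const nested = nestedResolvers.Query.posts().map((post) => ({
  ...post,
  author: nestedResolvers.Post.author(post),
}));
const nestedQueries = db.queries; // 1 list query + 3 author queries

db.queries = 0;
const joined = topLevelResolvers.Query.posts();
const topLevelQueries = db.queries; // a single joined query

console.log({ nestedQueries, topLevelQueries });
```

Both variants return the same data, but the top-level resolver always pays for the join, even when the client did not ask for the author. That is where combining it with the info object becomes useful.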
When building GraphQL-enabled solutions, performance overhead cannot be ignored. While applications that use GraphQL can be slightly slower initially, it is possible to tweak them to achieve better results.
If you are looking to get started with GraphQL, I would personally recommend trying GraphQL-CLI, which helps developers generate GraphQL-enabled, production-ready server applications.