GraphQL query timeout and complexity management

An overview of solutions to prevent expensive GraphQL queries from impacting your API performance.

Published in WorkflowGen, Mar 23, 2017 (5 min read)

With a single call to a GraphQL API, a developer can retrieve an advanced set of results that would otherwise have required a dozen separate HTTP queries with REST or similar web API technologies.

GraphQL improves the developer experience and client-side performance by reducing HTTP round trips and associated processing, but it also adds pressure to the backend that has to handle complex queries.

With REST, by contrast, you know the cost of each API endpoint from the parameters sent by the client.

A single GraphQL query can potentially generate thousands of database operations.

Let’s look at some possible solutions to make our GraphQL API scalable and prevent expensive queries that slow down the backend.

Persisted queries

In the database world, we have SQL queries and stored procedures. We can have the same concepts in GraphQL: GraphQL queries and persisted queries. The benefits of static and persisted queries are well explained in this article by Sashko Stubailo.

This solution is well-suited when you manage both backend and frontend development.

But persisted query usage also creates a tight coupling between the apps and the GraphQL server that can cause friction in release management, since a new application version has to wait for the backend to support new or updated persisted queries.
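As a rough illustration of the concept, a persisted query lookup can be as small as a map from an identifier to a query string registered on the server. The sketch below rests on assumptions: the `persistedQueries` registry, the `getRequester` identifier, the query text, and the `resolvePersistedQuery` helper are all hypothetical names, not part of any library or of the WorkflowGen API.

```javascript
// Hypothetical registry of queries declared on the server ahead of time.
const persistedQueries = new Map([
  ['getRequester', 'query ($id: ID!) { request(id: $id) { requester { id } } }'],
]);

// Return the stored query text for a known id. Anything else is rejected,
// so "unknown" queries never reach the GraphQL executor.
function resolvePersistedQuery(queryId) {
  const query = persistedQueries.get(queryId);
  if (query === undefined) {
    throw new Error(`Unknown persisted query: ${queryId}`);
  }
  return query;
}
```

The client then sends only the identifier and its variables, which is also where the coupling comes from: a new identifier must exist on the server before a client can use it.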

Our WorkflowGen GraphQL API doesn’t yet support persisted queries. Our clients are free to create queries according to their requirements and usage without having to declare those queries on the GraphQL server. In a future release, however, we can offer an option to manage persisted queries for developers who prefer a deterministic approach to prevent “unknown” GraphQL queries from being executed on the server.

Request size limit and request timeout

You can easily limit the HTTP request size to prevent huge GraphQL queries from being processed. This configuration provides a first level of protection, but a GraphQL query can still be costly in terms of performance even when its query string is of an acceptable size.

The HTTP request timeout is another setting you have to adjust to prevent long-running queries. The frontend will get a timeout error, but depending on your backend technology, the API server may halt the ongoing HTTP request without stopping the execution of the GraphQL query resolver functions.

In short, request size limit and request timeout settings are necessary, but aren’t enough to provide protection against expensive queries.
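A minimal sketch of the request size check, assuming an Express-style `(req, res, next)` middleware signature. The `limitRequestSize` name and the limit value are hypothetical, and the check only looks at the declared `Content-Length` header. Body parsers usually enforce a comparable limit while reading the request stream, for example through the `limit` option of `express.json`.

```javascript
// Express-style middleware factory: rejects requests whose declared body size
// exceeds maxBytes before any GraphQL parsing happens.
function limitRequestSize(maxBytes) {
  return function (req, res, next) {
    const declared = Number(req.headers['content-length'] || 0);
    if (declared > maxBytes) {
      res.statusCode = 413; // Payload Too Large
      res.end('GraphQL request too large');
      return;
    }
    next();
  };
}
```

A small query string can still pass this check and fan out into many backend operations, which is why the other protections described here remain necessary.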

Query complexity analysis

The GraphQL server can analyze the query Abstract Syntax Tree (AST) to determine the level of complexity; you can calculate the tree depth, for example.

But one of the main advantages of GraphQL is that it allows this kind of “deep” query! Limiting the depth is like limiting the number of joins in a SQL query.

Furthermore, a complex query doesn’t always generate a high number of backend operations; it depends on how resolver functions and caching are implemented.
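The tree depth calculation mentioned above could look like the following sketch. It walks nodes shaped like those produced by the graphql-js parser (`selectionSet.selections`), but the AST here is written by hand so the example has no dependencies. The `selectionDepth` helper and the `MAX_DEPTH` value are hypothetical, and fragments are not handled.

```javascript
// Depth of a node: 0 for a leaf field, otherwise 1 + the deepest child.
function selectionDepth(node) {
  if (!node.selectionSet) {
    return 0;
  }
  const childDepths = node.selectionSet.selections.map(selectionDepth);
  return 1 + Math.max(0, ...childDepths);
}

// Helper to build a Field node by hand (mirrors the parser's node shape).
const field = (name, selections) => ({
  kind: 'Field',
  name: { kind: 'Name', value: name },
  selectionSet: selections && { kind: 'SelectionSet', selections },
});

// Hand-written AST for: { request { requester { manager { id } } } }
const operation = {
  kind: 'OperationDefinition',
  selectionSet: {
    kind: 'SelectionSet',
    selections: [
      field('request', [field('requester', [field('manager', [field('id')])])]),
    ],
  },
};

const MAX_DEPTH = 3; // hypothetical limit
const depth = selectionDepth(operation);
console.log(depth, depth > MAX_DEPTH ? 'rejected' : 'accepted'); // prints: 4 rejected
```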

Cost analysis

The cost analysis-based solution is very promising, since you can define a “cost” per field and then analyze the AST to estimate the total cost of the GraphQL query.

When the query returns a variable number of items in a collection, you can base your estimate on the maximum number of items allowed in a page.

The cost unit can be based on the number of generated SQL queries, the number of subsequent REST API calls, or the average execution time of the resolver function.
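A cost estimate along these lines might be sketched as follows. Everything specific is an assumption made for illustration: the cost table, the list of paginated fields, the `first` argument name, and the maximum page size. The node shape again follows the graphql-js AST, where an integer argument is stored as an `IntValue` node whose value is a string.

```javascript
// Hypothetical cost table, in arbitrary units (e.g. one unit per SQL query).
const FIELD_COSTS = new Map([['requests', 1], ['requester', 1], ['manager', 2]]);
const LIST_FIELDS = new Set(['requests']); // fields assumed to return a page
const MAX_PAGE_SIZE = 100;                 // assumed server-side page limit

// Number of items to assume for a list field: the literal `first` argument
// when present, capped at (and defaulting to) the maximum page size.
function requestedItems(field) {
  const arg = (field.arguments || []).find((a) => a.name.value === 'first');
  const asked = arg && arg.value.kind === 'IntValue'
    ? Number(arg.value.value)
    : MAX_PAGE_SIZE;
  return Math.min(asked, MAX_PAGE_SIZE);
}

// Cost of a node = its own cost + the cost of its children, with the
// children counted once per item when the field is a paginated list.
function estimateCost(node) {
  const selections = node.selectionSet ? node.selectionSet.selections : [];
  const childCost = selections.reduce((sum, child) => sum + estimateCost(child), 0);
  if (node.kind !== 'Field') {
    return childCost;
  }
  const name = node.name.value;
  const ownCost = FIELD_COSTS.has(name) ? FIELD_COSTS.get(name) : 0;
  const multiplier = LIST_FIELDS.has(name) ? requestedItems(node) : 1;
  return ownCost + multiplier * childCost;
}
```

With this table, `{ requests(first: 10) { requester { manager { id } } } }` is estimated at 1 + 10 × (1 + 2) = 31 units, and the same query without a `first` argument at 1 + 100 × 3 = 301 units.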

Implementing an efficient cost analysis solution is likely to require significant design and development effort, but hopefully new tools and solutions will emerge from the GraphQL ecosystem to accelerate its implementation.

Resolver function timeout

A GraphQL server can be seen as an API proxy: the query is a tree of API functions (resolvers) executed by the GraphQL server.

A resolver’s execution time is critical to the performance of the whole GraphQL query.

It’s also at the resolver level that you can add protection against expensive queries, because the business logic that often involves databases, datastores and remote procedure calls is executed from here.

A pragmatic approach is to set a short timeout value for the operation called by the resolver function. When a timeout occurs, the backend operation is canceled, and the resolver function throws an exception; a null value is set for the corresponding field, and an error is added to the GraphQL query result. The frontend developer can then decide to use the result partially or to reject the query result if timeout errors are found.
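One possible way to wrap a backend operation with a timeout is sketched below. It assumes a recent Node.js version where `AbortController` is a global, and a backend client that accepts an `AbortSignal` to cancel its work. The `withTimeout` helper and the `operation(signal)` calling convention are hypothetical.

```javascript
// Run operation(signal) and reject if it takes longer than timeoutMs.
// On timeout the signal is aborted so the backend call can stop its work.
async function withTimeout(operation, timeoutMs) {
  const controller = new AbortController();
  let timer;
  const timeout = new Promise((resolve, reject) => {
    timer = setTimeout(() => {
      controller.abort();
      reject(new Error(`Operation timed out after ${timeoutMs} ms`));
    }, timeoutMs);
  });
  try {
    // Whichever settles first wins: the backend operation or the timer.
    return await Promise.race([operation(controller.signal), timeout]);
  } finally {
    clearTimeout(timer); // never leave the timer pending
  }
}
```

A resolver would then return something like `withTimeout((signal) => backendCall(args, signal), 500)`, where `backendCall` stands for whatever client the resolver uses. When the returned promise rejects, the GraphQL executor sets the field to null and reports the error, as described above.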

Resolver function operations count

While waiting for a more advanced cost analysis solution, you can count the number of resolver calls. If your resolver calls multiple backend operations, you can increment this counter for each of them as well.

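Request: {
    requester(request, args, context) {
      // Count this resolver call before launching the backend operation
      context.incrementResolverCount();
      return context.loaders.users.load(request.requesterId);
    },
    // ...other Request field resolvers
}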

When the resolver count reaches the maximum number you defined, you can stop processing the remaining resolver-related operations. The GraphQL query will be completed, but some fields will have null values, and errors will be added to the GraphQL response.

In this case as well, it will be up to the developer to choose what to do with the partial result.

The operation counter solution is not as precise as a cost analysis-based one, but it provides a first level of protection for your backend.
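A minimal sketch of such a counter, created once per GraphQL query and exposed to the resolvers through the context. The `createResolverCounter` factory and the `maxResolverCount` setting are hypothetical names.

```javascript
// Build a per-query counter that throws once the configured maximum is hit.
function createResolverCounter(maxResolverCount) {
  let count = 0;
  return {
    increment() {
      if (count >= maxResolverCount) {
        // The thrown error nulls the current field and is reported in `errors`.
        throw new Error('Maximum resolver operation count reached. Field resolution aborted');
      }
      count += 1;
    },
    get count() {
      return count;
    },
  };
}
```

Because the check runs before the backend operation is launched, a query that exceeds the limit stops generating new database or API calls.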

GraphQL query timeout

There is no timeout option in the current graphql-js and express-graphql libraries we use to power our GraphQL Node.js based server.

To support a query timeout, we check the GraphQL query duration in the function where we increment the resolver operation count.

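request.incrementResolverCount = function () {
    // startTime is assumed to be captured when the query execution starts
    var runTime = Date.now() - startTime;
    if (runTime > config.graphql.queryTimeout) {
      // Log the timeout only once per request
      if (request.logTimeoutError) {
        logger('ERROR', 'Request ' + request.uuid + ' query execution timeout');
      }
      request.logTimeoutError = false;
      // Throw an Error object rather than a string so the failure carries a stack trace
      throw new Error('Query execution timed out. Field resolution aborted');
    }
    this.resolverCount++;
  };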

If a timeout occurs, the remaining resolver operations are not launched; the query is completed, but the impacted fields have their values set to null, and errors that mention the timeout are added to the GraphQL query result.
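On the client side, deciding what to do with such a partial result could start with a check like the one sketched below. The `timedOutPaths` helper is hypothetical, matching errors by message text is an assumption made for brevity, and the sample response is written by hand in the standard `data` / `errors` shape rather than captured from a real server.

```javascript
// Return the paths of the fields that were set to null because of a timeout.
function timedOutPaths(result) {
  return (result.errors || [])
    .filter((error) => /timeout|timed out/i.test(error.message))
    .map((error) => (error.path || []).join('.'));
}

// Hand-written sample of a partial result.
const sampleResult = {
  data: { request: { id: '1', requester: null } },
  errors: [
    {
      message: 'Query execution timed out. Field resolution aborted',
      path: ['request', 'requester'],
    },
  ],
};

const paths = timedOutPaths(sampleResult);
if (paths.length > 0) {
  console.log('Partial result, missing fields:', paths.join(', '));
}
```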

Conclusion

It would be great to have a built-in feature in the GraphQL libraries to manage the query timeout that cancels ongoing resolver operations and prevents remaining ones from being executed. It would reduce the GraphQL API Server “infrastructure” related code.

In the meantime, our next WorkflowGen GraphQL API release will support the resolver operation timeout, the resolver operation count, and the query timeout. When combined, these settings will allow fine tuning of backend protection against expensive queries.

The inherent backend complexity and performance management of some GraphQL queries should be transparent to frontend developers, so that API usage remains as smooth and flexible as possible.

The persisted query-based solution is very efficient but cannot be applied to all usage scenarios in our case.

In addition to protection against expensive queries, caching (with dataloader libraries, for example) and load balancing make the GraphQL API server more scalable.

Finally, emerging GraphQL API monitoring tools will help to identify bottlenecks and optimize resolver operations according to actual API usage.