Journey to a Federated GraphQL — The Cost of Queries

Erik Schults
Published in Pipedrive R&D Blog
Feb 17, 2021 · 4 min read
Photo taken from https://codeburst.io/

After spending significant time building a federated GraphQL stack, and seeing it start to take shape, we realized that we were now vulnerable to malicious queries, because nothing limited the kind of queries that could be made. We looked for an existing solution among open source libraries, but unfortunately none of them suited our needs.

Fortunately, there was a library, graphql-cost-analysis, which inspired our own implementation. What we really liked about its approach was that the costs of queryable fields could be defined directly in the schema and the query could then be analyzed statically before execution.

Types are cool

Inspired, we set out to write the code. Due to deadlines, we chose pure JS instead of TypeScript. It soon turned out that working with the recursive GraphQL AST was not that fun: the tree contains several different node interfaces, and keeping track of them all is a bit overwhelming. With TypeScript it would’ve been much easier, since the compiler keeps track of the types for you. Also, looking back at the code now, it’s a little hard to tell which kind of object is being processed at any given point.

Time to define the costs

Once the code was functional, it was time to define the costs in our schemas using the cost directive.

But…

Custom directives are not supported

After defining a couple of costs in the schema, we tried to integrate them with our GraphQL gateway. Unfortunately, Apollo Federation, which we use to combine the schemas, did not support custom directives. Luckily, that wasn’t a show-stopper in our case. We decided to extract the directive values from the schema, strip out the directives themselves, and store the clean schema and the cost mappings separately in the Schema Registry. The GraphQL gateway then combines the federated schema and the cost mappings separately. The extraction also made the cost calculation simpler, since there is no need to parse directives from the schema while analyzing the query.
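To make the extraction step more concrete, here is a minimal sketch of the idea, not our actual implementation. It assumes one field per line and purely numeric directive arguments; the function name `extractCosts`, the `"Type.field"` key format and the argument names are all hypothetical. A real implementation would more likely walk the schema AST than rely on regular expressions.

```typescript
// Sketch only: assumes one field per line and numeric arguments,
// e.g. `owner: User @cost(network: 1)`. Argument names are illustrative.
type CostArgs = Record<string, number>;
type CostMap = Record<string, CostArgs>; // keyed by "Type.field" (assumed format)

function extractCosts(sdl: string): { cleanSchema: string; costMap: CostMap } {
  const costMap: CostMap = {};
  let currentType = "";

  const cleanLines = sdl.split("\n").map((line) => {
    // Remember which type the following fields belong to.
    const typeMatch = /^\s*(?:extend\s+)?type\s+(\w+)/.exec(line);
    if (typeMatch) currentType = typeMatch[1] ?? currentType;

    // Capture the field name and the raw argument list of @cost(...).
    const costMatch = /^\s*(\w+)[^@]*@cost\(([^)]*)\)/.exec(line);
    if (!costMatch) return line;

    const args: CostArgs = {};
    for (const pair of (costMatch[2] ?? "").split(",")) {
      const [key, value] = pair.split(":").map((part) => part.trim());
      if (key && value) args[key] = Number(value);
    }
    costMap[`${currentType}.${costMatch[1]}`] = args;

    // Strip the directive so the clean schema contains no custom directives.
    return line.replace(/\s*@cost\([^)]*\)/, "");
  });

  return { cleanSchema: cleanLines.join("\n"), costMap };
}

const { cleanSchema, costMap } = extractCosts(`type Deal {
  id: ID
  owner: User @cost(network: 1)
}`);
console.log(costMap); // "Deal.owner" maps to { network: 1 }
console.log(cleanSchema); // same schema, without the @cost directive
```

The two outputs can then be stored side by side: the clean schema goes through federation untouched, while the cost map is looked up by type and field name during query analysis.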

What does it really take to query a field?

Because servers do many things (querying databases, making network requests, etc.), the complexity argument alone felt too vague. So we came up with the concept of tokens to describe more precisely what querying a field involves and how many resources it requires. We added two token types, “db” and “network”, where one token has a cost of 100. Tokens act as a hard cost for the field. Complexities defined together with multipliers are multiplied, whereas tokens just add cost to the field. Tokens are only multiplied by their parent multipliers: if one queries a list of 100 items where each item has a field User with a defined network token, it means 100 network requests to fetch all the users, and therefore a cost of 100 (list) * 100 (token) = 10,000.

Here’s an example of token usage (for more details please check out the readme of graphql-query-cost).
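The arithmetic behind tokens can be sketched roughly as follows. This is one reading of the rules described above, not the library's code: the `FieldSelection` shape, the `selectionCost` function and the field names in the example are assumptions made for illustration. Only the token cost of 100 comes from the text.

```typescript
const TOKEN_COST = 100; // from the article: one token has a cost of 100

// Hypothetical shapes, for illustration only.
interface FieldCost {
  complexity?: number; // base cost, multiplied by the field's own multiplier
  db?: number;         // number of "db" tokens
  network?: number;    // number of "network" tokens
}

interface FieldSelection {
  cost: FieldCost;
  multiplier?: number; // e.g. the value of a `limit` argument on a list field
  children?: FieldSelection[];
}

function selectionCost(sel: FieldSelection, parentMultiplier = 1): number {
  const { complexity = 0, db = 0, network = 0 } = sel.cost;
  const ownMultiplier = sel.multiplier ?? 1;

  // Complexity is scaled by the field's own multiplier; tokens are a flat
  // add-on. Both are scaled by the multipliers of the parent fields.
  const own =
    parentMultiplier * (complexity * ownMultiplier + (db + network) * TOKEN_COST);

  const nested = (sel.children ?? []).reduce(
    (sum, child) => sum + selectionCost(child, parentMultiplier * ownMultiplier),
    0
  );
  return own + nested;
}

// A list of 100 items where each item has a `user` field with one network token.
const deals: FieldSelection = {
  cost: {},
  multiplier: 100,
  children: [{ cost: { network: 1 } }],
};

console.log(selectionCost(deals)); // 100 (list) * 100 (network token) = 10000
```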

Recursions from hell

We thought we had solved the recursive query problem: querying lists should require multiplier/limit arguments, and these would eventually pile up and hit the cost limit. Sadly, small lists and single entities don’t need limits, so at some point we became vulnerable to malicious queries like:
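A hypothetical query of this shape, with made-up field names, bounces between two single-entity fields without ever supplying a limit argument. The small helper next to it is only a rough way to show how deep such a query can go while every individual field stays cheap.

```typescript
// Hypothetical query: `deal` and `owner` are made-up single-entity fields,
// so no limit argument (and therefore no multiplier) is ever required.
const recursiveQuery = `
{
  deal(id: 1) {
    owner {
      deal {
        owner {
          deal { id }
        }
      }
    }
  }
}`;

// Rough nesting measure: the deepest level of open selection sets.
function maxSelectionDepth(query: string): number {
  let depth = 0;
  let deepest = 0;
  for (let i = 0; i < query.length; i++) {
    const ch = query.charAt(i);
    if (ch === "{") {
      depth++;
      deepest = Math.max(deepest, depth);
    } else if (ch === "}") {
      depth--;
    }
  }
  return deepest;
}

console.log(maxSelectionDepth(recursiveQuery)); // 6
```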

Luckily, the issue was quite straightforward to solve: we count how many times recursion happens and apply an exponentially growing cost for each level of recursion.
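The recursion penalty can be sketched like this. It is a simplified illustration under stated assumptions: recursion is detected by a type reappearing on the path from the root, and the growth factor `RECURSION_BASE = 10` is an arbitrary value chosen for the example. The `QueryNode` shape and the function name are hypothetical.

```typescript
// Hypothetical shape: each selected field knows the type it resolves to.
interface QueryNode {
  typeName: string; // GraphQL type the field resolves to
  cost: number;     // base cost of the field
  children?: QueryNode[];
}

const RECURSION_BASE = 10; // assumed growth factor, not taken from the article

function costWithRecursion(
  node: QueryNode,
  seen: Map<string, number> = new Map<string, number>()
): number {
  // How many times this type has already appeared on the path from the root.
  const repeats = seen.get(node.typeName) ?? 0;
  const own = node.cost * Math.pow(RECURSION_BASE, repeats);

  // Copy the map so sibling branches do not affect each other's counts.
  const nextSeen = new Map(seen);
  nextSeen.set(node.typeName, repeats + 1);

  const nested = (node.children ?? []).reduce(
    (sum, child) => sum + costWithRecursion(child, nextSeen),
    0
  );
  return own + nested;
}

// Deal -> User -> Deal -> User, every field with a base cost of 1.
const nestedDeal: QueryNode = {
  typeName: "Deal",
  cost: 1,
  children: [{
    typeName: "User",
    cost: 1,
    children: [{
      typeName: "Deal",
      cost: 1,
      children: [{ typeName: "User", cost: 1 }],
    }],
  }],
};

console.log(costWithRecursion(nestedDeal)); // 1 + 1 + 10 + 10 = 22
```

Without the penalty this query would cost only 4. With it, each extra round trip through the same types multiplies the cost, so a deeply nested query hits the cost limit quickly.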

Time to play

We wanted to somehow indicate the costs of fields in our GraphQL playground. For the playground we use GraphiQL, which has a similar issue to Apollo Federation: custom directives are supported, but not displayed. Getting the playground’s schema inspector to display our cost directive looked like quite a hassle, so we quickly dropped the idea. What we did instead is something I think is even cooler: we added a cost counter to the footer of the playground. This way our developers can really enjoy the playground and see the cost of a query even before executing it.

Honorable mentions

  • Artjom — Thank you for the opportunity to work on cool stuff :) Also, he pushed the idea of tokens.
  • Aleksander — For heated arguments :)
  • Aleksander — Was nice :)

