Avoiding n+1 requests in GraphQL, including within subscriptions

Arnaud Rinquin · Slite · Jul 5, 2017
Note: this article will not make much sense unless you know the basics of GraphQL, an awesome technology we use at Slite. We recommend getting a basic understanding of it before going forward.

What are n+1 requests?

n+1 requests are the most obvious optimisation issue one might encounter when writing a GraphQL backend. It’s easy to illustrate with a simple relation: let’s say you have a User type which has a relation to an Address type.

type User {
  id: ID
  address: Address
}

type Address {
  id: ID
  streetName: String
  city: String
}

Now, let’s assume you have a simple query returning a list of users. A client might send this query:

query GetUsersList {
  userList {
    id
    address {
      id
      streetName
    }
  }
}

That’s it for the context. Now, to understand the issue, we have to follow GraphQL’s resolution flow:

First step: the userList query resolver returns a list of users, most likely by making one single query to the store:

const resolvers = {
  Query: {
    userList: (root) => {
      // One single query fetches the whole list
      return db.users.all()
    }
  }
}

Second step: for each of the n users in this list, GraphQL needs to resolve the address attribute. This means it will go through the address resolver n times, most likely triggering n requests to your datastore:

const resolvers = {
  User: {
    address: (user) => {
      // Called once per user: n single-item requests
      return db.addresses.fromId(user.addressId)
    }
  }
}

This is where the “n+1 requests” expression comes from.

In 99% of cases, this is a very suboptimal way to fetch these elements. It is very likely that your datastore allows you to fetch many items at once with something like db.addresses.fetchFromIds(addressIds), and this is how we want to do it.
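For illustration, here is what such a batched fetch could look like. This is only a sketch, assuming a SQL datastore queried through node-postgres (the db.addresses.fetchFromIds helper itself is hypothetical):

const { Pool } = require('pg')
const pool = new Pool()

// One IN-style query replaces n single-row lookups
function fetchFromIds(addressIds) {
  return pool
    .query('SELECT * FROM addresses WHERE id = ANY($1)', [addressIds])
    .then((result) => result.rows)
}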

dataloader: the simple way to avoid n+1s in GraphQL

dataloader is a fairly simple module, once again contributed to the community by Facebook, that does two things:

  1. caching of identical requests
  2. batching of single-item requests into grouped requests

The first feature is useful if, somehow, you end up requesting the same entity several times during the same request. It might help in that case, but it won’t solve our n+1 issue.
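To illustrate the caching feature (a minimal sketch, assuming a db.users.fetchFromIds batch helper like the one above):

const DataLoader = require('dataloader')
const userLoader = new DataLoader((ids) => db.users.fetchFromIds(ids))

// Both calls happen during the same request: the second one is served
// from the per-request cache, so only one fetch is made for id 42
const a = userLoader.load(42)
const b = userLoader.load(42) // a and b are the very same cached promise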

The second one will: instead of directly fetching our address from user.addressId, we’ll queue the request in a dataloader. The dataloader will gather all the addressIds for a short while and, at some point (on the next tick), make a single request using the batch function we’ve set it up with. The dataloader documentation gives us a clear idea of how to do that:

First, we declare that a new address dataloader should be created for each request, and attach it to the context:

const DataLoader = require('dataloader')

function createRequestContext() {
  return {
    dataloaders: {
      addresses: new DataLoader(function fetchAddresses(addressIds) {
        return db.addresses.fetchFromIds(addressIds)
      })
    }
  }
}

Then change our User.address resolver to something like:

const resolvers = {
  User: {
    address: (user, args, context) => {
      // Queue the id; the dataloader batches all loads made during this tick
      return context.dataloaders.addresses.load(user.addressId)
    }
  }
}

That’s it. The User.address resolver might now be called many times, but the passed address IDs will be gathered by the dataloader, and the db.addresses.fetchFromIds function will be called only once.

Simple enough, isn’t it?
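One caveat worth knowing: dataloader expects the batch function to resolve to an array of the same length as the keys it received, in the same order. If your datastore doesn’t guarantee ordering, a small re-mapping step keeps the contract. A sketch, assuming the returned rows carry an id field:

function fetchAddresses(addressIds) {
  return db.addresses.fetchFromIds(addressIds).then((rows) => {
    // Re-order the rows to match the requested ids, with null for misses
    const byId = new Map(rows.map((row) => [row.id, row]))
    return addressIds.map((id) => byId.get(id) || null)
  })
}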

the case of GraphQL subscriptions

With the current graphql-subscriptions implementation, you have only one chance to set up a subscription context, with the onOperation callback. This means you can only create the dataloaders you need once, when the subscription is made. Because dataloader acts as a cache, it won’t even try to refetch data it already fetched during a previous execution of the subscription query.
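For reference, that one-shot setup looks roughly like this. A sketch, assuming subscriptions-transport-ws’ SubscriptionServer (option names vary across versions) and the createRequestContext function from above:

// onOperation runs once, when the subscription is created, so the
// dataloaders attached to this context live as long as the subscription
SubscriptionServer.create(
  {
    subscriptionManager,
    onOperation: (message, params) =>
      Object.assign({}, params, { context: createRequestContext() }),
  },
  { server: httpServer, path: '/subscriptions' }
)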

To illustrate the issue, let’s assume you’ve subscribed this way:

subscription UserUpdates {
  userUpdated {
    id
    address {
      id
      streetName
    }
  }
}

You update the user a first time: the address is resolved and cached. Then, somehow, the address is updated. Now you update the user a second time: as it was cached, the initial version of the address is returned. Shame 🔔.
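In dataloader terms, the stale read plays out like this (a sketch):

const loader = context.dataloaders.addresses

// First subscription payload: the address is fetched and cached
loader.load('address-1') // hits the datastore

// ...the address row changes in the datastore...

// Second subscription payload: same key, same long-lived loader, so
// dataloader returns the cached promise from the first payload
loader.load('address-1') // served from cache, no refetch: stale data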

a twisty but deceptively simple solution

Ideally, we’d love to be able to create a new dataloader every time a query is made, but that’s not possible yet.

We could also disable the caching, which dataloader lets you do super easily:

new DataLoader(resolveBatchFunction, { cache: false })

The problem is that it also disables de-duping of identical ids.
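To make that concrete (a quick sketch, reusing the fetchAddresses batch function from above):

const loader = new DataLoader(fetchAddresses, { cache: false })

// Without the cache, duplicate keys are no longer de-duped: the batch
// function receives every occurrence
loader.load('address-1')
loader.load('address-1')
// fetchAddresses is called with ['address-1', 'address-1']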

We’ve found a solution within the dataloader documentation, though it wasn’t presented as a fix for the subscriptions case: manually clear the cache every time a batch is resolved, using the dataloader.clearAll() function.

With the help of a simple helper, you can set that up quite easily:

function makeSelfClearingDataloader(resolveBatchFunction) {
  const dataloader = new DataLoader((ids) => {
    // Clear the cache as soon as the batch is dispatched, so later
    // loads trigger a fresh fetch instead of reusing this batch's result
    dataloader.clearAll()
    return resolveBatchFunction(ids)
  })
  return dataloader
}

Now we update our context creating function:

function createRequestContext() {
  return {
    dataloaders: {
      addresses: makeSelfClearingDataloader((addressIds) => {
        return db.addresses.fetchFromIds(addressIds)
      })
    }
  }
}

Done!

Now the cache will be cleared before the batch is resolved, so further calls to load will be resolved on their own instead of reusing the result of the batch that is about to be resolved.

it means nothing without a few metrics

Unmeasured optimisation makes no sense; we should never rely on gut feelings. So here is a single measurement that easily proves the value of dataloader:

In our API, we tested the impact on a query returning roughly 400 items (so 400+1 requests). Without dataloader: 20 seconds. With dataloader: 200ms. While 200ms isn’t amazing, it’s not that bad for a 400-item list, and the ratio is what matters: we are looking at a 100x improvement. Proof given.
