DataLoader in GraphQL

Published in HMIF ITB Tech · 6 min read · Sep 1, 2019

In my previous article, I mentioned that GraphQL solves the N + 1 fetch problem. This is true for the client side. However, it only pushes the problem to the server side. There, a service still has to communicate with a database or other services to serve a request, and depending on how it resolves the query, it might suffer from the same problem.

N+1 Problem

Let’s revisit the N + 1 fetch problem, now in more detail. The N + 1 fetch problem arises when you have a collection of objects, say an Authors collection, and each object has a collection of its own, say an Articles collection. When you want to list the articles written by each author, you can naively fetch all registered authors with one request, then fetch the articles for each author. But this creates many round-trips to the backend or database. If you get N authors from the first fetch, you have to make N more requests to the backend or database, for N + 1 requests in total.
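To make the round-trip count concrete, here is a minimal sketch of the naive approach. The in-memory db object, its method names, and the sample authors are all hypothetical stand-ins for a real database; the only point is to count how many queries the listing triggers.

```javascript
// Hypothetical in-memory "database"; queryCount tracks simulated round-trips.
const db = {
  queryCount: 0,
  authors: [
    { id: 1, name: 'Ana' },
    { id: 2, name: 'Budi' },
    { id: 3, name: 'Citra' },
  ],
  articles: [
    { id: 10, authorId: 1, title: 'A' },
    { id: 11, authorId: 2, title: 'B' },
    { id: 12, authorId: 2, title: 'C' },
  ],
  findAuthors() {
    this.queryCount += 1;
    return this.authors;
  },
  findArticlesByAuthor(authorId) {
    this.queryCount += 1;
    return this.articles.filter((article) => article.authorId === authorId);
  },
};

// Naive listing: one query for the authors, then one more query per author.
function listArticlesNaively() {
  return db.findAuthors().map((author) => ({
    author: author.name,
    titles: db.findArticlesByAuthor(author.id).map((article) => article.title),
  }));
}

const listing = listArticlesNaively();
console.log(db.queryCount); // 4, i.e. 1 + N for N = 3 authors
```

The count grows linearly with the number of authors, which is exactly what batching tries to avoid.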

There are two trivial ways to solve this problem. The first is eager loading when fetching the data. The second is letting the ORM do the batching. The first option is not ideal, since some clients do not need the Article collection details. The second option cannot be trivially applied in GraphQL, because each resolver function only knows about its own parent object. This means that the resolver for an article does not know that the article it is resolving belongs to a list that could be batched.

The Solution

A tool called DataLoader was made to solve this problem. DataLoader is a generic utility to be used as part of your application’s data fetching layer to provide a simplified and consistent API over various remote data sources such as databases or web services via batching and caching. Its main goal is to reduce the number of requests an application makes when it loads data from backends. DataLoader provides a clean interface to load individual values while coalescing all individual loads. Loads issued within the same tick of the event loop are collected, and DataLoader then dispatches them all in one request.
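The coalescing idea can be sketched in a few lines. The TinyLoader class below is a hand-rolled illustration and not the DataLoader library or its API; the backend function and article shapes are made up. It queues individual load calls and dispatches them together once the current synchronous code has finished.

```javascript
// TinyLoader: a hand-rolled illustration of batching, NOT the real DataLoader library.
class TinyLoader {
  constructor(batchFn) {
    this.batchFn = batchFn; // (keys) => Promise of values, in the same order as keys
    this.queue = []; // pending { key, resolve, reject } entries
  }

  load(key) {
    return new Promise((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // The first load in a tick schedules a single dispatch for the whole batch.
      if (this.queue.length === 1) {
        queueMicrotask(() => this.dispatch());
      }
    });
  }

  async dispatch() {
    const batch = this.queue;
    this.queue = [];
    try {
      const values = await this.batchFn(batch.map((entry) => entry.key));
      batch.forEach((entry, index) => entry.resolve(values[index]));
    } catch (err) {
      batch.forEach((entry) => entry.reject(err));
    }
  }
}

// Hypothetical backend that records every batch of IDs it receives.
const receivedBatches = [];
const articleLoader = new TinyLoader(async (ids) => {
  receivedBatches.push(ids);
  return ids.map((id) => ({ id, title: `Article ${id}` }));
});

// Three independent load calls, as three separate resolvers might issue them.
const loaded = Promise.all([
  articleLoader.load(1),
  articleLoader.load(2),
  articleLoader.load(3),
]);

loaded.then((articles) => {
  // One backend call served all three loads.
  console.log(receivedBatches.length, articles.map((article) => article.title));
});
```

Each caller still asks for a single value, so resolvers do not need to know about each other. The batching happens entirely inside the loader.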

DataLoader also has the advantage of batching requests across different resolvers. For example, suppose you want to load new and popular articles from a few categories. With an ORM, you might call:

newArticles = ORM.getMainCategories().getNewArticles();
popularArticles = ORM.getMainCategories().getPopularArticles();

The ORM might process those queries as separate queries and generate four queries to the database. But with DataLoader, those queries can be batched.

The batching also removes duplicate queries. When two queries overlap, each shared element appears only once in the batch. For example, when the first query wants to load [0, 1, 2] and the second query wants to load [1, 2, 3], the batch sent to the backend will be [0, 1, 2, 3].
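The deduplication step on its own is small. The helper below is a hypothetical name used only for illustration; it merges the key lists of several queries and keeps the first occurrence of each key.

```javascript
// Merge the keys of several queries, dropping duplicates but keeping first-seen order.
function mergeKeys(...queries) {
  return [...new Set(queries.flat())];
}

const batchKeys = mergeKeys([0, 1, 2], [1, 2, 3]);
console.log(batchKeys); // [ 0, 1, 2, 3 ]
```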

DataLoader in Action

Now we’ll see an example of data fetching with and without DataLoader. Let’s set up the example query first. Say you want to create a page that shows one featured article and a few of the most popular articles from each genre. Here’s a possible GraphQL query to do that:

{
  getGenres {
    articles {
      title
      thumbnail
    }
  }
  getFeaturedArticle {
    title
    thumbnail
  }
}

Next, we’ll see what the requests might look like with and without DataLoader. For now, let’s assume that the backend service only returns IDs when we call getGenres and getFeaturedArticle. In the real world, the service would most likely include the details, but let’s say it returns nothing beyond what we explicitly fetch.

Data Fetching Without DataLoader

First, QueryResolver will fill these fields by asking a database about all genres on this site and today’s featured article. Two requests will be sent to the backend.

Step 1. Fetch all genres and the featured article

Assume we get two genre IDs. For each genre ID, we want to know the details and all the articles that belong to that genre. We also want the details of the featured article. This makes three queries.

Step 2. Fetch the details for each genre and article

Lastly, for each article we get from each genre, we request the details of that article. If we get four articles, we send four requests.

Step 3. Fetch the details for each article

In total, it takes 9 requests to the backend (2 + 3 + 4) to resolve the query.

Data Fetching With DataLoader

The first step is the same as without DataLoader. We cannot batch anything here, so this step still takes two queries.

Next, we can batch the fetch of the genre details, while the fetch of the featured article’s details is deferred to the next step. DataLoader does this automatically. Because of the batching, we only query once in this step.

Step 2. Batch all genres and send one batched request

Lastly, we batch the fetch of the article details. The featured article is included in this batch too. But since, in this example, the featured article also belongs to Genre 1, its ID is included only once in the batch. This step also takes just one request to the backend.

Step 3. Batch all articles and send one batched request

In total, only 4 requests (2 + 1 + 1) are needed to resolve the query.
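The two totals can be checked with a short tally. This sketch does not talk to any backend; it only mirrors the walkthrough, and the genre and article IDs are hypothetical placeholders.

```javascript
// IDs mirror the walkthrough: two genres, four articles, and a featured
// article that also belongs to genre 1. All IDs here are hypothetical.
const articleIdsByGenre = { genre1: ['a1', 'a2'], genre2: ['a3', 'a4'] };
const featuredArticleId = 'a1';

const genreIds = Object.keys(articleIdsByGenre);
const genreArticleIds = Object.values(articleIdsByGenre).flat();

// Without batching: every ID costs one request.
const withoutLoader =
  2 + // step 1: getGenres + getFeaturedArticle
  (genreIds.length + 1) + // step 2: each genre, plus the featured article
  genreArticleIds.length; // step 3: each article found in a genre

// With batching: each step after the first collapses into a single request.
const articleBatch = [...new Set([...genreArticleIds, featuredArticleId])];
const withLoader =
  2 + // step 1: nothing to batch yet
  1 + // step 2: one batch holding every genre ID
  1; // step 3: one batch holding every unique article ID

console.log(withoutLoader, withLoader, articleBatch);
```

Note that the unbatched total grows with the number of genres and articles, while the batched total stays at 4 for this query shape.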

Disclaimer

When we want to use batching in our application, the backend needs to support batched requests. In the previous example, the batch is sent to the backend with findAllById, which accepts a list of IDs and returns a list of resources. In SQL, this means using WHERE id IN (keys), where keys is the list of IDs we request. For a web service, it means having a GET endpoint that takes a list of IDs as a parameter. If the backend does not support batched requests, we cannot be fully efficient.
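As a rough sketch of the SQL side, the two helpers below build a parameterized WHERE id IN query and then line the returned rows up with the requested keys. The function names, the "?" placeholder style, and the row shape are assumptions for illustration; placeholder syntax varies between database drivers. The re-alignment step matters because a database is free to return rows in any order, while a batch function for DataLoader is expected to return values in the same order as the keys it was given.

```javascript
// Build a parameterized query for a list of IDs.
// Assumption: the driver uses "?" placeholders and `table` is a trusted constant.
function buildFindAllByIdQuery(table, ids) {
  const placeholders = ids.map(() => '?').join(', ');
  return {
    text: `SELECT * FROM ${table} WHERE id IN (${placeholders})`,
    params: ids,
  };
}

// Re-align rows with the requested keys; a missing row becomes null.
function alignRowsToKeys(keys, rows) {
  const rowsById = new Map(rows.map((row) => [row.id, row]));
  return keys.map((key) => (rowsById.has(key) ? rowsById.get(key) : null));
}

const query = buildFindAllByIdQuery('articles', [3, 1, 2]);
console.log(query.text); // SELECT * FROM articles WHERE id IN (?, ?, ?)

// Pretend the database returned the rows out of order and without ID 2.
const rows = [{ id: 1, title: 'One' }, { id: 3, title: 'Three' }];
const aligned = alignRowsToKeys([3, 1, 2], rows);
console.log(aligned.map((row) => (row ? row.title : null))); // [ 'Three', 'One', null ]
```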

Caching

Even if you don’t have a batched backing service, you can still benefit from DataLoader. By default, DataLoader caches its own requests. If DataLoader has seen a data item before, it returns the cached value without asking for it again. By default, the cache never expires, and it lives inside the DataLoader instance. So, if you want to clear the cache on each request, you can simply create a new DataLoader for each request. DataLoader also has an option to attach your own caching mechanism, e.g. memcached or redis.
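The per-request cache idea can be sketched without the library. Everything below is a hand-rolled illustration with hypothetical names, not DataLoader's own API. A loader remembers each key it has seen, and a fresh loader is created for every incoming request so cached data never outlives that request.

```javascript
// Hand-rolled sketch of per-loader memoization, not the DataLoader library itself.
function createCachedLoader(fetchOne) {
  const cache = new Map(); // key -> Promise; lives exactly as long as this loader
  return {
    load(key) {
      if (!cache.has(key)) {
        cache.set(key, Promise.resolve(fetchOne(key)));
      }
      return cache.get(key);
    },
    clear(key) {
      cache.delete(key);
    },
  };
}

// Hypothetical backend call that counts how often it is hit.
let fetchCount = 0;
const fetchArticle = (id) => {
  fetchCount += 1;
  return { id };
};

// One loader per incoming request keeps cached data from leaking between requests.
function handleRequest() {
  const loader = createCachedLoader(fetchArticle);
  return [loader.load(1), loader.load(1), loader.load(2)];
}

const [first, second] = handleRequest();
handleRequest();
console.log(first === second, fetchCount); // true 4
```

Within one request, the repeated load of ID 1 reuses the cached promise. The second request starts with an empty cache, so it fetches both IDs again.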

Conclusion

DataLoader is a tool that addresses some of GraphQL’s limitations. Its main features are batching and caching. You can get a significant efficiency gain with DataLoader when you have many nested queries over collections and many duplicate entries.