This is the second part of an article about Caliban, a library for writing GraphQL backends in Scala in a typesafe, boilerplate-free and purely functional manner. If you haven’t read it already, please check Part 1 to understand what this is all about.
In the first part of this series, we’ve seen how to create a simple GraphQL API. However, we didn’t take full advantage of GraphQL capabilities: our schema was so basic that we didn’t need to worry about any kind of optimization.
It is very common to have deeply nested schemas where each inner field might require gathering data from a database. Ideally, we’d like to keep these calls to a minimum. Let’s see how we can deal with this using Caliban.
We are going to use a different API for this post. Let’s say we want a query that returns a list of Orders, and for each order, we want to expose information about the Customer who placed it, the Products the order is made of, and the Brand of each product. Here’s our simple data model:
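A minimal sketch of such a data model could look like the following. Only the four entity names and the fact that an order stores IDs of customers and products come from the text; every field name is an assumption made for illustration.

```scala
// Hypothetical field names: the text only implies the IDs linking the types together.
object Model {
  type OrderId    = Int
  type CustomerId = Int
  type ProductId  = Int
  type BrandId    = Int

  // An order only stores IDs: its customer, and each product ID paired with a quantity.
  final case class Order(id: OrderId, customerId: CustomerId, products: List[(ProductId, Int)])
  final case class Customer(id: CustomerId, firstName: String, lastName: String)
  final case class Product(id: ProductId, name: String, description: String, brandId: BrandId)
  final case class Brand(id: BrandId, name: String, logoUrl: String)
}
```

Because `Order` holds IDs rather than nested objects, turning it into something a client can query requires extra lookups for customers, products and brands.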
We also assume that we have a DBService that we can use to query all the data we need from a database. To compare the different approaches we will try in this post, I implemented a simple DBService that returns some fixed data and records how many DB hits are performed (the whole project is available on GitHub).
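To make the idea of counting DB hits concrete, here is a small dependency-free sketch. It is not the service from the repository: the method names, the `Customer` shape and the synchronous signatures are assumptions (a real service would return effects).

```scala
import java.util.concurrent.atomic.AtomicInteger

object DummyDb {
  final case class Customer(id: Int, name: String)

  // Counts every simulated round trip to the database.
  final class DBService {
    private val hits = new AtomicInteger(0)

    def hitCount: Int = hits.get()

    // One hit per call, even if the same ID is requested repeatedly.
    def getCustomer(id: Int): Customer = {
      hits.incrementAndGet()
      Customer(id, s"customer-$id")
    }

    // One hit for the whole list of IDs.
    def getCustomers(ids: List[Int]): List[Customer] = {
      hits.incrementAndGet()
      ids.map(id => Customer(id, s"customer-$id"))
    }
  }
}
```

The distinction between a single-item lookup and a lookup by list of IDs is what the later comparison of approaches relies on.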
The query we are going to use to measure the efficiency of our backend is the following:
We want to return a list of the last 20 orders, with customer and product information but without the brand information.
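A query matching this description might look like the one below, held in a Scala string. Only the `orders` field, its `count` argument and the absence of brand data come from the text; the other field names are assumptions.

```scala
object TestQuery {
  // Hypothetical field names, chosen for illustration only.
  val query: String =
    """query {
      |  orders(count: 20) {
      |    id
      |    customer { id firstName lastName }
      |    products {
      |      id
      |      quantity
      |      details { name description }
      |    }
      |  }
      |}""".stripMargin
}
```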
As we saw in the first part of this series, Caliban requires that we write case classes to define the GraphQL schema we want to expose. Here, our root query type will be a case class named Query with a single parameter called orders. What should be the type of this parameter?
- It requires an argument named count, so we will create a case class QueryArgs(count: Int) and make orders a function from QueryArgs.
- To resolve orders, we will need to make a DB call, so the return type should be wrapped in IO. To make it simple, let’s return a UIO (an IO that cannot fail).
- We can’t return a list of the Order data type, because it only contains IDs of customers and products. We need to denormalize the data by creating an OrderView data type that will contain everything that is queryable.
In conclusion, our orders parameter will have the following shape: QueryArgs => UIO[List[OrderView]].
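The shape described above can be sketched as follows. To keep the example dependency-free, `UIO` is replaced by a stand-in type alias (a deferred computation), which is an assumption and not the ZIO type. `CustomerView`, `ProductView` and `BrandView` are hypothetical names, as are all fields.

```scala
object EagerSchema {
  // Stand-in for zio.UIO so the sketch needs no library: a deferred computation that cannot fail.
  type UIO[A] = () => A

  final case class QueryArgs(count: Int)

  // Fully denormalized views: every nested value is materialized up front.
  final case class BrandView(id: Int, name: String)
  final case class ProductDetailsView(name: String, brand: BrandView)
  final case class ProductView(id: Int, quantity: Int, details: ProductDetailsView)
  final case class CustomerView(id: Int, name: String)
  final case class OrderView(id: Int, customer: CustomerView, products: List[ProductView])

  // Root query type: `orders` takes its arguments and returns an effect.
  final case class Query(orders: QueryArgs => UIO[List[OrderView]])
}
```

Since `OrderView` contains plain values all the way down, whoever builds it has to fetch customers, products and brands before returning anything.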
Now what is the problem with this implementation? It doesn’t actually consider the fields requested by the client besides the first one. To implement the resolver for orders, we first need to get the list of orders and then for each order, we need to get the products, the customer and the brand data in order to build the OrderView object, even if those were not required by the client.
If we run our test query with the dummy DBService, it causes 101 DB Hits:
- 1 hit for getting the list of orders
- 20 hits for getting the customer of each order (there are 20 orders)
- 40 hits for getting the product data (each order has 2 products)
- 40 hits for getting the brand data of each product
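The breakdown above (1 + 20 + 40 + 40 = 101) can be reproduced with a small plain-Scala simulation. The data, function names and tuple shapes below are assumptions made for illustration; they only mirror the counts given in the text (20 orders, 2 products each).

```scala
object NaiveHits {
  var hits = 0 // simulated DB round trips

  final case class Order(id: Int, customerId: Int, productIds: List[Int])

  // Hypothetical data: each order references 1 customer and 2 products.
  def getOrders(count: Int): List[Order] = {
    hits += 1
    List.tabulate(count)(i => Order(i, i % 4, List(i % 4, (i + 1) % 4)))
  }
  def getCustomer(id: Int): String = { hits += 1; s"customer-$id" }
  // Returns (name, brandId).
  def getProduct(id: Int): (String, Int) = { hits += 1; (s"product-$id", id % 2) }
  def getBrand(id: Int): String = { hits += 1; s"brand-$id" }

  // Eager resolver: everything is fetched for every order, requested or not.
  def resolveOrders(count: Int): List[(Int, String, List[(String, String)])] =
    getOrders(count).map { o =>
      val customer = getCustomer(o.customerId)
      val products = o.productIds.map { pid =>
        val (name, brandId) = getProduct(pid)
        (name, getBrand(brandId))
      }
      (o.id, customer, products)
    }
}
```

Calling `NaiveHits.resolveOrders(20)` on a fresh counter leaves `hits` at 101, whatever fields the client actually asked for.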
This is really inefficient. Our query doesn’t even request brand data but because orders returns everything available, we spend a lot of DB hits for nothing. Let’s do better.
So far we’ve only used effects in the root case classes, but nothing prevents us from using effects in the nested case classes too. Returning an effect basically makes a field lazy: the effect will only be run when needed. As a general rule, we can say that any field at any level that has some cost (e.g. causing a DB query) should be turned into an effect.
Let’s revisit our case classes to apply this simple rule. We will transform the customer and product fields of OrderView, as well as ProductDetailsView#brand, to effects because those fields require extra DB calls to get customer, product and brand data.
With this simple change in place, running the test query on our dummy DBService now results in 61 DB Hits. Why? We got rid of those 40 useless calls for getting brand data, which the query didn’t require. If the query didn’t include customer or product data, the gain would be even higher.
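The effect of making costly fields lazy can be modeled without any library by using thunks (`() => A`) as stand-ins for effects. This is only a sketch under that assumption, with hypothetical data and field names; in the real schema these fields would be ZIO effects.

```scala
object LazyHits {
  var hits = 0 // simulated DB round trips

  final case class Order(id: Int, customerId: Int, productIds: List[Int])

  def getOrders(count: Int): List[Order] = {
    hits += 1
    List.tabulate(count)(i => Order(i, i % 4, List(i % 4, (i + 1) % 4)))
  }
  def getCustomer(id: Int): String = { hits += 1; s"customer-$id" }
  def getProduct(id: Int): (String, Int) = { hits += 1; (s"product-$id", id % 2) }
  def getBrand(id: Int): String = { hits += 1; s"brand-$id" }

  // Costly fields are thunks: nothing is fetched until the field is actually evaluated.
  final case class ProductView(name: String, brand: () => String)
  final case class OrderView(id: Int, customer: () => String, products: () => List[ProductView])

  def resolveOrders(count: Int): List[OrderView] =
    getOrders(count).map { o =>
      OrderView(
        o.id,
        () => getCustomer(o.customerId),
        () => o.productIds.map { pid =>
          val (name, brandId) = getProduct(pid)
          ProductView(name, () => getBrand(brandId))
        }
      )
    }

  // Simulates the test query: customer and products are requested, brand is not.
  def runTestQuery(): Int = {
    hits = 0
    resolveOrders(20).foreach { o =>
      o.customer()
      o.products().foreach(_.name)
    }
    hits
  }
}
```

`LazyHits.runTestQuery()` returns 61: one hit for the orders, 20 for customers and 40 for products, with the brand thunks never evaluated.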
We are now only gathering the data we really need. But what if the same customer had several orders, or what if the same product was ordered several times? We’re potentially querying the database multiple times for the same thing.
Caliban comes with a data type called ZQuery that addresses this particular problem. A ZQuery[R, E, A] is a purely functional description of an effectual query that may contain requests to one or more data sources. The type parameters are very similar to ZIO: it requires an environment R, and may fail with an E or succeed with an A.
What makes it interesting for our use case?
- Requests are parallelized: ZQuery collects requests that don’t depend on each other to run them in parallel.
- Requests are deduplicated and results are automatically cached: identical requests are run only once within the same query.
- If a batching function is provided for a given data source, multiple items can be queried at once.
That means that if we transform our effects from simple ZIO to ZQuery, we will benefit automatically from parallelization and caching. We’ll try the batching a bit later.
To create a ZQuery, we need to define 2 things: a Request type and a DataSource.
- A Request[E, A] is a simple data type that represents a request from a data source for a value of type A that may fail with an E. We need an actual value for each request so that we can compare them and cache them: if 2 Request objects are equal, they will be considered the same and executed only once.
- To create a DataSource we will use DataSource.fromFunctionM, which simply takes a function from our Request type to an effect returning the expected result type. A DataSource also needs a unique name.
We then call ZQuery.fromRequest with a Request and a DataSource and we get a ZQuery back. The following snippet shows our new API definition with ZQuery instead of ZIO and how we create the ZQuery for getting Customer data (the same thing should be done for Product and Brand).
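The deduplication and caching behavior can be illustrated with a plain-Scala model. This is deliberately not the ZQuery API: it only shows why requests need to be comparable values, under the assumption (made up for this sketch) that the 20 orders share 4 customers and 4 products. It does not model parallelism.

```scala
import scala.collection.mutable

object CachedHits {
  var hits = 0 // simulated DB round trips

  // Requests are plain values: two equal requests map to the same cache entry.
  sealed trait Request
  final case class GetCustomer(id: Int) extends Request
  final case class GetProduct(id: Int)  extends Request

  private val cache = mutable.Map.empty[Request, String]

  // Hits the "database" only on a cache miss.
  def fetch(request: Request): String =
    cache.getOrElseUpdate(request, {
      hits += 1
      request match {
        case GetCustomer(id) => s"customer-$id"
        case GetProduct(id)  => s"product-$id"
      }
    })

  def runTestQuery(): Int = {
    cache.clear()
    hits = 1 // one hit for the list of orders
    // Hypothetical data: (customerId, productIds) for 20 orders.
    val orders = List.tabulate(20)(i => (i % 4, List(i % 4, (i + 1) % 4)))
    orders.foreach { case (customerId, productIds) =>
      fetch(GetCustomer(customerId))
      productIds.foreach(pid => fetch(GetProduct(pid)))
    }
    hits
  }
}
```

With this data, `CachedHits.runTestQuery()` returns 9: one hit for the orders, 4 for the distinct customers and 4 for the distinct products.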
Let’s now run our query: 9 DB Hits! We had a lot of redundant calls, because the same customers and products were referenced in multiple orders. Now each individual customer and product is read only once from the database. That is quite an improvement, but can we do even better?
ZQuery with batching
I mentioned earlier that, given a batching function, ZQuery can group requests to the same data source and query all items at once. Let’s try to do that.
Our schema case classes are going to be exactly the same; the only difference is how we create our DataSource. Instead of using fromFunctionM, we will use fromFunctionBatchedM, which takes an Iterable[Request] (instead of a single Request) and must return an effect with an Iterable of our result type. That result list should have the same length as the input requests and preserve their order. We will then call another function of our DBService that returns data for a list of IDs. As an example, here’s the new DataSource:
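The batching idea can be sketched in plain Scala as follows. This is not the fromFunctionBatchedM API, only a model of what batching achieves; the data and function names are assumptions. Note how zipping IDs with results relies on the batched function returning results in the same order and with the same length as its input.

```scala
object BatchedHits {
  var hits = 0 // simulated DB round trips

  final case class Customer(id: Int, name: String)

  // Batched lookups: one hit for any number of IDs, results in input order.
  def getCustomers(ids: List[Int]): List[Customer] = {
    hits += 1
    ids.map(id => Customer(id, s"customer-$id"))
  }
  def getProducts(ids: List[Int]): List[String] = {
    hits += 1
    ids.map(id => s"product-$id")
  }

  def runTestQuery(): Int = {
    hits = 1 // one hit for the list of orders
    // Hypothetical data: (customerId, productIds) for 20 orders.
    val orders = List.tabulate(20)(i => (i % 4, List(i % 4, (i + 1) % 4)))

    // Collect the distinct IDs per data source, then resolve each source in a single call.
    val customerIds = orders.map(_._1).distinct
    val productIds  = orders.flatMap(_._2).distinct
    val customers   = customerIds.zip(getCustomers(customerIds)).toMap
    val products    = productIds.zip(getProducts(productIds)).toMap

    // Every order can now be assembled from the two maps without further hits.
    assert(orders.forall { case (c, ps) => customers.contains(c) && ps.forall(products.contains) })
    hits
  }
}
```

`BatchedHits.runTestQuery()` returns 3, regardless of how many distinct customers and products the orders reference.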
How’s the result now for our test query? Only 3 DB Hits! 1 for getting our orders, 1 for getting all needed customers and 1 for getting all needed products. If our database supported it, we could even go down to 2 DB Hits by using a common DataSource for customers and products (our Request would then have to be a sealed trait with 2 possible case classes) and query them together.
The ZQuery data type doesn’t actually depend on the rest of Caliban and could totally be used without GraphQL. We expect to extract it into its own library at some point in the future.
We’ve seen different approaches to optimize our GraphQL backend and reduce unnecessary calls to our database. Using ZQuery is not always possible or even needed depending on the use case, but it’s usually a good choice when your schema has a lot of nesting.
As mentioned earlier, you can find the 4 different implementations in this repository. If you’d like to know more about ZQuery, you can have a look at the dedicated page on the Caliban website or come discuss with us on the Discord channel.
In the last part of this series, we will explore wrappers, a new feature of Caliban that makes it possible to implement a wide range of custom behaviors during query or field processing.