Querying a database with GraphQL and DataLoader, an introduction in Go.
In this article we are going to cover building a simple GraphQL API to retrieve data from an online store database. The focus will be on the GraphQL side of things, but the source code of a fully-working example can be found here.
If you are completely new to GraphQL, there is a great introduction to it at https://graphql.org/learn/. For the purposes of this walkthrough, you can think of it as an alternative to REST that provides a flexible and type-safe way to query APIs for data.
The Database
You can use a database of your choice for this tutorial. In the linked GitHub repository, a Dockerized Postgres database is used as an easy and disposable solution. The database access code is standard and straightforward, so we will skip listing it here. But here is the Dockerfile for our database:
FROM postgres

ENV POSTGRES_USER store
ENV POSTGRES_PASSWORD p
ENV POSTGRES_DB store

COPY init.sql /docker-entrypoint-initdb.d
Here, we let Postgres know what the user name, password, and database name will be. Additionally, we tell it to run the SQL from init.sql upon startup. That will create a simple database of an online store. Fig. 1 shows its schema.
To have our database running, we first need to build a Docker image from the Dockerfile:
docker build -t store .
And then run it:
docker run --rm -p 5432:5432 store
The GraphQL library
There are many different GraphQL libraries for Go. Here, we are going to use one by graph-gophers.
GraphQL
In GraphQL, resolvers are responsible for returning, or resolving, object data. For example, an order resolver would resolve order properties such as the order ID or the time of the order by returning their corresponding simple-type values, an integer and a time value. It would also resolve the customer that made the order, which could be represented as a complex type that encapsulates the customer’s ID, name, address, etc. In this case, the returned value would be another resolver, a customer resolver. The list of products in the order could be resolved by yet another resolver. Nesting like that results in a directed acyclic graph of resolvers, each of them responsible for providing a part of a potentially larger object.
The diagram in Fig. 2 reflects this concept. The root query resolves three orders. Then, the customer and product resolvers could be used to retrieve the corresponding data, but this is optional if all you need is a list of order IDs and order times, for example.
The Schema
A schema describes the object model and the operations supported by your GraphQL service in a strongly-typed fashion. Queries received by the service are validated and executed against the schema. Additionally, there is out-of-the-box support for introspection of the service’s API, which can be leveraged either by querying schema-specific fields with your own code (https://graphql.org/learn/introspection) or by using tools such as GraphiQL (note the subtle “i” in the name).
Below is the schema for our service.
scalar Time

schema {
    query: Query
}

type Query {
    orders(first: Int!): [Order!]!,
}

type Order {
    id: Int!,
    customer: Customer!,
    time: Time!,
    products: [OrderProduct!]!,
}

type Customer {
    id: Int!,
    name: String!,
}

type OrderProduct {
    id: Int!,
    name: String!,
    price: Float!,
    quantity: Int!,
    totalPrice: Float!,
}
On the first line, the Time scalar type is declared. There is no built-in GraphQL type for time values, but it is possible to define custom ones. The graph-gophers library defines the Time type, and here we are just making our schema aware of it. Also note that the Int type corresponds to Go’s int32, and Float to float64.
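To get a feel for these type mappings, here is a small standard-library sketch. It assumes that the library's Time type serializes the same way as the time.Time value it wraps, which matches the timestamps in the responses shown later in this article. The helper names encodeTime, sampleTime, and fitsGraphQLInt are made up for this illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
	"time"
)

// sampleTime returns a fixed timestamp used for the demonstration.
func sampleTime() time.Time {
	return time.Date(2019, time.June, 26, 9, 40, 39, 656311000, time.UTC)
}

// encodeTime is a hypothetical helper: it serializes a time value the way
// encoding/json does for time.Time (RFC 3339 with fractional seconds).
func encodeTime(t time.Time) (string, error) {
	b, err := json.Marshal(t)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// fitsGraphQLInt reports whether v fits into a signed 32-bit integer,
// which is the range of the GraphQL Int type (and of Go's int32).
func fitsGraphQLInt(v int64) bool {
	return v >= math.MinInt32 && v <= math.MaxInt32
}

func main() {
	s, err := encodeTime(sampleTime())
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
	fmt.Println(fitsGraphQLInt(1 << 31)) // one past the largest int32
}
```

A value that does not fit into 32 bits, such as a large database ID, would need a different representation, for example a string or a custom scalar.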
Every GraphQL schema has a query type (defined as Query here) that serves as the entry point into any of the queries a service supports, and that is what makes it special. mutation, which we are not going to use here, would be another example of a special type.
Query’s properties define what information can be requested from our GraphQL service. For example, orders says that you can request the first N Orders, where N is the value of the first argument. The exclamation marks mean that the first parameter is required, and that neither the returned slice nor any item in it is going to be nil. Conversely, removing the ! marks makes parameters optional and return values nilable.
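On the Go side, the graph-gophers library, to our knowledge, hands a nullable value to the resolver as a pointer. So a hypothetical orders(first: Int) field, without the exclamation mark, would receive a *int32 that may be nil. The sketch below uses only the standard library; ordersArgs, limit, and the default of 10 are illustrative assumptions and not part of the example service.

```go
package main

import "fmt"

// ordersArgs mirrors the arguments of a hypothetical `orders(first: Int)`
// field where `first` is optional, so it arrives as a pointer.
type ordersArgs struct {
	First *int32
}

// defaultFirst is an assumed fallback used when the client omits `first`.
const defaultFirst int32 = 10

// limit returns the requested number of orders, or the default if the
// argument was not provided.
func limit(args ordersArgs) int32 {
	if args.First == nil {
		return defaultFirst
	}
	return *args.First
}

func main() {
	two := int32(2)
	fmt.Println(limit(ordersArgs{}))            // argument omitted
	fmt.Println(limit(ordersArgs{First: &two})) // argument provided
}
```

With the required Int! from our schema, the field is a plain int32 and no nil check is needed, which is one practical reason to prefer non-null types where the data allows it.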
Similarly, the Order type, aside from its two scalar properties, has a property called customer that returns the order’s customer data, and a property called products that can be used to request the list of products the order contains. The definitions of the Customer and OrderProduct types are straightforward.
You can see how the resolver graph maps onto the schema through the complex type properties.
The Resolvers
On to resolvers now. We will start from the root.
Query
This is the entry point to the orders query we have defined in the schema, as well as all other future queries we might add. db.Client is a client of the database storing order data.
package resolver

type Query struct {
	c *db.Client
}

func (r *Query) Orders(ctx context.Context, args struct {
	First int32
}) ([]*OrderResolver, error) {
	orders, err := r.c.GetOrders(ctx, args.First)
	if err != nil {
		return nil, err
	}
	res := make([]*OrderResolver, 0, len(orders))
	for _, o := range orders {
		res = append(res, &OrderResolver{c: r.c, order: o})
	}
	return res, nil
}
The first parameter is the current request’s context; the second encapsulates the query field’s arguments, just one in our case. The return type is a list of OrderResolvers, one per order.
We query the database for orders by calling the GetOrders method of the DB client, then create an OrderResolver for each order before returning them.
OrderResolver
Now let’s see what OrderResolver looks like.
package resolver

// OrderResolver resolves order properties
type OrderResolver struct {
	c     *db.Client
	order *model.Order
}

// ID resolves order ID
func (r *OrderResolver) ID() int32 {
	return r.order.ID
}

// Time resolves order time
func (r *OrderResolver) Time() graphql.Time {
	return graphql.Time{Time: r.order.Time}
}

// Customer returns customer resolver
func (r *OrderResolver) Customer(ctx context.Context) (*CustomerResolver, error) {
	cust, err := r.c.GetCustomer(ctx, r.order.CustomerID)
	if err != nil {
		return nil, err
	}
	return &CustomerResolver{cust: cust}, nil
}

// Products returns product resolvers
func (r *OrderResolver) Products(ctx context.Context) ([]*OrderProductResolver, error) {
	// get products for the order and return a slice of `OrderProductResolver`s
	...
}
Here, OrderResolver just returns values of the simple-type properties like ID and Time. For the complex-type ones, however, the corresponding child resolvers are returned. Both CustomerResolver and OrderProductResolver are similar to OrderResolver, so we’ll omit their listings.
Before returning a CustomerResolver from the Customer method, we need to get the customer’s data from the database. The same goes for order products. So, if there are three orders, we will need to make three database queries to get the corresponding customer records, and three more to get the product lists for each of the orders. Clearly, this is not optimal: the number of queries grows with the number of orders, which puts a lot of additional load on the database.
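The difference in query counts can be sketched with a fake database that only counts calls. Everything in this example (fakeDB, its methods, countQueries, and the sample names) is a stand-in invented for the illustration; it is not the db.Client from the repository.

```go
package main

import "fmt"

// fakeDB is a stand-in for the database client that only counts queries.
type fakeDB struct {
	queries int
	names   map[int32]string
}

// getCustomer simulates one query per customer ID.
func (db *fakeDB) getCustomer(id int32) string {
	db.queries++
	return db.names[id]
}

// getCustomers simulates a single query for a whole batch of IDs.
func (db *fakeDB) getCustomers(ids []int32) []string {
	db.queries++
	out := make([]string, 0, len(ids))
	for _, id := range ids {
		out = append(out, db.names[id])
	}
	return out
}

// countQueries returns how many queries each strategy issues for the IDs.
func countQueries(ids []int32) (oneByOne, batched int) {
	names := map[int32]string{1: "andrew", 2: "max", 3: "sample"}

	a := &fakeDB{names: names}
	for _, id := range ids {
		a.getCustomer(id) // what each resolver does on its own
	}

	b := &fakeDB{names: names}
	b.getCustomers(ids) // what a batching layer would do instead

	return a.queries, b.queries
}

func main() {
	oneByOne, batched := countQueries([]int32{1, 2, 3})
	fmt.Printf("one by one: %d queries, batched: %d query\n", oneByOne, batched)
}
```

The one-by-one count grows linearly with the number of orders, while the batched count stays at one per resolved field.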
DataLoader
To optimize the database querying approach, we will use DataLoader, a utility that provides batching and caching capabilities to an application’s data-fetching layer. Returning to the previous paragraph’s example of three orders, the goal here is to get all three customer database records at once instead of doing it one by one.
First, the database client’s GetCustomer method needs to be updated to return a batch of customer records, i.e., from
package db

func (c *Client) GetCustomer(ctx context.Context, customerID int32) (*model.Customer, error) {
	...
}
to
package db

func (c *Client) GetCustomers(ctx context.Context, customerIDs []int32) ([]*model.Customer, error) {
	...
}
Then, we will define a loader for customers, which the resolvers will use to get customer data instead of querying the database directly via db.Client.
package loader

type customerLoader struct {
	c *db.Client
}
The following method adds a batch-loading capability to the struct.
func (l *customerLoader) loadBatch(ctx context.Context, keys dataloader.Keys) []*dataloader.Result {
	n := len(keys)
	ids, err := ints(keys)
	if err != nil {
		return loadBatchError(err, n)
	}

	cc, err := l.c.GetCustomers(ctx, ids)
	if err != nil {
		return loadBatchError(err, n)
	}

	res := make([]*dataloader.Result, n)
	for _, c := range cc {
		// results must be in the same order as keys
		i := mustIndex(ids, c.ID)
		res[i] = &dataloader.Result{Data: c}
	}

	return res
}
DataLoader will batch requests for individual customers coming from multiple resolvers, and then call this method, passing in a collection of customer IDs. The collection is represented by the dataloader.Keys type, which is defined as a slice of dataloader.Key values. dataloader.Key implements the fmt.Stringer interface and thus can return its string representation. Since the customer ID in our case is represented as int32, keys will be string representations of numeric customer IDs. All we need to do to obtain the requested IDs is to perform a conversion from dataloader.Keys to []int32. That’s what the call ints(keys) does.
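The ints helper is not listed in this article. A minimal sketch of such a conversion could look like the following; it works on a slice of fmt.Stringer values instead of the real dataloader.Keys so that it runs with the standard library alone, and strKey and keysOf are hypothetical names introduced only for the demonstration.

```go
package main

import (
	"fmt"
	"strconv"
)

// strKey is a hypothetical key type backed by a plain string. It stands in
// for a dataloader key: all the conversion needs is the key's string form.
type strKey string

func (k strKey) String() string { return string(k) }

// keysOf builds a key slice from plain strings.
func keysOf(ss ...string) []fmt.Stringer {
	keys := make([]fmt.Stringer, 0, len(ss))
	for _, s := range ss {
		keys = append(keys, strKey(s))
	}
	return keys
}

// ints converts keys holding decimal IDs into a slice of int32.
// It fails on the first key that is not a valid 32-bit integer.
func ints(keys []fmt.Stringer) ([]int32, error) {
	res := make([]int32, 0, len(keys))
	for _, k := range keys {
		n, err := strconv.ParseInt(k.String(), 10, 32)
		if err != nil {
			return nil, fmt.Errorf("invalid key %q: %w", k.String(), err)
		}
		res = append(res, int32(n))
	}
	return res, nil
}

func main() {
	ids, err := ints(keysOf("3", "2", "1"))
	fmt.Println(ids, err)
}
```

Passing a bit size of 32 to strconv.ParseInt makes out-of-range values fail at parse time rather than being silently truncated by the int32 conversion.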
With the IDs in hand, the actual database records can now be retrieved as a batch by calling the database client’s updated GetCustomers method.
DataLoader expects the number of *dataloader.Result values returned from the batch-loading method to match the number of dataloader.Key values it has been passed via the keys parameter. This requirement also holds in case of an error. The loadBatchError function simply returns a slice of the required length in which every element is a pointer to the same dataloader.Result, whose Error property is set to the error that occurred.
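Since loadBatchError is not listed here either, the following sketch shows one way it could be written. The result type is a local stand-in for dataloader.Result with just the two fields this article mentions, and errDown is a sample error made up for the demonstration.

```go
package main

import (
	"errors"
	"fmt"
)

// result stands in for dataloader.Result: either Data or Error is set.
type result struct {
	Data  interface{}
	Error error
}

// errDown is a sample error used in the demonstration.
var errDown = errors.New("db is down")

// loadBatchError returns n results that all point to the same failed result,
// so the length requirement is met even though nothing was loaded.
func loadBatchError(err error, n int) []*result {
	r := &result{Error: err}
	res := make([]*result, n)
	for i := range res {
		res[i] = r
	}
	return res
}

func main() {
	res := loadBatchError(errDown, 3)
	fmt.Println(len(res), res[0].Error, res[0] == res[2])
}
```

Sharing one result between all elements is safe here as long as callers only read from it.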
Another expectation set by DataLoader is that the result order matches the key order. That is, if the keys parameter represents customer IDs [3, 2, 1] (in that order), then the corresponding return value []*dataloader.Result must represent customer records in that same order, [customer 3, customer 2, customer 1]. The mustIndex function finds the index of an ID in a list of IDs, and is used to ensure the correct order of the returned values.
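As an alternative to looking up an index for every record, the ordering can also be done by indexing the fetched records by ID and then walking the requested IDs in order. This is a different technique from the mustIndex approach above, sketched here under the assumption that a missing record should surface as an error in its slot rather than as a nil result. The customer, result, and orderResults names are local stand-ins.

```go
package main

import "fmt"

// customer is a stand-in for the customer model.
type customer struct {
	ID   int32
	Name string
}

// result is a stand-in for a batch result: either Data or Error is set.
type result struct {
	Data  *customer
	Error error
}

// orderResults arranges fetched customers in the order of the requested IDs.
// IDs with no matching record get a result carrying an error.
func orderResults(ids []int32, fetched []*customer) []*result {
	byID := make(map[int32]*customer, len(fetched))
	for _, c := range fetched {
		byID[c.ID] = c
	}
	res := make([]*result, len(ids))
	for i, id := range ids {
		if c, ok := byID[id]; ok {
			res[i] = &result{Data: c}
		} else {
			res[i] = &result{Error: fmt.Errorf("customer %d not found", id)}
		}
	}
	return res
}

func main() {
	// The database returns rows in its own order, and customer 2 is missing.
	fetched := []*customer{{ID: 1, Name: "andrew"}, {ID: 3, Name: "max"}}
	for _, r := range orderResults([]int32{3, 2, 1}, fetched) {
		if r.Error != nil {
			fmt.Println("error:", r.Error)
			continue
		}
		fmt.Println(r.Data.ID, r.Data.Name)
	}
}
```

The map lookup keeps the work linear in the batch size and guarantees that every slot of the returned slice is filled.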
Typically, DataLoader instances are created per request to avoid problems with using a single cache for multiple users with different access permissions. We will follow this best practice.
package loader

type contextKey string

const (
	customerLoaderKey     contextKey = "customer"
	orderProductLoaderKey contextKey = "orderProduct"
)

// Init initializes and returns Map
func Init(c *db.Client) Map {
	return Map{
		customerLoaderKey: (&customerLoader{c}).loadBatch,
		// orderProductLoaderKey: ... provided as an example
	}
}

// Map maps loader keys to batch-load funcs
type Map map[contextKey]dataloader.BatchFunc

// Attach attaches dataloaders to the request's context
func (m Map) Attach(ctx context.Context) context.Context {
	for k, batchFunc := range m {
		ctx = context.WithValue(ctx, k, dataloader.NewBatchedLoader(batchFunc))
	}
	return ctx
}
The Map type maps contextKeys to batch-load functions. dataloader.BatchFunc is defined as a function type with the same signature as customerLoader’s loadBatch method. We will call the Init function during application bootstrapping. Then, the returned Map’s Attach method will be invoked per request to add new DataLoaders, created by calls to dataloader.NewBatchedLoader, to the request’s context.
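The per-request pattern itself does not depend on DataLoader and can be tried out with the standard library alone. In this sketch, fakeLoader stands in for a real loader, and attach, loaderFrom, and newRequestContext are hypothetical helpers; in a real service the attach step would typically run in an HTTP middleware on the incoming request's context.

```go
package main

import (
	"context"
	"fmt"
)

// contextKey is a private type, so keys cannot collide with keys that other
// packages store in the same context, even if the strings are equal.
type contextKey string

const customerLoaderKey contextKey = "customer"

// fakeLoader stands in for a per-request DataLoader with its own cache.
type fakeLoader struct {
	cache map[int32]string
}

// attach puts a fresh loader into the context; call it once per request.
func attach(ctx context.Context) context.Context {
	l := &fakeLoader{cache: map[int32]string{}}
	return context.WithValue(ctx, customerLoaderKey, l)
}

// loaderFrom finds the loader stored by attach.
func loaderFrom(ctx context.Context) (*fakeLoader, error) {
	l, ok := ctx.Value(customerLoaderKey).(*fakeLoader)
	if !ok {
		return nil, fmt.Errorf("cannot find a loader: %s", customerLoaderKey)
	}
	return l, nil
}

// newRequestContext simulates the context of one incoming request.
func newRequestContext() context.Context {
	return attach(context.Background())
}

func main() {
	a, _ := loaderFrom(newRequestContext())
	b, _ := loaderFrom(newRequestContext())
	fmt.Println(a != b) // each request gets its own loader and cache

	// A plain string key with the same text does not match the typed key.
	fmt.Println(newRequestContext().Value("customer") == nil)
}
```

Because each request gets its own loader, nothing cached for one user can leak into another user's response.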
Now, we can leverage the DataLoaders in the context by providing a LoadCustomer function.
package loader

// LoadCustomer loads customer via dataloader
func LoadCustomer(ctx context.Context, id int32) (*model.Customer, error) {
	ldr, err := extract(ctx, customerLoaderKey)
	if err != nil {
		return nil, err
	}
	v, err := ldr.Load(ctx, key(id))()
	if err != nil {
		return nil, err
	}
	res, ok := v.(*model.Customer)
	if !ok {
		return nil, fmt.Errorf("wrong type: %T", v)
	}
	return res, nil
}

func extract(ctx context.Context, k contextKey) (*dataloader.Loader, error) {
	res, ok := ctx.Value(k).(*dataloader.Loader)
	if !ok {
		return nil, fmt.Errorf("cannot find a loader: %s", k)
	}
	return res, nil
}
extract finds the requested DataLoader in the context, and the loader’s Load method is called with a dataloader.Key representing the customer’s ID (key(id)). If LoadCustomer, and subsequently Load, is called multiple times for different customer IDs, e.g., from multiple resolvers, then DataLoader batches the IDs and queries the database for all of them at once. Then, the function returned by Load yields the requested customer’s model.
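To illustrate the batching idea in isolation, here is a deliberately simplified toy loader. It is not how the DataLoader library is implemented: as far as we know, the real library gathers keys from concurrently running resolvers during a short wait window, whereas this single-goroutine sketch flushes all pending keys the first time any result is actually needed. All names (toyLoader, batchFunc, demoLoader) are invented for the example.

```go
package main

import "fmt"

// batchFunc loads values for all keys at once; results match keys by index.
type batchFunc func(keys []int32) []string

// toyLoader collects keys and loads them in a single batch on first use.
// It is not safe for concurrent use.
type toyLoader struct {
	fetch   batchFunc
	pending []int32
	cache   map[int32]string
	batches int // number of batch calls made, for demonstration
}

func newToyLoader(f batchFunc) *toyLoader {
	return &toyLoader{fetch: f, cache: map[int32]string{}}
}

// Load registers the key, unless its value is already cached, and returns a
// function. Calling that function triggers one batch load for every key
// registered so far and returns the value for this key.
func (l *toyLoader) Load(key int32) func() string {
	if _, ok := l.cache[key]; !ok {
		l.pending = append(l.pending, key)
	}
	return func() string {
		l.flush()
		return l.cache[key]
	}
}

// flush loads all pending keys with a single call to the batch function.
func (l *toyLoader) flush() {
	if len(l.pending) == 0 {
		return
	}
	keys := l.pending
	l.pending = nil
	l.batches++
	for i, v := range l.fetch(keys) {
		l.cache[keys[i]] = v
	}
}

// demoLoader returns a loader backed by an in-memory map of sample names.
func demoLoader() *toyLoader {
	names := map[int32]string{1: "andrew", 2: "max"}
	return newToyLoader(func(keys []int32) []string {
		out := make([]string, len(keys))
		for i, k := range keys {
			out[i] = names[k]
		}
		return out
	})
}

func main() {
	l := demoLoader()
	// Two resolvers ask for their customers before either result is needed.
	t1, t2 := l.Load(1), l.Load(2)
	fmt.Println(t1(), t2(), "batches:", l.batches)
}
```

Note how the batch function relies on the same contract discussed earlier: the returned slice must have one entry per key, in key order.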
Finally, we can rewrite OrderResolver’s Customer method to make use of DataLoader.
package resolver

// Customer returns customer resolver
func (r *OrderResolver) Customer(ctx context.Context) (*CustomerResolver, error) {
	cust, err := loader.LoadCustomer(ctx, r.order.CustomerID)
	if err != nil {
		return nil, err
	}
	return &CustomerResolver{cust: cust}, nil
}
The Result
A GraphQL service typically exposes a single API endpoint which, by convention, is available at /graphql, e.g., http://localhost:8080/graphql.
In order to list, say, the first two orders, we would POST the following payload to the endpoint.
{
  "query": "{
    orders(first: 2) {
      id,
      time,
    }
  }"
}
Here is what the response looks like:
{
  "data": {
    "orders": [
      {
        "id": 2,
        "time": "2019-06-26T09:40:39.656311Z"
      },
      {
        "id": 4,
        "time": "2019-06-29T09:40:39.6591Z"
      }
    ]
  }
}
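From Go, the same exchange can be sketched with the standard library. The payload is built with encoding/json, which takes care of escaping the line breaks inside the query string, and the response is decoded into a struct that mirrors the shape shown above. The names buildPayload, parseResponse, queryOrders, and ordersResponse are assumptions made for this example, and a local test server replays the response listed above so that the sketch runs without the real service.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// canned replays the response listed above so the example needs no database.
const canned = `{"data":{"orders":[` +
	`{"id":2,"time":"2019-06-26T09:40:39.656311Z"},` +
	`{"id":4,"time":"2019-06-29T09:40:39.6591Z"}]}}`

// ordersResponse models only the fields this query asks for.
type ordersResponse struct {
	Data struct {
		Orders []struct {
			ID   int32     `json:"id"`
			Time time.Time `json:"time"`
		} `json:"orders"`
	} `json:"data"`
}

// buildPayload wraps the query in the JSON envelope; json.Marshal escapes
// the line breaks inside the query string.
func buildPayload(first int) ([]byte, error) {
	q := fmt.Sprintf("{\n  orders(first: %d) {\n    id,\n    time,\n  }\n}", first)
	return json.Marshal(map[string]string{"query": q})
}

// parseResponse decodes the raw response body.
func parseResponse(raw []byte) (*ordersResponse, error) {
	var res ordersResponse
	if err := json.Unmarshal(raw, &res); err != nil {
		return nil, err
	}
	return &res, nil
}

// queryOrders posts the payload to url and decodes the answer.
func queryOrders(url string, first int) (*ordersResponse, error) {
	payload, err := buildPayload(first)
	if err != nil {
		return nil, err
	}
	resp, err := http.Post(url, "application/json", bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	raw, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	return parseResponse(raw)
}

func main() {
	// A local test server stands in for the real GraphQL service.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, canned)
	}))
	defer srv.Close()

	res, err := queryOrders(srv.URL+"/graphql", 2)
	if err != nil {
		panic(err)
	}
	for _, o := range res.Data.Orders {
		fmt.Println(o.ID, o.Time.Format(time.RFC3339))
	}
}
```

Against the real service, the URL passed to queryOrders would be the endpoint mentioned earlier, http://localhost:8080/graphql.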
You can see how the shape of the response matches that of the request. We only asked for the ID and time of each order, so only OrderResolver had work to do, and effectively only the Orders database table was queried.
Let’s extend the query by also requesting customer names.
{
  "query": "{
    orders(first: 2) {
      id,
      time,
      customer {
        name,
      },
    }
  }"
}
The change is reflected in the response:
{
  "data": {
    "orders": [
      {
        "customer": {
          "name": "andrew"
        },
        "id": 2,
        "time": "2019-06-26T09:40:39.656311Z"
      },
      {
        "customer": {
          "name": "max"
        },
        "id": 4,
        "time": "2019-06-29T09:40:39.6591Z"
      }
    ]
  }
}
Each order now has a new customer property that contains a sub-property called name with the name of the customer that placed the corresponding order. The customer data behind the two CustomerResolvers was fetched from the data store through DataLoader, which batched both customer IDs and made only one database query.
Now, let’s not request customer details, but instead ask for the product list of each order.
{
  "query": "{
    orders(first: 2) {
      id,
      time,
      products {
        name,
        quantity,
        price,
      },
    }
  }"
}
And here is the response:
{
  "data": {
    "orders": [
      {
        "id": 2,
        "products": [
          {
            "name": "soap",
            "price": 2.49,
            "quantity": 4
          }
        ],
        "time": "2019-06-26T09:40:39.656311Z"
      },
      {
        "id": 4,
        "products": [
          {
            "name": "toothpaste",
            "price": 3.49,
            "quantity": 2
          },
          {
            "name": "face wash",
            "price": 8.99,
            "quantity": 2
          }
        ],
        "time": "2019-06-29T09:40:39.6591Z"
      }
    ]
  }
}
Our service will not query the database for customer data in this case, but it will query it for product lists for both orders, as a batch.
Summary
We have had a look at a few GraphQL concepts, including strongly-typed schemas, custom data types, resolvers, and flexible queries, and gained a basic understanding of how GraphQL works and what it can do. A more advanced topic that we have covered is database query optimization using DataLoader.
Nevertheless, we have only scratched the surface here as there is so much more to GraphQL. There are plenty of resources online where you can learn more about it, including the official website, https://graphql.org.