Taming REST APIs using GraphQL

This post is also cross-posted on our blog about building the i.stuff.co.nz news site: https://technology.fairfaxmedia.co.nz/taming-rest-apis-using-graphql/

At Stuff, like many other development teams, we’re currently moving our development efforts towards creating microservices.

There are a number of reasons why we need to do this, but we’ll assume here that you’re familiar with microservices and the reasons for adopting them. If you’re not, however, check the following links:

One of the considerations in moving to microservices is the need to provide data to your microservices via APIs, in order that they can be properly decoupled from the monolith.

If, like us, you happen to be using a number of REST APIs, some of the questions that arise as part of this move will likely include the following:

  • How do we surface our REST APIs such that our microservices can make the most efficient use of the network when retrieving data?
  • How do we surface our REST APIs such that our microservices can get only the data that they really need?
  • How do we reduce the complexity of our microservices in terms of having to request and unpack data from different endpoints?

To unpack these questions a little, let’s consider some of the issues with REST APIs. Though REST has been a very useful pattern, REST APIs tend towards inefficiency in a number of respects, including under-serving and over-serving.

Under-serving

Under-serving occurs when the data you need can only be obtained by calling multiple REST endpoints. As an example, consider the following:

curl -X GET http://example.com/api/user/{userid}

This endpoint may return something like the following:

{
"id": 1,
"username": "jcdarwin",
"first_name": "Jason",
"last_name": "Darwin"
}

In order to find Jason’s pets, we may have to do a separate REST call:

curl -X GET http://example.com/api/user/{userid}/pets
{
"pets": [
{
"id": 1,
"name": "Minty",
"species": "cat"
},
{
"id": 2,
"name": "Ari",
"species": "cat"
}
]
}

Then, in order to find particular details about each of Jason’s pets, we may have to do further REST calls:

curl -X GET http://example.com/api/user/{userid}/pets/1
{
"id": 1,
"name": "Minty",
"species": "cat",
"favourite_food": "sardines",
"last_vet_visit": "2016_09_11"
}

curl -X GET http://example.com/api/user/{userid}/pets/2
{
"id": 2,
"name": "Ari",
"species": "cat",
"favourite_food": "mice",
"last_vet_visit": "2016_11_25"
}

Setting aside the observation that this is a particularly badly implemented REST API, note that determining the full details of Jason’s pets took four REST calls. This is *under-serving*: a given REST call returns insufficient data for the client’s purposes, and therefore we need to make multiple calls.

This is of course time-consuming, and tedious for the client developer, as she must write the logic to handle the various REST calls, chaining them together in series.
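To make the chaining concrete, here is a minimal sketch of the client-side logic. It assumes a hypothetical in-memory stand-in (`FAKE_API` and `rest_get`) in place of real HTTP calls, so that it runs without a network; the paths and data mirror the examples above.

```python
import copy

# Hypothetical in-memory stand-in for the REST API described above,
# so the sketch runs without a network connection.
FAKE_API = {
    "/api/user/1": {"id": 1, "username": "jcdarwin",
                    "first_name": "Jason", "last_name": "Darwin"},
    "/api/user/1/pets": {"pets": [
        {"id": 1, "name": "Minty", "species": "cat"},
        {"id": 2, "name": "Ari", "species": "cat"},
    ]},
    "/api/user/1/pets/1": {"id": 1, "name": "Minty", "species": "cat",
                           "favourite_food": "sardines",
                           "last_vet_visit": "2016_09_11"},
    "/api/user/1/pets/2": {"id": 2, "name": "Ari", "species": "cat",
                           "favourite_food": "mice",
                           "last_vet_visit": "2016_11_25"},
}

calls = []  # records every simulated round trip


def rest_get(path):
    """Simulate one HTTP GET against the REST API."""
    calls.append(path)
    return copy.deepcopy(FAKE_API[path])


def fetch_user_with_pets(user_id):
    """Chain the calls in series: user, then pet list, then each pet."""
    user = rest_get(f"/api/user/{user_id}")
    listing = rest_get(f"/api/user/{user_id}/pets")
    user["pets"] = [
        rest_get(f"/api/user/{user_id}/pets/{pet['id']}")
        for pet in listing["pets"]
    ]
    return user


user = fetch_user_with_pets(1)
print(len(calls))  # 4 round trips for one user with two pets
```

Each extra pet adds another round trip, and each call can only start once the previous one has returned the ids it depends on.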

Over-serving

Over-serving occurs when an API call returns more data than you need, wasting network bandwidth (and possibly CPU, if the response is particularly large or complicated). Returning to our previous example, if we only wanted the date on which our pet last visited the vet, we’d also get redundant information, such as the pet’s favourite food:

curl -X GET http://example.com/api/user/{userid}/pets/1
{
"id": 1,
"name": "Minty",
"species": "cat",
"favourite_food": "sardines",
"last_vet_visit": "2016_09_11"
}

In this example the redundant data is not too consequential, but there are plenty of examples of real-world APIs where the actual data desired by your application is only a very small fraction of that returned by the REST API request.
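As a rough illustration of how one might quantify over-serving, the following sketch compares the size of the full record for pet 1 (taken from the under-serving example) with the size of the single field we wanted. The variable names are our own, and serialised JSON length is only an approximation of what travels over the network.

```python
import json

# The full record for pet 1, as returned by the REST endpoint.
full_response = {
    "id": 1,
    "name": "Minty",
    "species": "cat",
    "favourite_food": "sardines",
    "last_vet_visit": "2016_09_11",
}

# The only field our application actually needs.
wanted = {"last_vet_visit": full_response["last_vet_visit"]}

# Compare serialised sizes as a rough proxy for bytes on the wire.
full_bytes = len(json.dumps(full_response).encode("utf-8"))
wanted_bytes = len(json.dumps(wanted).encode("utf-8"))
wasted = 1 - wanted_bytes / full_bytes

print(f"{wasted:.0%} of the payload was not needed")
```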

Our ideal situation

Ideally, in order to make the most efficient use of resources (network and CPU) and minimise the amount of API-specific logic that we need to write, we’d want something like the following:

  • A single API endpoint to talk to
  • The ability to make a single API request to get all the data we want
  • A single API caching / auth mechanism as far as our frontend client is concerned
  • An API which doesn’t under-serve or over-serve to the frontend
  • An API which is self-documented, and easy to reason about
  • An API which is quick and easy to use in the frontend

GraphQL

In order to address these concerns, we’ve recently been exploring GraphQL, an initiative by Facebook that provides a means of building APIs that meet the goals described above, and that can be used to wrap existing REST APIs.

GraphQL is a specification, which straight away gives it an advantage over REST, which is simply a rather loose architectural pattern. Although we have REST implementations that use technologies such as Swagger to describe the REST API declaratively, often we cannot combine or consume REST APIs without manual intervention.

As the official GraphQL site states, GraphQL can be described as follows:

“GraphQL is a query language for your API, and a server-side runtime for executing queries by using a type system you define for your data. GraphQL isn’t tied to any specific database or storage engine and is instead backed by your existing code and data.
A GraphQL service is created by defining types and fields on those types, then providing functions for each field on each type.”

and elsewhere:

“GraphQL is a data query language and runtime designed and used at Facebook to request and deliver data to mobile and web apps since 2012.
When we built Facebook’s mobile applications, we needed a data-fetching API powerful enough to describe all of Facebook, yet simple enough to be easy to learn and use by our product developers. We developed GraphQL in 2012 to fill this need. Today it is the primary way we build client apps and the servers that drive them.”

Referring back to our examples above, we could instead perform a single API query to retrieve the `last_vet_visit` data for the pets of a given user.

Our GraphQL query could look like the following:

{
users(id: 1) {
pets {
id
last_vet_visit
}
}
}

The data returned (as regular JSON) could then look like the following:

{
"data": {
"users": [
{
"pets": [
{
"id": 1,
"last_vet_visit": "2016_09_11"
},
{
"id": 2,
"last_vet_visit": "2016_11_25"
}
]
}
]
}
}
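For completeness, here is a sketch of how a client might submit such a query. GraphQL servers conventionally accept a POST whose JSON body carries the query text under a `query` key; the endpoint URL below is an assumption (the post does not name one), and the request is only built, not sent.

```python
import json
import urllib.request

# Hypothetical endpoint; the post does not name one.
GRAPHQL_URL = "http://example.com/graphql"

# The query from above, wrapped in an enclosing selection set.
QUERY = """
{
  users(id: 1) {
    pets {
      id
      last_vet_visit
    }
  }
}
"""


def build_graphql_request(query, url=GRAPHQL_URL):
    """Build (but do not send) a POST request carrying the query as JSON."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_graphql_request(QUERY)
# To actually send it, one would call urllib.request.urlopen(request).
print(request.get_method(), request.full_url)
```

However many users, pets, or fields the query asks for, the client still makes one request to one endpoint.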

Note that although the above is submitted as a single query, and its results are retrieved as a single JSON object, the GraphQL server must still call the underlying REST endpoints as many times as required to collect the data. In this case, this means at least the following calls:

curl -X GET http://example.com/api/user/{userid}/pets
curl -X GET http://example.com/api/user/{userid}/pets/1
curl -X GET http://example.com/api/user/{userid}/pets/2

However, it’s probably much better that these various REST API calls happen serverside, where network throughput and CPU are typically much better than on the client.
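The serverside fan-out can be sketched as follows. This is not a real GraphQL runtime, just a hand-rolled illustration of what a resolver for the `pets` field might do; `FAKE_API`, `rest_get`, `resolve_pets` and `execute` are all hypothetical names, and the REST API is simulated in memory.

```python
# Hypothetical in-memory stand-in for the underlying REST API.
FAKE_API = {
    "/api/user/1/pets": {"pets": [
        {"id": 1, "name": "Minty", "species": "cat"},
        {"id": 2, "name": "Ari", "species": "cat"},
    ]},
    "/api/user/1/pets/1": {"id": 1, "name": "Minty", "species": "cat",
                           "favourite_food": "sardines",
                           "last_vet_visit": "2016_09_11"},
    "/api/user/1/pets/2": {"id": 2, "name": "Ari", "species": "cat",
                           "favourite_food": "mice",
                           "last_vet_visit": "2016_11_25"},
}

rest_calls = []  # records the REST calls made on the server


def rest_get(path):
    """Simulate one serverside HTTP GET against the REST API."""
    rest_calls.append(path)
    return FAKE_API[path]


def resolve_pets(user_id, fields):
    """Resolve the pets field: one list call, then one call per pet,
    keeping only the fields the client asked for."""
    listing = rest_get(f"/api/user/{user_id}/pets")
    pets = []
    for summary in listing["pets"]:
        detail = rest_get(f"/api/user/{user_id}/pets/{summary['id']}")
        pets.append({name: detail[name] for name in fields})
    return pets


def execute(user_id, fields):
    """Assemble a response in the shape shown earlier."""
    return {"data": {"users": [{"pets": resolve_pets(user_id, fields)}]}}


result = execute(1, ["id", "last_vet_visit"])
print(len(rest_calls))  # 3 REST calls, all made serverside
```

The client sees one request and one trimmed response; the three REST calls and the field filtering stay behind the GraphQL server.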

So, our use of GraphQL could look somewhat like the following:

[Figure: Using GraphQL to wrap REST APIs]

There’s a lot more that can be, and is, said about GraphQL and how it can be useful in wrapping REST APIs. In order to keep this post relatively brief, I’ll simply offer some links here, and presume that you’ve read them before we continue below, as others are better than I am at describing the finer points of GraphQL itself.

Frontend versus backend API calls

When considering how to deploy a GraphQL-based API server, we need to determine how best to expose it.

For example, if we use React to perform client-side component rendering, obtaining the data for the components from our GraphQL API server via HTTP calls (i.e. in the browser), we then need to think carefully about issues such as security and authorisation, as well as some of the issues that arise because of what our GraphQL server makes possible:

  • given that GraphQL queries can be recursively nested (e.g. if we want to retrieve our friends, and the friends of each of our friends, using the same underlying REST API), a single GraphQL query can represent quite a bit of activity for the underlying REST APIs, aka DDoS by query
  • given that a single GraphQL query can wrap multiple REST APIs, we may need to think about the granularity of authentication / authorisation
  • given that a single GraphQL query can wrap multiple REST APIs, it makes sense to think about where to place the caching layer
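One common mitigation for the nested-query problem is to reject queries that nest too deeply before executing them. The sketch below is a deliberately naive version that counts brace nesting in the query text; the limit of 4 is an arbitrary assumption, and a production server would inspect the parsed query rather than raw characters (this version would miscount braces inside string arguments).

```python
MAX_DEPTH = 4  # hypothetical limit; tune to what the REST APIs can bear


def query_depth(query):
    """Return the deepest level of brace nesting in the query text."""
    depth = 0
    deepest = 0
    for char in query:
        if char == "{":
            depth += 1
            deepest = max(deepest, depth)
        elif char == "}":
            depth -= 1
    return deepest


def check_query(query, max_depth=MAX_DEPTH):
    """Raise ValueError if the query nests more deeply than allowed."""
    depth = query_depth(query)
    if depth > max_depth:
        raise ValueError(f"query depth {depth} exceeds limit {max_depth}")
    return depth


shallow = "{ users(id: 1) { pets { id } } }"
deep = "{ user(id: 1) { friends { friends { friends { friends { id } } } } } }"

print(check_query(shallow))  # 3, within the limit
try:
    check_query(deep)
except ValueError as error:
    print(error)  # query depth 6 exceeds limit 4
```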

For our purposes, we’re able to avoid some of these issues for the time being if we look at having our React components only rendered serverside. We currently don’t have many use cases for components that need to refresh and update their data in response to UI changes; this may change as we move forward, but for the time being keeping our component rendering serverside means that we don’t need to expose our GraphQL server to the world (i.e. HTTP requests from browser clients), and therefore we have much more control over how the API it exposes is used.

In the diagram above, the lefthand side shows backend rendering, whereby we create our frontend components serverside, and then serve them to the page as fully-formed HTML via Edge Side Includes.

In this scenario, the browser has no knowledge of, nor access to either our React component server or our GraphQL API server.

However, if we consider frontend rendering (the righthand part of the above diagram), our browser (and any other nefarious agent on the web) has some knowledge of, and access to, both our React component server and our API server.

So, for us, serverside rendering makes a lot of sense (at least at this point), though we need to be aware that we do forgo some possible benefits that frontend rendering would allow:

  • Clients being able to interact with the APIs — particularly important for form submissions
  • Served content can be more up to date than with our ESI includes, which are typically only generated every few minutes

Wrapup

We’ll discuss other considerations around GraphQL in upcoming posts, but hopefully this post has provided some idea of the benefits of using GraphQL as a backend server API to wrap various REST APIs.

Further reading:
