A GraphQL Lightning Talk
Delivered at DjangoCon Europe 2016. It veers towards Python, but it’s a fairly general introduction to GraphQL.
In 2000, Roy Fielding wrote a prescient doctoral dissertation about how to structure hypermedia information systems.
You’ve probably seen the term “REST” before; that’s Fielding’s brainchild.
It also influenced the design of Django itself, which encourages clean, resource-oriented URLs.
The full idea is more nuanced, but the short version is that you need to split your content into cleanly separated resources, which look a lot like your models (but not always).
In the beginning, you start out all idealistic, and you want to future-proof, so you make sure you version your APIs.
And soon enough, your API’s surface area grows, and you have some more resources, more endpoints.
… and then you have a new feature which needs the same resource in a slightly different form, so you add a parameter, still with the best of intentions.
And then some more time goes by and you realize that you need to fundamentally rework the data that you return, and that’s a breaking change, so you redo the endpoints and bump the version number.
But unfortunately, you can’t throw away your old API endpoints, because there are old clients that still rely on them.
So on one hand you’re stuck supporting (and testing, and upgrading, etc.) old APIs forever.
On the other hand, your frontend devs and product people are harassing you to add new stuff all the time.
So there’s this site you might have used once or twice…
They’ve released this massively popular framework for client-side apps, React.
Everyone seems to know about that.
From the conversations I’ve had lately, far fewer people have heard about another Facebook project called GraphQL. It’s a fundamentally new way to do APIs.
So what is GraphQL?
It’s a big idea with a few different dimensions, and as with all big ideas, I had a hard time digesting it at first, because the explanations don’t make sense until all the different dimensions are covered.
We have to start somewhere, though — so let’s start with “a strongly-typed declarative query language.”
The strongly-typed part means that the language is safe: everyone speaking it (clients and servers) knows what types of things are allowed in, and what kinds of data to expect in response. If a given GraphQL query (something analogous to an endpoint in a RESTful API) has an integer parameter, the GraphQL server knows that it must be an integer, and will fail loudly (and usefully) if you send a string in its stead.
In the other direction, the GraphQL type system also guarantees return types: you can safely expect the server to return a string field where the type specifies a string. If the inner implementation of that type fails to return a string, then it’s a bug that will not pass silently.
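To make the two guarantees concrete, here is a toy sketch in plain Python. It is not how any real GraphQL server is implemented; `GraphQLTypeError`, `run_query`, and both resolver functions are hypothetical names. It checks a declared Int argument before the resolver runs and the declared String return value afterward. Note that the GraphQL spec lets servers coerce some output values (an integer to a String, for instance), so this toy is stricter than real implementations.

```python
class GraphQLTypeError(Exception):
    """Raised when a value does not match its declared type."""


# Hypothetical declared types for one query: an Int argument named `id`
# and a String return value.
ARG_TYPES = {"id": int}
RETURN_TYPE = str


def check(value, expected, what):
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, expected):
        raise GraphQLTypeError(
            f"{what}: expected {expected.__name__}, got {type(value).__name__}"
        )
    return value


def run_query(resolver, **args):
    # Validate every argument before the resolver runs...
    for name, expected in ARG_TYPES.items():
        check(args.get(name), expected, f"argument {name!r}")
    # ...and validate the resolver's result before it reaches the client.
    return check(resolver(**args), RETURN_TYPE, "return value")


def good_resolver(id):
    return f"person {id}"


def buggy_resolver(id):
    return {"id": id}  # bug: a dict where a String was declared
```

Calling `run_query(good_resolver, id="42")` fails on the argument, and `run_query(buggy_resolver, id=42)` fails on the return value; neither slips through quietly.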
The declarative query language part means that you ask the server for data in the shape that you want it. Let’s see that in action.
This is a GraphQL query called foo. It kind of looks like JSON but with the values removed. This query specifies that we’re interested in three fields (and only three fields) of whatever type foo will return — the ID, first name, and last name fields. There might be twenty fields total, but our query expresses what we’re actually interested in for the purposes of this query.
The response gives you proper JSON in the shape that you asked for.
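Here is a rough Python sketch of the shape-matching idea, assuming the slide’s query looked something like `{ foo { id firstName lastName } }` (the camelCase field names are a guess). Instead of parsing GraphQL, the selection is written by hand as a nested dict, and the hypothetical `select` helper copies only the requested fields out of a larger record.

```python
def select(data, selection):
    """Return only the fields named in `selection`, recursing into nested ones."""
    if isinstance(data, list):
        return [select(item, selection) for item in data]
    result = {}
    for field, sub_selection in selection.items():
        value = data[field]
        # None marks a leaf field; a dict means "pick these sub-fields".
        result[field] = value if sub_selection is None else select(value, sub_selection)
    return result


# A hypothetical record with more fields than the client asked for.
foo = {
    "id": 1,
    "firstName": "Ada",
    "lastName": "Lovelace",
    "email": "ada@example.com",
    "birthYear": 1815,
}

# Hand-written stand-in for: { foo { id firstName lastName } }
query = {"foo": {"id": None, "firstName": None, "lastName": None}}
response = {"data": select({"foo": foo}, query)}
```

The `data` key wrapping the result matches the top-level envelope that GraphQL responses use; `email` and `birthYear` never leave the server.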
This is huge for a number of reasons.
You’re freeing up your frontend developers to Get Shit Done™. They don’t need you to write new API endpoints every time they need data in a slightly different form for a slightly different view. They can compose the data they need in their queries.
This is also a Big Deal™ for mobile clients. A clean, orthogonal RESTful API is like a fully normalized database: more often than not, you end up needing to stitch data together from multiple queries in order to build up what you need for a given view. Each of those queries is a separate request, which can be catastrophically bad for performance on high-latency networks like cellular data.
The bad news doesn’t end there, because even if you manage to get all that data across the wire (some of which you might not even need), you then need to do the stitching in an environment hampered by slower CPUs and limited memory. Sometimes stitching data together is simply not doable within the memory constraints of a mobile device. Even if it does work, you’re being unkind to that device’s battery.
To add insult to injury, all of this also means writing more code to reassemble the data client-side, and more code means more opportunity to create bugs.
What ends up happening in many cases is a kind of denormalization—adding API endpoints which return the data “pre-stitched” in the shape that the view needs. Keep going down that road and you end up with the aforementioned “thousand endpoints” problem.
So those are the queries — but what about the other HTTP verbs you know and love?
There’s only one endpoint: /graphql. You GET it for queries, and you POST it for everything else—creates, updates, and deletions.
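As a sketch of what talking to that single endpoint looks like, the snippet below builds (but does not send) one GET and one POST with Python’s standard `urllib`. The host name and the `deletePerson` mutation are hypothetical. Passing the document in a `query` URL parameter or a JSON `{"query": ...}` body is the common convention rather than something this talk specifies, and many servers also accept queries over POST.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical host; only the /graphql path comes from the talk.
ENDPOINT = "https://api.example.com/graphql"

# Queries: GET, with the query document URL-encoded into the query string.
query = "{ foo { id firstName lastName } }"
get_request = Request(ENDPOINT + "?" + urlencode({"query": query}), method="GET")

# Everything else: POST a JSON body to the very same URL.
mutation = "mutation { deletePerson(id: 1) { id } }"
post_request = Request(
    ENDPOINT,
    data=json.dumps({"query": mutation}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Nothing has been sent yet; urllib.request.urlopen(get_request) would send it.
```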
Mutations look like this, and you can define parameters of specific types, including whether or not those parameters are required.
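The slide’s mutation isn’t reproduced here, so the following uses a hypothetical `createPerson` mutation. In GraphQL syntax, a trailing `!` marks a type as non-null, which is how a parameter is declared required. The regex-based extraction is deliberately rough (it ignores list types, default values, and much more) and only exists to show the kind of check a server performs on variables before running a mutation; the helper names and error messages are made up.

```python
import re

# Hypothetical mutation: firstName is required (String!), lastName is optional.
MUTATION = """
mutation CreatePerson($firstName: String!, $lastName: String) {
  createPerson(firstName: $firstName, lastName: $lastName) { id }
}
"""

# Python stand-ins for the GraphQL scalar types this toy understands.
SCALARS = {"String": str, "Int": int}


def variable_definitions(document):
    """Very rough extraction of `$name: Type` / `$name: Type!` pairs."""
    return {
        name: (type_name, bang == "!")
        for name, type_name, bang in re.findall(r"\$(\w+):\s*(\w+)(!?)", document)
    }


def validate(document, variables):
    """Return a list of error messages; an empty list means the variables are OK."""
    errors = []
    for name, (type_name, required) in variable_definitions(document).items():
        value = variables.get(name)
        if value is None:
            if required:
                errors.append(f"Variable ${name} of required type {type_name}! was not provided.")
        elif not isinstance(value, SCALARS[type_name]) or isinstance(value, bool):
            errors.append(f"Variable ${name} expected {type_name}.")
    return errors
```

Omitting `lastName` is fine, but omitting `firstName` or passing it a number produces an error before any resolver is touched.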
In fact, you can pipeline queries and mutations, saving expensive network roundtrips. If one of them fails, it won’t take the rest down with it. You’ll get back a partial success and an errors block for the parts that didn’t pass muster.
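A minimal sketch of the partial-success behavior, assuming two hypothetical top-level fields, `me` and `notifications`, in one request. In the GraphQL spec this happens per field: when a nullable field’s resolver fails, that field becomes `null` and an entry is added to a top-level `errors` list, while the other fields still resolve. (Errors in non-null fields propagate further up, which this toy ignores.)

```python
def execute_fields(resolvers, fields):
    """Resolve each top-level field independently, collecting failures."""
    data, errors = {}, []
    for field in fields:
        try:
            data[field] = resolvers[field]()
        except Exception as exc:
            # Like a failed nullable field: null it out, report it separately.
            data[field] = None
            errors.append({"message": str(exc), "path": [field]})
    response = {"data": data}
    if errors:
        response["errors"] = errors
    return response


# Hypothetical resolvers: one succeeds, one fails.
def me():
    return {"id": 1, "firstName": "Ada"}


def notifications():
    raise RuntimeError("notification service unavailable")


result = execute_fields({"me": me, "notifications": notifications}, ["me", "notifications"])
```

The client still gets `me`, plus an `errors` entry whose `path` says exactly which part didn’t make it.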
Here’s an example GraphQL query as written with Graphene. We have a query named “Query”, which returns an object containing one field — a string.
The resolve method is where all the magic happens. You can do whatever you want in there: talk to your database, talk to another API over a network, or do both at once. The only hard requirement is that you return something of the appropriate type, which (in this case) is a string.
There are also tools, such as graphene-django, for defining types and queries based on Django’s models.
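Since the Graphene code from the slide isn’t reproduced here, this is a stdlib-only imitation of the same pattern rather than Graphene itself. It borrows only Graphene’s convention of naming a field’s resolver `resolve_<field>`; the `hello` field, its value, and the `execute` helper are hypothetical. In real Graphene the class would subclass `graphene.ObjectType`, declare the field as `graphene.String()`, be wrapped in `graphene.Schema(query=Query)`, and its resolvers would also receive an `info` argument.

```python
class Query:
    # Declared field types: field name -> Python type standing in for String.
    fields = {"hello": str}

    def resolve_hello(self):
        # Anything can happen in here: a database query, an HTTP call to
        # another API, or both. This toy just returns a constant.
        return "Hello, DjangoCon!"


def execute(root, field_names):
    data = {}
    for name in field_names:
        # Find the resolver by naming convention and call it.
        value = getattr(root, f"resolve_{name}")()
        expected = type(root).fields[name]
        # The one hard requirement: the resolver must return the declared type.
        if not isinstance(value, expected):
            raise TypeError(f"{name} must return {expected.__name__}")
        data[name] = value
    return {"data": data}


result = execute(Query(), ["hello"])
```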
The types themselves are roughly analogous to Django’s models. They contain any number of fields, which have a type. Here we’ve got a “Person” type with five fields. The first three are primitive types, like int and string. The latter two are other user-defined types. Types can be self-referential, as in the friends field being specified as a list of Person.
When a type is instantiated, its parent’s resolve method is called and is expected to return a chunk of data that the type can use to populate itself. If we have something like PersonQuery, then the parent is the query, and it is the query’s resolve method that fetches the data from wherever it lives (a database, etc.). Remember: every GraphQL thing is eventually rooted in either a query or a mutation, both of which you must explicitly specify as part of your schema, much like you’d specify endpoints in a classical REST API.
However, the friends and posts fields can be thought of as foreign keys. They wouldn’t be part of the data that you’d get back from an average user lookup in a database. The friends field can have its own resolve method, which goes off and does some other action, say, looking up a user’s friends by that user’s ID.
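A toy sketch of the Person type with per-field resolvers, in plain Python rather than Graphene. The data, the `FRIENDSHIPS` mapping, and the `resolve` walker are all hypothetical, and the posts field is omitted for brevity. The root resolver hands `Person` a raw row; primitive fields read straight from it, while `friends` does its own lookup by the person’s ID, much like following a foreign key.

```python
# Hypothetical data: a people table, plus friendships stored separately,
# the way a join table or foreign key would be.
PEOPLE = {
    1: {"id": 1, "firstName": "Ada", "lastName": "Lovelace"},
    2: {"id": 2, "firstName": "Charles", "lastName": "Babbage"},
}
FRIENDSHIPS = {1: [2], 2: [1]}


class Person:
    """Toy type: id (Int), firstName and lastName (String), friends ([Person])."""

    def __init__(self, row):
        # The parent's resolver hands over a raw row to populate this type.
        self.row = row

    # Primitive fields come straight from the row (camelCase to match GraphQL).
    def resolve_id(self):
        return self.row["id"]

    def resolve_firstName(self):
        return self.row["firstName"]

    def resolve_lastName(self):
        return self.row["lastName"]

    def resolve_friends(self):
        # Not part of the row: a second lookup keyed by this person's ID.
        return [Person(PEOPLE[friend_id]) for friend_id in FRIENDSHIPS[self.row["id"]]]


def resolve_person_query(person_id):
    # Root query resolver: fetches the row the Person type populates itself from.
    return Person(PEOPLE[person_id])


def resolve(obj, selection):
    # Walk the selection, calling each field's own resolver.
    if isinstance(obj, list):
        return [resolve(item, selection) for item in obj]
    result = {}
    for field, sub_selection in selection.items():
        value = getattr(obj, f"resolve_{field}")()
        result[field] = value if sub_selection is None else resolve(value, sub_selection)
    return result


# Roughly: { person(id: 1) { firstName friends { firstName } } }
result = resolve(resolve_person_query(1), {"firstName": None, "friends": {"firstName": None}})
```

Because `friends` returns more `Person` objects, the self-referential type falls out naturally: the walker just keeps going as deep as the query asks.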
The craziest thing about this setup is that you might have separate underlying datastores for people and for posts. Maybe one is in a database you can query directly, and the other is behind an API you must call. It doesn’t matter—every field can have its own resolve method. You can use GraphQL to wrap (and hide) as many data sources as you like, reducing a ton of clientside complexity.
A really useful emergent property of this is that you can do poor man’s JOINs across datastores, as long as you have a common key (like the Person ID in the example above). This won’t be magically faster or better than performing two separate queries and stitching the data together yourself (because that’s exactly what happens here), but it will take place on the GraphQL server, which probably has lower latency to the data sources and more memory than a mobile client.
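Here is a hypothetical sketch of such a cross-datastore JOIN: people live in a dict standing in for a database, and posts come from a function standing in for a call to a separate service. Neither source knows about the other; the server stitches them together on the shared person ID, which is the same work a client would otherwise have to do.

```python
# Hypothetical data source 1: a database table we can query directly.
PEOPLE_DB = {1: {"id": 1, "firstName": "Ada"}, 2: {"id": 2, "firstName": "Charles"}}


def fetch_posts_from_api(author_id):
    """Hypothetical data source 2: stands in for an HTTP call to a posts service."""
    all_posts = [
        {"title": "First post", "author_id": 1},
        {"title": "Second post", "author_id": 2},
        {"title": "Third post", "author_id": 1},
    ]
    return [post for post in all_posts if post["author_id"] == author_id]


def resolve_person(person_id):
    # Root resolver: reads from the database.
    return PEOPLE_DB[person_id]


def resolve_posts(person):
    # Field resolver: asks the other service, keyed by the parent's ID.
    return fetch_posts_from_api(person["id"])


def person_with_posts(person_id):
    # The "JOIN": two independent lookups, stitched together on the server.
    person = resolve_person(person_id)
    return {
        "firstName": person["firstName"],
        "posts": [{"title": post["title"]} for post in resolve_posts(person)],
    }
```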
In that sense, you can think of GraphQL as a means for standardizing data preparation for views. You’re still doing the same work, but in a standard way, and in a better place than on client devices.
And at the end of the day, using GraphQL isn’t actually radically different from what you’re doing now. You’re still querying an API and getting results back. You can think of GraphQL as an alternative to things like (the delightful) Django REST framework.
I’m currently using it in production at Heroku, as the backend for a customer-facing ReactJS single-page app. It proxies one database and three internal-only APIs, and for this purpose, it’s been a joy to use.
Small caveat: I’m using the original JS implementation, not Graphene, so I can’t speak to pain points encountered in Python-land.
I don’t know whether I’d use it for a classical public API, for a number of reasons. Caching GraphQL is trickier because of the tailored-to-fit nature of requests. I could always cache the entire response to deal with clients hammering the API server at high rates, but more granular caching would require thought and care. Another reason why I might not choose GraphQL for a public API is simply one of popularity—developers are comfortable with the REST model, but I don’t think there’s the same level of widespread familiarity with the GraphQL approach.
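The coarse option mentioned above, caching entire responses, can be sketched in a few lines. The key is a hash of the exact query text plus the variables (with keys sorted so their order doesn’t matter); `execute` stands in for whatever actually runs the query, and all names here are hypothetical. The sketch has no expiry or per-user separation, and it shows the granularity problem directly: two queries that ask for overlapping fields share nothing in this cache.

```python
import hashlib
import json

# Whole-response cache: maps a request fingerprint to a computed response.
_response_cache = {}


def cache_key(query, variables=None):
    # Exact query text (so even a reformatted query misses) plus variables with
    # sorted keys, so {"a": 1, "b": 2} and {"b": 2, "a": 1} hit the same entry.
    payload = json.dumps({"query": query, "variables": variables or {}}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_execute(execute, query, variables=None):
    # `execute(query, variables)` is a stand-in for the real GraphQL executor.
    key = cache_key(query, variables)
    if key not in _response_cache:
        _response_cache[key] = execute(query, variables)
    return _response_cache[key]
```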
If this piques your interest, go forth and try it out!