Making mistakes with GraphQL
This article expands on a talk we gave at the Amsterdam Ruby meetup.
Impraise has been running GraphQL in production for about 12 months now for some of our internal services, and we have been implementing a GraphQL API in our end product for the past 8 months, gradually taking over each feature of our continuous feedback end-user application. This system supports multiple clients (the iOS and Android applications, the web application, but also some other internal analytics services).
Why did we bet on GraphQL?
Originally, Impraise was a mobile-only product that allowed colleagues to give each other feedback. Then we added a web application to provide the same functionality, plus better analytics thanks to the extra screen real estate. And we grew. This gave birth to a few problems, such as:
- Multiple clients to support with different requirements
- Endpoints becoming overloaded with options
- Duplicated endpoints
- Specific client/customer requests leading to “single use case” endpoints (pretty common in the B2B industry).
In an effort to make our product more consistent across platforms, and since we had been using React for a while, we decided to try GraphQL because of the flexibility it gave our client developers. I won’t introduce GraphQL here; there are enough posts about it already. That being said, a lot has happened over the past 12 months, and as we learned more about real-life GraphQL, we inevitably made mistakes. Here are the most important ones, how we are correcting them, and the lessons we drew from them, so you can avoid making them yourself.
Mistakes were made…
Returning classic HTTP error codes with empty responses
Coming from a REST background, it felt natural to use what HTTP provides for error management, so we returned status codes such as 404 Not Found or 403 Forbidden. It made sense at the time: most of our queries were simple and we were using GraphQL on small parts of our UI, fetching one object only, so HTTP status codes seemed the logical choice. It is also what most client developers are used to when calling a remote service. As the GraphQL spec explains, however, this is not how a GraphQL API behaves, since only parts of your query might return an error.
Why was this an error? Well, first of all, this is not what the GraphQL spec describes for error management. Secondly, when a complex query is made, only parts of the response might be faulty, and you still get some data back. This allows your UI to still be able to display parts of the data, and use the error messages returned by the API to adapt its behaviour. If you rely solely on HTTP error codes for error management, your whole page or section is now in an error state, even though there could be some data the UI can work with.
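To make the idea of a partially failed query concrete, here is a minimal Ruby sketch of how a client could separate usable data from failed fields. The response body, the field names (viewer, feedbackRequest) and the split_response helper are all hypothetical; only the top-level "data" and "errors" keys and the "message" and "path" entries of an error come from the GraphQL spec.

```ruby
require "json"

# Hypothetical response body: the query asked for the current user and a
# feedback request, and only the second field failed.
RESPONSE_BODY = <<~JSON
  {
    "data": {
      "viewer": { "name": "Ada" },
      "feedbackRequest": null
    },
    "errors": [
      { "message": "Feedback request not found", "path": ["feedbackRequest"] }
    ]
  }
JSON

# Splits a GraphQL response into the top-level fields the UI can render
# and the top-level fields that failed (with their error messages).
# Errors without a "path" are ignored by this simplified helper.
def split_response(raw_body)
  parsed = JSON.parse(raw_body)
  data = parsed["data"] || {}
  errors = parsed.fetch("errors", [])

  failed = {}
  errors.each do |error|
    field = Array(error["path"]).first
    next if field.nil?
    (failed[field] ||= []) << error["message"]
  end

  usable = data.reject { |field, _value| failed.key?(field) }
  { usable: usable, failed: failed }
end

result = split_response(RESPONSE_BODY)
puts "render: #{result[:usable].keys.join(', ')}"
result[:failed].each do |field, messages|
  puts "error in #{field}: #{messages.join('; ')}"
end
```

With this shape, the part of the page that depends on viewer can still render, while only the feedback request section shows an error state.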
One of our main pain points when correcting this rookie mistake is that we have not only a web application, but also two mobile apps. With a web application, you can make the switch on the API and the client application, deploy and call it a day. With mobile clients, you don’t always control the adoption rate of an update. So here is the route we took in order to solve this:
- Keep returning the HTTP error codes
- Return the normal GraphQL response with the “errors” entry
- Once all clients have updated their error handling, monitor the adoption of the new mobile application versions. When you consider that enough of your users have made the switch, stop returning the HTTP error codes on your server.
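The transition steps above can be sketched on the server side roughly as follows. This is an illustration under assumptions, not our actual implementation: the build_response helper, the LEGACY_STATUS mapping and the "code" entry inside "extensions" are hypothetical conventions. The sketch only shows the idea of always sending the "errors" entry while keeping the old status codes behind a switch.

```ruby
# Hypothetical mapping from an error code to the HTTP status that clients
# which have not been updated yet still expect.
LEGACY_STATUS = { "NOT_FOUND" => 404, "FORBIDDEN" => 403 }.freeze

# Builds [http_status, body]. The body always follows the GraphQL response
# shape; the legacy HTTP status is only used while legacy_status is true.
def build_response(data:, errors: [], legacy_status: true)
  body = { "data" => data }
  body["errors"] = errors unless errors.empty?

  status = 200
  if legacy_status
    codes = errors.map { |error| error.dig("extensions", "code") }
    legacy_code = codes.find { |code| LEGACY_STATUS.key?(code) }
    status = LEGACY_STATUS[legacy_code] if legacy_code
  end

  [status, body]
end

errors = [
  {
    "message" => "Feedback request not found",
    "path" => ["feedbackRequest"],
    "extensions" => { "code" => "NOT_FOUND" }
  }
]

# During the transition: old clients read the status, new ones read "errors".
status, body = build_response(data: { "feedbackRequest" => nil }, errors: errors)
puts "#{status} #{body.keys.inspect}"

# Once enough users have upgraded, flip the switch.
status, _body = build_response(data: { "feedbackRequest" => nil }, errors: errors, legacy_status: false)
puts status
```

Keeping the switch in one place means the final step is a one-line change rather than a hunt through every resolver.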
I guess the lesson here is to first read and understand the spec and apply it in its entirety.
Choosing a collection pattern early
A lot of apps rely on displaying lists of things, and then finding more data about those things. Facebook’s News Feed is a list of things, the New York Times displays lists of articles and subjects, etc. Serving lists and collections of objects in a technically elegant way can cause issues for backends, and handling these lists on the front end can be equally challenging. Welcome to the world of pagination (or not). Identifying your list pattern early in development is very important (at least for us it was).
Basically, you can:
- return an array of all the ids of your objects, and then make one request for each object you want to display (very REST/hacky if you ask me)
- use offset pagination (define the number of objects per list chunk, and request the next chunk by its offset)
- use cursor pagination (like offset pagination, but each object of the list has a cursor, allowing you to request a list chunk before or after a specific object of the list)
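The difference between the last two patterns can be sketched in plain Ruby. Everything here is hypothetical and simplified: ITEMS stands in for a database table, the cursor is just a Base64-encoded item id, and the edges/has_next_page shape only loosely mirrors what Relay-style connections return.

```ruby
# Hypothetical in-memory list standing in for a database table.
ITEMS = (1..10).map { |i| { id: i, title: "Feedback #{i}" } }.freeze

# Offset pagination: the client says how many items to skip and how many to take.
def offset_page(items, offset:, limit:)
  items.drop(offset).first(limit)
end

# Opaque cursor: Base64 (via Array#pack) of the item id.
def encode_cursor(item)
  ["item:#{item[:id]}"].pack("m0")
end

def decode_cursor(cursor)
  cursor.unpack1("m0").split(":", 2).last.to_i
end

# Cursor pagination: return `first` items located after the item the
# cursor points to, each wrapped with its own cursor.
def cursor_page(items, first:, after: nil)
  start = 0
  if after
    after_id = decode_cursor(after)
    index = items.index { |item| item[:id] == after_id }
    raise ArgumentError, "unknown cursor" if index.nil?
    start = index + 1
  end

  slice = items.drop(start).first(first)
  {
    edges: slice.map { |item| { cursor: encode_cursor(item), node: item } },
    has_next_page: start + slice.size < items.size
  }
end

page_one = cursor_page(ITEMS, first: 3)
page_two = cursor_page(ITEMS, first: 3, after: page_one[:edges].last[:cursor])
puts page_two[:edges].map { |edge| edge[:node][:id] }.inspect
puts offset_page(ITEMS, offset: 3, limit: 3).map { |item| item[:id] }.inspect
```

Both calls return the same three items here. The practical difference shows up when items are inserted or removed between requests: an offset shifts, while a cursor stays attached to a specific object.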
At Impraise, we started out experimenting with several patterns, without much consistency between them. This is proving painful now that we are extending the schema and trying to be consistent, especially since we also have mobile clients to take into account. We have taken on the task of making our GraphQL API Relay-compatible. Since it cannot be done all at once, here is a list of things to keep in mind when doing this:
- The id field of objects is already in use by the clients (the DB id most of the time), which means we cannot just switch to Global IDs, as this would break the clients. So we are introducing a new oid field, holding the same DB id. Once all the clients have updated to use this new oid, we can then implement the Global IDs as the id field of the object.
- When multiple fields return collections with different pagination patterns, time the deprecations carefully, and communicate about them to all concerned teams (deprecation notices, email notices, release notes, make yourself heard!)
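The two-phase id migration described above could look roughly like this. It is a sketch under assumptions: the helper names, the phase symbols and the "Type:id" Base64 encoding are illustrative choices (a common convention for Relay-style Global IDs), not necessarily what our API or any particular library does.

```ruby
# Encodes a type name and DB id into an opaque Global ID (Base64 of "Type:id").
def to_global_id(type_name, db_id)
  ["#{type_name}:#{db_id}"].pack("m0")
end

# Decodes a Global ID back into [type_name, db_id] (both strings).
def from_global_id(global_id)
  type_name, db_id = global_id.unpack1("m0").split(":", 2)
  [type_name, db_id]
end

# Phase :introduce_oid  -> id is still the DB id, oid is added next to it.
# Phase :global_ids     -> id becomes the Global ID, oid keeps the DB id.
def serialize_user(user, phase:)
  db_id = user[:id].to_s
  case phase
  when :introduce_oid
    { "id" => db_id, "oid" => db_id, "name" => user[:name] }
  when :global_ids
    { "id" => to_global_id("User", db_id), "oid" => db_id, "name" => user[:name] }
  else
    raise ArgumentError, "unknown phase: #{phase.inspect}"
  end
end

user = { id: 42, name: "Ada" }
puts serialize_user(user, phase: :introduce_oid).inspect
puts serialize_user(user, phase: :global_ids).inspect
```

Clients that moved to oid during the first phase see no change when the second phase ships, which is the whole point of doing it in two steps.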
Being consistent with your connection pattern will save your client developers a lot of time: when all lists behave the same way, developers can re-use their list components throughout their application. The other thing to keep in mind is which GraphQL utility you want to use client-side. If you are using Relay, or want to keep your front-end options open, I advise you to make your API Relay-compatible from the start. If you plan on using only Apollo, then you are pretty much free to use any connection pattern you want.
Poor early schema design
This is probably the most important thing to consider when you decide to implement a GraphQL API. Even if an API is only a way for client developers to tap into the domain of your application, it still needs to represent this domain in a sensible way. When we first started playing around with GraphQL, we still had a REST-like mindset: we put everything at the root of the graph, and then represented associations between the nodes/entities. Looking back, this was a mistake, and we are still correcting it gradually, because again, this is not something we can fix in one release. Every breaking change in your schema is going to cost you client developer time, and therefore money.
“GraphQL is unapologetically driven by the requirements of views and the front‐end engineers that write them.”
How do other companies using GraphQL do it? Facebook and GitHub, two of the best-known GraphQL implementers, take different approaches. Facebook never breaks the schema (not a single breaking change in 4 years), only appending to it. GitHub does things a bit differently: there can be breaking changes to the schema while the node/subgraph is still in beta or internal. Once a functionality is shipped to end users, breaking the schema is a last-resort action, but it can happen. When it does, a deprecation warning is issued, and GitHub communicates about it to third-party users before making the change (this was described in a talk by Robert Mosolgo, creator of the GraphQL-ruby gem and current GitHub employee).
Knowing all that, here is the strategy we have adopted in order to refine our schema, and some general guidelines we follow:
- Stop thinking in endpoints ASAP, and make sure everyone understands what graphs are, and what it means to query a graph.
- Avoid breaking the schema as much as possible once it has shipped to end users.
- Remove all unused nodes from the graph.
- Add deprecation warnings to all nodes that do not make sense, and build up the schema from there.
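One way to decide which deprecated nodes can actually be removed is to compare them against usage data. The sketch below is purely illustrative: the field names, the FIELD_DEPRECATIONS registry and the usage counts are hypothetical, and it assumes you already log which fields client queries touch.

```ruby
# Hypothetical registry: field name => deprecation reason (nil = not deprecated).
FIELD_DEPRECATIONS = {
  "User.name" => nil,
  "User.feedbackIds" => "Use User.feedbackConnection instead",
  "Query.allUsers" => "Use Query.usersConnection instead"
}.freeze

# Returns one row per deprecated field, with how often clients still use it
# and whether it can be removed without breaking anyone.
def deprecation_report(deprecations, usage_counts)
  deprecations.compact.map do |field, reason|
    calls = usage_counts.fetch(field, 0)
    { field: field, reason: reason, calls: calls, safe_to_remove: calls.zero? }
  end
end

# Hypothetical usage counts collected from query logs.
usage = { "User.name" => 1200, "User.feedbackIds" => 35 }

deprecation_report(FIELD_DEPRECATIONS, usage).each do |row|
  status = row[:safe_to_remove] ? "safe to remove" : "still used #{row[:calls]} times"
  puts "#{row[:field]}: #{status} (#{row[:reason]})"
end
```

A report like this also gives you something concrete to put in deprecation notices when you reach out to the teams still using a field.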
For new features & growing the schema, we came up with a process in order to reduce misalignment when defining the schema, involving client developers at an earlier stage:
- Once the UX designers come up with mockups of a feature or a new screen, we sit down with the client developers and try to map out the ideal queries for this screen or part of the screen. This gives us a good outline.
- Then we check if this is already existing in the schema, and draft tickets for what is needed to grow the schema in order to support this, while the final UI is in the hands of the graphic designers.
- Finally, by the time the final design is ready, the feature is already available for our client developers to work with.
One last piece of advice, and probably the most important one: if you are primarily a backend developer like myself and you are introducing GraphQL at your company, make sure you involve your client developers in the schema design process. I encourage you to read these GitHub threads:
- Whole API versioning · Issue #175 · facebook/graphql
- How to really solve version problem · Issue #134 · facebook/graphql
These are the few mistakes we’ve made (so far), but we are now in a much better place.
I hope this helps you decide if GraphQL can be a good fit for you. If you have any feedback, comments, suggestions, go for it, they are greatly appreciated!
*Update: Added a link to the GraphQL Authorization article