EXPEDIA GROUP TECHNOLOGY — SOFTWARE

To Design a GraphQL Schema, Throw Out the REST

How to use graphs as a paradigm for discovering elegant APIs

Nikunj Manocha

Published in Expedia Group Technology, Oct 27, 2020 (8 min read)

What appears to be a board with colorful tubes containing examples of wires with different gauges. Symbolizes connections. — Photo by Mali Maeder on Pexels

GraphQL offers a rich approach to creating data APIs for webapps to retrieve connected data. The designer must respect GraphQL’s unique style when designing a GraphQL schema to achieve idiomatic, elegant results. I’ll show you how to think in graphs, and a pitfall to avoid.

A primer on GraphQL schema design

Karthic Rao’s blog, Designing GraphQL Schemas, really impressed me with the simplicity and clarity with which the author explains critical aspects of designing a GraphQL schema. The use case is a user experience that allows the reader to select a blog, annotated by author, from a list. Blog and Author are two data entities. Rao visualises the schemas as shown below. A Blog is by an Author and an Author may publish multiple blogs.

Graph diagram showing the relationships between Author and Blog — A `Blog` has one Author and an Author can have many Blogs. Image by Karthic Rao.

The queries are, by design, limited to retrieving Blogs — because that is what the user experience requires. There is no use case in the UI where you want to get all authors, hence there is no getAuthors query. What happens if you really like a blog and want to look at other blogs from the same author? No additional queries are needed! The UI experience starts with a blog ID and you can traverse the data graph to get all blogs for that author.
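The traversal described above can be sketched in a few lines of TypeScript. This is only an illustration, not Rao's actual schema: the field names, the sample records and the helper `otherBlogsBySameAuthor` are all made up here, and plain functions stand in for GraphQL resolvers.

```typescript
// Hypothetical node shapes; field names are illustrative, not taken from Rao's schema.
interface Author {
  id: string;
  name: string;
  blogIds: string[];
}

interface Blog {
  id: string;
  title: string;
  authorId: string;
}

// Made-up sample data standing in for a real data source.
const authors = new Map<string, Author>([
  ["a1", { id: "a1", name: "Sample Author", blogIds: ["b1", "b2", "b3"] }],
]);

const blogs = new Map<string, Blog>([
  ["b1", { id: "b1", title: "First post", authorId: "a1" }],
  ["b2", { id: "b2", title: "Second post", authorId: "a1" }],
  ["b3", { id: "b3", title: "Third post", authorId: "a1" }],
]);

// One entry point (a blog by ID) plus one resolver per edge of the graph.
// There is no getAuthors query: authors are reached only through blogs.
const resolvers = {
  Query: {
    blog: (id: string): Blog | undefined => blogs.get(id),
  },
  Blog: {
    author: (blog: Blog): Author | undefined => authors.get(blog.authorId),
  },
  Author: {
    blogs: (author: Author): Blog[] =>
      author.blogIds
        .map((id) => blogs.get(id))
        .filter((b): b is Blog => b !== undefined),
  },
};

// "More from this author": walk Blog -> Author -> Blogs using only existing edges.
function otherBlogsBySameAuthor(blogId: string): string[] {
  const blog = resolvers.Query.blog(blogId);
  if (!blog) return [];
  const author = resolvers.Blog.author(blog);
  if (!author) return [];
  return resolvers.Author.blogs(author)
    .filter((b) => b.id !== blogId)
    .map((b) => b.title);
}

console.log(otherBlogsBySameAuthor("b1")); // the titles of b2 and b3
```

In GraphQL terms, the same walk would be a single query that selects a blog by ID, then its author, then that author's blogs. The new UI feature reuses edges that already exist and needs no new entry point.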

We were able to support new UI functionality without writing and introducing a new GraphQL query. This is the promise and value of GraphQL. We only see this value if we design our data graph correctly. Here is another good blog that focusses on deriving a data model from a design in GraphQL.

A correctly designed data graph, one that models the true data entities and the resolvers for their relationships, empowers our front ends to innovate and quickly build great experiences without requiring effort-intensive backend changes.

My application’s data graph

I work for Vrbo™, a vacation rental company that’s part of Expedia Group™. The application my team works on supports viewing and managing the check-in/check-out of a customer’s booked trip. The experience includes showing the confirmation of a booking made by the traveler; the details of that trip, including a summary of dates, price and payment; and the post-booking amenities available at the booked property.

As you can imagine, the core logical data entity here is the booking. The booking information is seen in three different use cases supporting the user journey:

Show the booking information summary when a successful booking has just been made
Show booking information, on a sidebar on the page, when showing the amenities available at the vacation rental
Show booking information, on a sidebar on the page, when showing the financials for the trip

These are three different phases of the user journey, and each one requires displaying booking data alongside other data specific to that phase.

We recently reviewed our GraphQL schema and data types and made an interesting observation: while all three use cases operate on the same logical booking data (check-in and check-out dates, for example), that data is represented by three different types in our GraphQL schema:

BookingSummary,
TripSummary, and
RentalReservation

Let’s review these three entities in the diagram below. The check-in and check-out dates appear in all three types, sometimes under the same field name and sometimes under a different one. The same is true of the number of guests, the count of children traveling on the trip, and so on.

Series of images showing relationships between the entities previously discussed.

This is just one example, but it highlights that we have data replicated in different nodes throughout our data graph. There is likely a single data source and record that is being used to populate these duplicated data fields, but they were implemented differently at different times and with different names. How did this happen? Read on…

What is happening here?

The application uses one query and one data graph for each view of the application. So,

If the screen is the booking confirmation page, then we have the bookingConfirmation GraphQL query with its own data types and graph.
If the screen is the conversation details page, then we have the conversation GraphQL query with its own data types and graph.
If the screen is the post-booking amenities page, then we have the content GraphQL query with its own data types and graph.

While a query for each screen is expected, the main issue is that each query supporting a view operates on its own data graph. Instead, every query should land on the same data graph regardless of the view. Queries are only entry points that land us on different nodes of that graph, and all three of ours should lead to the same Booking data type.

Represented visually, this is the state of our graph:

Graph diagram showing two complex trees that ultimately depend on a single Listing entity — The application has several logically disconnected graphs

The two queries merge on Listing, which is outside the domain of my application. But before they converge on this common dependency, we duplicate every data node in our own application stack.

Had we designed more thoughtfully, we’d have a single data graph representing the application domain, with queries landing on different nodes of that graph. That would avoid the duplicated data elements that the disjointed graphs force on us:

Graph diagram showing greater unification between queries — The application should have one logically connected graph that all queries work off from

This enables a third screen in the front end app without creating new data graphs, empowering the user experience dev teams to quickly innovate on the product.
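As a rough sketch of what "one graph, several entry points" could look like, the TypeScript below lets three queries land on the same Booking node. The query names bookingConfirmation, conversation and content come from our application; everything else (the Booking fields, the sample record, the view-specific fields and `resolveBooking`) is hypothetical and chosen only for illustration.

```typescript
// The single node that every view should share. Field names are illustrative.
interface Booking {
  id: string;
  checkIn: string;
  checkOut: string;
  adults: number;
  children: number;
}

// Made-up record standing in for the real booking data source.
const bookingStore = new Map<string, Booking>([
  ["bk1", { id: "bk1", checkIn: "2020-11-01", checkOut: "2020-11-05", adults: 2, children: 1 }],
]);

// The one place where booking data enters the graph.
function resolveBooking(id: string): Booking {
  const booking = bookingStore.get(id);
  if (!booking) throw new Error(`Unknown booking: ${id}`);
  return booking;
}

// View-specific nodes carry only their own data plus an edge to Booking,
// instead of copying check-in, check-out and guest counts into each type.
interface BookingConfirmation {
  confirmationMessage: string;
  booking: Booking;
}

interface Conversation {
  messages: string[];
  booking: Booking;
}

interface Content {
  amenities: string[];
  booking: Booking;
}

// Three entry points, one underlying graph.
const query = {
  bookingConfirmation: (bookingId: string): BookingConfirmation => ({
    confirmationMessage: "Your booking is confirmed",
    booking: resolveBooking(bookingId),
  }),
  conversation: (bookingId: string): Conversation => ({
    messages: ["Welcome!"],
    booking: resolveBooking(bookingId),
  }),
  content: (bookingId: string): Content => ({
    amenities: ["Wi-Fi", "Parking"],
    booking: resolveBooking(bookingId),
  }),
};

// Every view reaches the very same Booking node.
console.log(query.bookingConfirmation("bk1").booking === query.content("bk1").booking);
```

With this shape, a new screen needs at most a new entry point and reuses the Booking node and its edges as they are.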

Takeaway: Within a bounded context, you should have one unified graph instead of multiple disjointed graphs created for each use case. Although we technically run a single graph instance, our application’s experience data graph implicitly contains several logical graphs that are disconnected from one another. This needs to change. But before we can change, let’s understand why we got it wrong in the first place.

How did we get here?

If we observe closely, the three different disjointed graphs above mirror the three different REST service endpoints. We have a set of legacy services and endpoints that back the three UI screens. When the dev team was introduced to GraphQL, it was new and there was a rush to get on to GraphQL technology. We threw in a quickly-crafted GraphQL schema mirroring the REST service endpoints. Naturally, the GraphQL data graph became a mirror of the backend REST services’ input and output. We did not take the time needed to clearly map out a data graph model that could be used in the GraphQL schema and one that would result in a single connected graph capable of serving any user experience need.

We still haven’t made a paradigm shift, though. We notice our dev engineers rolling out refactorings of our existing architecture with new services. And as newer REST microservices spring up, we surface these with newer GraphQL queries and yet another disjointed graph springs up. GraphQL demands a paradigm shift in dev approach. Our GraphQL data types need to be completely different and decoupled from our REST services and APIs. A GraphQL data type may get all its data fields from more than one backing REST service or data source — but how many of us have been able to make that shift in our design thinking? We are still naturally inclined to mirror one GraphQL Schema data type to the input/output DTO from a REST service.
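To make the decoupling concrete, here is a minimal sketch of a graph type whose fields come from two different backing services. All of it is assumed for illustration: the DTO shapes, the stubbed `fetchReservation` and `fetchPayment` functions (which stand in for real REST calls), and the Booking fields are hypothetical and do not describe our actual services.

```typescript
// Hypothetical DTOs returned by two separate backing REST services.
interface ReservationDto {
  reservationId: string;
  arrivalDate: string;
  departureDate: string;
  guestCount: number;
}

interface PaymentDto {
  reservationId: string;
  totalAmount: number;
  currencyCode: string;
}

// The graph-facing type, named and shaped for how the UI uses the data.
interface Booking {
  id: string;
  checkIn: string;
  checkOut: string;
  guests: number;
  total: { amount: number; currency: string };
}

// Pure mapping from backend shapes to the graph type. Backend field names
// stop here and never leak into the schema.
function toBooking(reservation: ReservationDto, payment: PaymentDto): Booking {
  return {
    id: reservation.reservationId,
    checkIn: reservation.arrivalDate,
    checkOut: reservation.departureDate,
    guests: reservation.guestCount,
    total: { amount: payment.totalAmount, currency: payment.currencyCode },
  };
}

// Stubs standing in for real HTTP calls to the two services.
async function fetchReservation(id: string): Promise<ReservationDto> {
  return { reservationId: id, arrivalDate: "2020-11-01", departureDate: "2020-11-05", guestCount: 3 };
}

async function fetchPayment(id: string): Promise<PaymentDto> {
  return { reservationId: id, totalAmount: 1200, currencyCode: "USD" };
}

// One resolver, two data sources, one node in the graph.
async function resolveBookingFromServices(id: string): Promise<Booking> {
  const [reservation, payment] = await Promise.all([fetchReservation(id), fetchPayment(id)]);
  return toBooking(reservation, payment);
}

resolveBookingFromServices("bk1").then((booking) => console.log(booking));
```

If a backend service is later split or renamed, only the fetch functions and `toBooking` need to change. The Booking type that clients query stays the same.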

Takeaway: Design your schema based on how data is used, not based on how it’s stored.

We are still thinking RESTfully about GraphQL

See an extract from the current as-is application's architecture below. Notice that it looks more like a vertical wiring architecture. This is also how the architecture would have looked even if we had façade REST services as the orchestration layer, right? We have just replaced calling different REST endpoints with calling different queries.

This is where we are still working hard, but not smart, with GraphQL. We are working hard because, with every new traveler journey use case, we add more and more vertical wires to our architecture, each passing through the GraphQL layer to get to a backend service.

No, we are not being smart with GraphQL.

Architecture diagram showing the layers of a distributed system delegating calls downward one-to-one.

The benefits of a well-modelled data graph

The smart architecture would have only a few GraphQL queries. These would query our application’s experience data graph via GraphQL connections and other Relay-style data nodes, empowering our front end experience to query any data it needs for a new use case.

The promise of GraphQL, given the right data graph design that has matured and stabilised over time, is to solve the following problems:

Protection from breakages due to backward incompatibility: Let’s assume you got your data model design right, and with time it has really matured and stabilised itself. As you rearchitect and refactor your backend services, your data model is still robust and does not change, and your UI clients, especially the native app ones, are not affected.
Developer productivity and experience: As the UI is refactored for A/B tests to focus on identifying the best experience for the most effective conversion, the same data model can keep serving the different layout experiences across the UI platforms (web, iOS and Android).
You only really need to change the GraphQL data graph when your data model gains new data elements, that is, when there is new product data. When such a need arises, you append more data attributes or define new types and relationships.
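The point about appending attributes can be illustrated with a small sketch. It assumes a hypothetical Booking node and a helper, `selectFields`, that mimics how a GraphQL selection set returns only the requested fields. Neither is taken from our schema, and the `petsAllowed` field is invented for the example.

```typescript
// Version 1 of a hypothetical Booking node.
interface BookingV1 {
  id: string;
  checkIn: string;
  checkOut: string;
}

// Version 2 only appends a field; nothing is renamed or removed.
interface BookingV2 extends BookingV1 {
  petsAllowed: boolean;
}

// Return only the fields a client asked for, as a selection set would.
function selectFields<T, K extends keyof T>(node: T, fields: readonly K[]): Pick<T, K> {
  const picked = {} as Pick<T, K>;
  for (const field of fields) {
    picked[field] = node[field];
  }
  return picked;
}

const bookingV1: BookingV1 = { id: "bk1", checkIn: "2020-11-01", checkOut: "2020-11-05" };
const bookingV2: BookingV2 = { ...bookingV1, petsAllowed: true };

// An older client (say, a released native app) keeps sending the same selection.
const oldClientFields = ["checkIn", "checkOut"] as const;

const oldClientViewV1 = selectFields(bookingV1, oldClientFields);
const oldClientViewV2 = selectFields(bookingV2, oldClientFields);

// The older client sees identical data before and after the schema grew.
console.log(JSON.stringify(oldClientViewV1) === JSON.stringify(oldClientViewV2)); // true
```

Because clients name the fields they want, an additive change like this is invisible to existing clients. Renaming or removing a field would not be.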

Within our development team, we are still some way from realising the above benefits. We are still adjusting to the newer GraphQL paradigm and will now be investing in our single connected data graph. We have inherited a lot of legacy, and we will take up fixing it so that the benefits can be realised sooner.
