Future-Proofing Backend Services with GraphQL, PostgreSQL and Docker (Part 1)
At Yipee.io, we recently added team functionality to our modeling tool. Teams provide support for sharing applications and are tied into our pricing, enabling teams to purchase different plans with different sets of associated features, which in turn increases the complexity we have to manage. We agreed on the relationships between users, teams, plans, and features, but it wasn’t entirely clear up front how the UI would need to navigate those relationships. We might want to find the features defined for a user but, alternatively, we might want to find all users, teams¹, and plans that possess a particular feature.

At the time, for historical reasons, we had a backend system driven by REST APIs that stored data in MongoDB. MongoDB is an excellent choice for certain use cases (particularly very high data volumes), but a complicated graph of relations, which can be navigated from multiple endpoints along a wide variety of paths, is not a particularly good fit. We decided to switch our backend to a relational database, since we didn’t anticipate ever reaching a volume of data that would make a relational implementation problematic. Now that multiple vendors support RDBMS as a service, the decision to use a relational database has become even easier. Having had positive experiences with PostgreSQL in the past, we decided to give it a try. Our DaaS vendor supported PostgreSQL in addition to MongoDB, so the switch could be made fairly easily. As it turned out, PostgreSQL was quite possibly the ideal choice for this application; I’ll discuss its unique qualifications later in the post.

Once we had decided on a relational database, a similar thought process ensued around our REST APIs. The document model behind REST is a somewhat poor fit for a relationship graph with properties on the associations. The model we were managing looked like:
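In rough outline (the entity and relationship names here are my shorthand for the relationships described above, not the schema’s own notation):

```
User ── *..* ── Org      (as member, admin, or writer; each Org has exactly one owner)
Org  ── *..1 ── Plan
Plan ── *..* ── Feature
Org  ── *..* ── Feature
User ── *..* ── Feature
```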
We could have mapped out REST endpoints for our immediate operational needs, but we might easily have been chasing use cases for some time as the UI or another API user needed different collections of data. An initial dataset for a user identity needed by the UI was:
- The user’s id, terms, and features
- Each org¹ containing the user, with:
  - its id and name
  - its owner (with id and identity)
  - its set of users (with id and identity)
  - its set of admins (with id and identity)
  - its set of writers (with id and identity)
  - its plan (with id and name)
  - its set of features (with id and name)
  - the set of features for the plan (with id and name)
The backend API needed a similar but somewhat different dataset to establish a user context. These sorts of queries make for an odd REST interface, so we decided to move away from REST for our team model and use GraphQL. The rest of this discussion covers our experience using GraphQL with PostgreSQL to implement our team model.

¹ *Teams are called orgs in the schema since there are orgs for individuals and we wanted to talk about true team plans separately. Single-person orgs work just like orgs associated with larger teams.*
There are many excellent sources for information about GraphQL online. In particular, this site has a wealth of useful material. That said, I’ll give a brief overview here for context. GraphQL is an alternative to REST that allows clients to design their own queries instead of requiring that all requests be defined by the server. This works because GraphQL is schema-based. A GraphQL server defines:
- a schema which specifies all the meaningful relationships between the elements of a data model
- a set of queries which are entry points into the schema for retrieving data
- a set of mutations which are the operations that may be performed on model data
Unlike a REST API, a query entry point does not itself define what data may be retrieved. For example, here is a query from our team model:
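A minimal sketch of such a query declaration (the name `userByIdentity` is my invention, standing in for the real one; the argument shapes follow the prose):

```graphql
type Query {
  userByIdentity(service: String!, identity: String!): User
}
```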
This says that a client can retrieve a user given a service (e.g. “GitHub”) and an identity (e.g. GitHub id). However, the user is just the beginning. Here are the descriptions of the user and org types in our schema:
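A sketch of what those type definitions might look like, with field names inferred from the examples discussed later in the post (the exact names in the real schema may differ):

```graphql
type User {
  id: String!
  service: String!
  identity: String!
  terms: Boolean!
  memberOrgs: [Org]
  ownedOrgs: [Org]
  features: [Feature]
}

type Org {
  id: String!
  name: String!
  maxWriters: Int
  owner: User!
  users: [User]
  admins: [User]
  writers: [User]
  plan: Plan
  features: [Feature]
}
```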
An exclamation point (!) means that the item may not be null. The user definition says that a user consists of three string values, one boolean, and three lists containing instances of associated types (similar for an organization). A GraphQL schema contains type descriptions like this plus definitions for the queries and updates (called “Mutations”) supported by the server. Our queries and mutations are:
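I won’t reproduce the full list, but the shape is along these lines (all names here are illustrative):

```graphql
type Query {
  userByIdentity(service: String!, identity: String!): User
  orgById(id: String!): Org
}

type Mutation {
  createUser(input: UserUpdateInput!): UserChangeResponse
  updateUser(service: String!, identity: String!,
             input: UserUpdateInput!): UserChangeResponse
  addOrgUser(orgId: String!, userId: String!): OrgChangeResponse
}
```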
To me, this looks a lot like some sort of RPC definition. The special sauce that makes GraphQL powerful, however, is the client’s ability to select which data she wants returned. Let’s go back to that first query:
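A sketch of that declaration (`userByIdentity` is an invented name):

```graphql
type Query {
  userByIdentity(service: String!, identity: String!): User
}
```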
Here is a use of it that requests the value of the “terms” field:
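Such a request might look like this (the query name and argument values are my guesses, with "plissken" as the example identity used throughout):

```graphql
{
  userByIdentity(service: "github", identity: "plissken") {
    terms
  }
}
```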
This is passed to a GraphQL server via an HTTP POST; all operations for the API go through the same HTTP endpoint. In curl, this would look like:
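Something along these lines, with a hypothetical endpoint URL:

```shell
curl -X POST https://app.example.com/graphql \
  -H 'Content-Type: application/json' \
  -d '{"query": "{ userByIdentity(service: \"github\", identity: \"plissken\") { terms } }"}'
```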
The returned payload is:
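For a query requesting only `terms`, the response has this shape (the top-level key echoes the query name, which I’ve invented here; values are illustrative):

```json
{
  "data": {
    "userByIdentity": {
      "terms": true
    }
  }
}
```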
Notice that we received the value for the “terms” field. What if we ask for more?
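For instance, we can pull in the orgs the user owns in the same request (again, field names are my guesses):

```graphql
{
  userByIdentity(service: "github", identity: "plissken") {
    terms
    ownedOrgs {
      name
      maxWriters
    }
  }
}
```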
Now we get:
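Shape-wise (names and values illustrative):

```json
{
  "data": {
    "userByIdentity": {
      "terms": true,
      "ownedOrgs": [
        {
          "name": "plissken",
          "maxWriters": 1
        }
      ]
    }
  }
}
```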
So we can specify the fields we want for our types, and even the fields for referenced objects, recursively. Here we said we wanted the “name” and “maxWriters” fields for each org owned by plissken. As one final example, let’s ask for the admins for each owned org:
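Sketched with the same invented names:

```graphql
{
  userByIdentity(service: "github", identity: "plissken") {
    ownedOrgs {
      name
      admins {
        id
        identity
      }
    }
  }
}
```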
Clearly, this is very powerful. Once you establish a schema, a GraphQL client can select the information required for a particular task as needed. Also, if we decide later that we need more data for a particular type, we can add it and all existing calls will remain unaffected.
Mutations are very much like RPC calls except that they allow specification of the data to be returned, just like queries. If we set out to update our user, we can use our update mutation:
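A sketch of such a mutation and its associated types; `UserUpdateInput` and `UserChangeResponse` come from the prose, while the mutation name and the input’s fields are my inventions:

```graphql
input UserUpdateInput {
  terms: Boolean
}

type UserChangeResponse {
  user: User
  errors: [String]
}

type Mutation {
  updateUser(service: String!, identity: String!,
             input: UserUpdateInput!): UserChangeResponse
}
```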
This says that we can update a user by specifying her service and identity values and an instance of UserUpdateInput for the new data. The update returns a UserChangeResponse, which has fields for the user and any errors that were generated. Just like in a query, the returned user data can be expanded by specifying fields of interest. Here's an example:
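With my invented names, the call might be:

```graphql
mutation {
  updateUser(service: "github", identity: "plissken",
             input: { terms: true }) {
    user {
      identity
      terms
    }
    errors
  }
}
```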
This call returns:
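Its shape mirrors the requested fields (names and values illustrative):

```json
{
  "data": {
    "updateUser": {
      "user": {
        "identity": "plissken",
        "terms": true
      },
      "errors": []
    }
  }
}
```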
We picked up substantial benefits right away for this kind of highly connected model just by switching to a relational database. Supporting navigation from multiple directions, along with sets of users having different roles in an organization (e.g. admins, writers), would require redundancy in a document model and would make it impossible to perform updates transactionally. Additionally, the integrity constraints provided by an RDBMS remove an entire class of errors we would otherwise need to handle. One simple example: if a client tries to add a user to an organization but presents an invalid user id, this is caught for us by a constraint that the user field in the table connecting orgs and users must reference an entry in the users table. Done! None of this is probably news to anyone reading this, but I feel like some of the benefits of an RDBMS are often forgotten today given the popularity of NoSQL databases. In any case, we were able to map our GraphQL schema onto a relational model easily. We’ll see where PostgreSQL really shines when we talk about optimizations in Part 2.
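For example, a join table with a foreign-key constraint enforces valid user ids at the database level. Table and column names here are illustrative, not our actual schema:

```sql
CREATE TABLE org_users (
  org_id  text NOT NULL REFERENCES orgs (id),
  user_id text NOT NULL REFERENCES users (id),
  PRIMARY KEY (org_id, user_id)
);

-- An insert with an id that doesn't exist in users fails immediately,
-- e.g.:
--   INSERT INTO org_users (org_id, user_id) VALUES ('org1', 'no-such-user');
-- raises a foreign key constraint violation.
```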
I believe it’s valuable to discuss Docker in the context of future-proofing services because we gained great leverage from our use of Docker. Changing our underlying database technology could easily have had a significant impact on development for the following reasons:
- Our UI developers were working concurrently on unrelated features.
- We didn’t want them to wait until our transition was complete to get started.
- Deploying a database and tying it together with related services can be complicated and time-consuming.
- Swapping out environments to work on different tasks often produces a lot of development friction.
We got around these issues through judicious use of Docker. Here at Yipee.io, we model our own application using our modeler and run it using Docker. This means that our UI developers could simply use docker-compose to bring up or shut down our entire set of services (including the database). They could bring up one set of services to do unrelated development and switch to the new database version in about 30 seconds. Also, as we added features or fixed bugs, we could push a new version of our team functionality as a Docker image and they could incorporate it with little to no effort. They, in turn, could generate a new UI image to hand off to our UX team for review. Such short cycle times radically change the time and resources required to react to user feedback and new requirements.
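As an illustration only (service and image names are made up; our actual compose file differs), the kind of compose file that makes this work:

```yaml
services:
  db:
    image: postgres
    environment:
      POSTGRES_PASSWORD: example
  team-api:
    image: example/team-api   # hypothetical application image
    depends_on:
      - db
    ports:
      - "8080:8080"
```

With a file like this, `docker-compose up` brings up the database and the service together, and `docker-compose pull` picks up a newly pushed image with no other changes.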
In Part 2, I’ll discuss how GraphQL servers are built and how PostgreSQL offers significant leverage for future-proofing their construction.
Originally published at gist.github.com.