10 min readDec 23, 2017

We’ve been using GraphQL in production for a few months now, and have been very delighted with it. Our products, their relations with each other and their architecture behind the scenes aren’t so simple, and our GraphQL server had to account for various constraints and requirements. I feel like our usecase isn’t a typical one, and thought I would share our GraphQL architecture for the world to see — and improve.

I won’t go into the basics of GraphQL here, so if you’re unfamiliar with it, I invite you to check out this great introduction to the technology.

Some Product Background

I work in an enterprise providing personally-tailored analytic services. To keep it short, we have a bunch of analysts spread into teams, and each team given a specific client. In addition, we have quite a lot of data curators, who manually sift through our client’s raw data and prepare it for our analysts.

Our business becomes convoluted when you introduce the fact that our clients insist their data remain confidential — that means each analyst is allowed to work on a very specific subset of data and documents. In addition, some clients require their specific analysts to cooperate on specific projects, which means that client A’s analyst may be allowed to work on client B’s data with client B’s analyst, but not the other way around. This system also extends to our data curators.

When you factor in hundreds of clients with thousands of analysts and data curators, things get messy. We have an inside classification system for this exact issue, which defines which user is allowed to see documents and reports from which source. It’s called M-Class, and we’ll get to it a bit later.

Where does GraphQL fit in?

A few years back a new app was created for our analysts called Bone. Our company was just shifting from desktop to web, and without any experience, a small team created a monolithic PHP/MySql application that quickly transformed into some of the ugliest code you can imagine. However, for our users, the site offered a multitude of new features, and a lot of analysts loved it.

As the years passed, Bone was expanded and the monolith grew, until 3 years ago when we started to break it down into microservices. We already had a few in place (Bone needed to integrate with a few legacy systems written in .NET, so we had adapter microservices between the PHP monolith and those .NET Web Services), and Node was chosen for most of the new services.

Meanwhile, Bone wasn’t good enough for the data curators, who kept using their old desktop applications. 2 years ago we created an alternative to our curators’ desktop apps called Mountain. It kind of looks like a specialized Jira/Trello/Wekan, but integrated closely with our analyst’s site, Bone. Behind the scenes, Mountain was communicating with Bone through a variety of API calls, sometimes using the PHP server as an API gateway to various microservices.

Enter Lampoon, a new application which we started developing at the end of last year. Lampoon was designed to give both our curators and analysts a common app to view various reports. You could do the same in Bone, but the UX was terrible and instead of rebuilding it on top of Bone’s mess of a codebase, we decided to once again create a “satellite application”, using API calls to integrate our new app with Bone’s data.

However, Lampoon had very different requirements from Bone’s infrastructure. Lampoon offered a smart way to view lots of reports, so it had to do a lot of “selects” against Bone and because it’s mostly a reading application, the amount of “inserts” is practically none. If at around 7k Bone users and 2k Mountain users we saw around 3.5k selects/s on our DB, with only 50 Lampoon users that number jumped to 4.5k — and we were planning on getting at least half of Bone’s users to start working with Lampoon. After a successful pilot with 5 users, we ran some stress tests and realized that expanding the system wouldn’t work as things stood — API calls were taking too long to return, and adding too many users made Bone crawl to a halt.

While Bone’s database had no problem handling these queries (it’s not that big, <100Gb), it was our old API calls that failed us. Bone is built around an ancient ORM wrapping the DB, and over the years it became incredibly inefficient at handling requests with multiple items. Not only that, but as development on Bone continued, the basic Item was expanded, so not only was the ORM not built to deal with multiple items, the items it was built to handle became a lot more complex.

In order for the data to be consistent across all 3 systems, neither Mountain nor Lampoon save any data regarding Bone’s items. This means that interacting with an item through any system carries the changes to the others, but burdens Bone with being a single point of failure and an obvious bottleneck.

At this point we realized that Bone was becoming more of an infrastructure than a standalone application, and decided to divert resources into improving connectivity with said infrastructure. Bone had to perform better in order to enable the satellite applications and as we saw it, we had 2 options:

Take a deep breath, dive into some horrific PHP code and fix Bone’s ancient ORM or,
Create a new ORM layer, querying Bone’s DB directly.

We chose #2, and we decided to try our hand at using GraphQL for the project, with the main argument being that we foresaw lots of development both on Mountain and Lampoon, and we wanted Bone’s infrastructure to be as flexible as possible.

Our Schemas (and Loaders)

Before I start, I’ll just mention that we’re using graphql-js and express-graphql as far as packages go.

Like every GraphQL server, we started by describing our schemas. Bone and Mountain are both based around a basic Item entity, which has multiple Versions, attached Files, Metadata and Comments. In addition, each item version, file or comment has an owner, which is its creating User.

So, our schemas will be Item, Version, File, Metadata, Comment and User. I’ve also included each entity’s various fields:

*Our schema model. Capitalized fields are of the connected type.*

As for actually fetching the data, we use Facebook’s DataLoader to handle batching (it’s great, check it out). All of the entities are stored in our main MySql DB, so the loaders simply build suitable SQL queries, JOINing all of the relevant tables, before batching the queried IDs into a nice IN clause.

So we started playing around with GraphQL, and for a start we exposed a simple Schema that accepts Item IDs and fetches them, along with all of their entities, straight from the DB. Running it through our load simulations showed a lot of promise — Lampoon requests that would take up to 15 seconds to complete against the API were fetched in around 300ms! It looked like we were heading on the right track, but we still had some work to do before before deploying to production.

Filters

As things stood right now, our server was pretty standard. Here’s what it looked like, with each component being its own GraphQLObjectType:

But we weren’t finished just yet. Remember our M-Class system, the one that keeps each user from seeing unauthorized documents and reports? Bone had it integrated into its ORM, so an API call for a specific user would only fetch the Items that said user was allowed to view. Files also had to be included into M-Class, as some users would be authorized to view a report but not its attached files. When writing our GraphQL server, we had to somehow filter out those unwanted entities.

In order to determine whether a user is allowed to view an entity, we have to fetch that user’s Approved M-Class values, called “areas”, and compare them to the entity’s area. If the entity’s area is included in the user’s approved areas, the user is authorized to view the entity.

Ideally, when resolving the GraphQL query against the DB, you’d simply add a WHERE clause comparing each item’s area to the user’s authorized areas. Working with DataLoader and batches can make this not as trivial, but you’d simply need to provide a context to the loader. This can be done by either a singleton utility class that can be accessed by the loader, or you can fetch the user’s areas when creating the DataLoader (you should create a new DataLoader for every request).

Our problem? The entity’s area is stored in a different DB and microservice, and the user’s approved areas are in yet another service (there are other apps in our company that use M-Class). Because we were already planning on fetching the user areas separately before processing and injecting it into a singleton context before executing the GraphQL query, having the data somewhere else wasn’t a problem. However, seeing as the entity’s areas weren’t stored in our own DB, we couldn’t add that nice WHERE clause to our SQL query.

To get around this issue we decided on separating the filter logic (WHERE clause) from the fetching logic (SELECT clause) — given a list of IDs, we’d first run them through a filter (called M-Class Filter, obviously), which would kick out any unauthorized entities. Given our new, completely authorized list, we’d send it to our previous GraphQL resolver and violla! Our client has a fresh list of approved, data filled Items to populate Lampoon with.

Our GraphQL Query resolution flow, now with the M-Class Filter.

Finders

As things stood, we were ready to go — and we did. Bone’s GraphQL server was deployed and with it Lampoon, and the results were incredible. The system was blazing fast, our users happy with the new UX, and our infrastructure stood firm. It was now time to look ahead into the future.

At its core, Lampoon creates a personalized Item feed, split into categories our users define. Lampoon works by listening to Bone’s Pub/Sub, subscribing to a specific event (which indicates an Item or report is ready, and should be viewed by other users). When that event is published, Lampoon runs the relevant item through its own labeling engine, calculates which users should see it in their own feeds, and saves that indication in Lampoon’s own DB.

When we started work on Lampoon we already anticipated that our users would ask to view specific feeds, built according to various conditions, in addition to their personalized one. For example, some users would like to view all items with specific Metadata. GraphQL easily supports this with different queries, built around the same GraphQL components.

Our Item Schema Resolver can fetch Items given a collections of IDs, and our basic query was a simple pipe — it recieved those same IDs as an argument, and passed them (through the M-Class Filter) to the resolver. In order to build our new Metadata query using the same components as before, we simply needed to fetch a list of IDs matching the metadata condition. To accomplish this we created another bulding block in our architecture — Finders. Their role was to construct a list of IDs given a specific condition from a query, and pass it along to the Item resolver. This is how our ByMetadata query flow looked like:

Another usecase is that our users wanted all Items created within a certain date range. Just like with the Metadata Finder, we can easily create a Date Finder and plug it as a new GraphQL Query straight into the M-Class Filter and Item Resolver:

Creating new queries, and with them new feeds, is as easy as defining a new Finder, and because the finding mechanism is completely separate from both the filtering and fetching components, we can even find items according to data which is completely external to Lampoon or Bone (for example — all items which had gone through a specific process in Mountain).

M-Class and Files — Combining Filters and Finders

As mentioned earlier, M-Class involves both items and their attached files — a user may be authorized to view an item, but not its attached report. When we started writing our GraphQL service this posed a problem as all M-Class logic was already finished, but after we created our Filter architecture, we were able to reuse the same component for different entities, in different parts of our codebase.

In order to plug the M-Class filter before fetching the files, we first need to get all files attached to an item — or, in other words, create a “File by Item Finder”, fetch the relevant file IDs, pass them through the filter and only then resolve them. Both components slid in quite nicely, and we were quite satisfied with the results. Finally, a complete query follows the following diagram:

Final Words

While we are still figuring out how to get the most out of GraphQL, our experience so far has been extremely positive. From the ease of development to the performance it offered us, we are keen to keep improving and adding functionality to it. We still have some dilemmas ahead (can we “concatenate” Finders? How generic do I want them to be? Will seperating the finding, filtering and fetching requests come back to bite us in the future?), but so far we feel comfortable moving forward with our current architecture.

If you have any comments, insights, questions or suggestions I would love to hear from you.