Taming Subscriptions with GraphQL

Matt Krick
Sep 9, 2016 · 8 min read

Subscriptions are hard. In the JavaScript community, we don’t like to focus on hard problems. We like to tackle easy problems, make them even easier, and watch our GitHub stars soar through the freaking roof. You’ll notice this for everything from todo lists to benchmarks that have no real-world basis (because rendering 10000 identical DOM nodes is soooo common). The reason is simple: easy problems require less time to solve & appeal to a wider audience. Hard problems are usually reserved for closed-source repositories where they’re dealt with poorly because hey, who’s gonna see?

The app I’m working on is completely open-source, so when I started building realtime functionality between users, I wanted to think extra hard about how it should look. After all, the future of web applications is multiplayer. When I open Lyft, I see cute little cars driving around a map; when I open Slack, I get a realtime conversation with push notifications; and when I proofread a Medium article at the same time as a friend… Medium takes a crap and makes me want to swear off this stupid website for good (seriously, it’s 2016 and you’re gonna pop up a modal that I have to close via mouse click just because someone else added a comma in a 3000 word story?). That last example just drives the point home: multiplayer is no longer a feature, it’s expected.

The Current State

So what makes subscriptions so darn difficult?

  • Multiple States: A websocket couples the client to a server, meaning your DB isn’t the only source of state anymore
  • Push notifications: All mutations must flow through a message queue so other clients who care about that mutation receive the update
  • Connectivity: A temporary disconnect could result in a missed document, causing sync issues among clients
  • Scalability: For performance, unions should generally be avoided, which means streams are restricted to a single DB table
  • Relationships: Subscriptions provide a stream of nodes, but no relationships, making them promiscuous sluts that we should shame with a bell (that one’s for Crockford, you bastards #boycottNodevember).

And for all that headache, what do we get? A smaller payload, fewer redundant hits to the DB/cache, and an update that is marginally faster. Hardly seems worth it. Polling is easier to understand, easier to set up, and works for just about every use-case save chat rooms and games. So why even bother? Because web apps are games.

We’ve all taken the courses to stay competitive: Game Theory 101. Badges. Status bars. Friends. Points. Gamification works. The principles are sound, they’re in use today, but the infrastructure hasn’t caught up. A 5-second polling interval is like dry humping the server: it only feels like realtime until you’ve used a socket. For those dang millennials that grew up on broadband, it’s just an extra 4.7 seconds to lose interest. And now they’re in the business world, wondering why your SaaS sucks.

Prior Art

No one knows how important gamification is more than Facebook. Chances are, you hate the product, yet can’t deny the high that a stupid red notification bubble gives you. Another hit is as easy as a witty comment or new photo, which earns more red bubbles for you and another penny in ad revenue for them.

The engineers behind Facebook are some of the smartest in the industry, which is hugely apparent when you see their upcoming subscription strategy. Inside a standard GraphQL query, they mark certain fields with @live, which gives that field realtime super powers. The best of both worlds.

When I first saw that, I had to have my own @live directive. Then, when I saw a teaser for the upcoming Relay 2.0, which looks fantastic and squelches half of my complaints with the current Relay API, I got bummed because it didn’t even mention subscriptions.

After all, the biggest problem with subscriptions isn’t the data transport (GraphQL); it’s how they’re patched together on the client. Chances are, many queries will rely on data coming from a single subscription. That stream of data is going to be sorted, filtered, and map-reduced in a unique way for each query. Maybe subscribing to a team also triggers a subscribe to each team member, bringing back the n+1 problem that GraphQL so eloquently laid to rest. Some of that subscription data might even overwrite queried data, meaning we’ll need to normalize it and invalidate the queries that depended on it. Other data might be ephemeral, such as a connection’s socketId and the userId attached to it. Combining socket state with DB data isn’t some esoteric exercise in pedagogy; it’s as simple as the green “online” badge next to your friend’s face.

The Architecture

So you know subscriptions are gonna cause some headaches, but decided that a realtime app is business critical. Good on ya. So how do you start?

The solution I have in production today looks something like this:

Truly, this isn’t too different from your basic modern web app. Spin up a websocket server, back it with a message queue, and call it a day. The websocket server should handle auth (like a JWT) so validating permissions is synchronous and fast. SocketCluster even offers a built-in message queue that manages subscriptions and scaling for you. This is great for ephemeral data, since I can publish an “I’m online!” message to my team channel directly from my GraphQL mutation. I could bypass GraphQL altogether here and go straight to the message queue, but having a schema for ephemeral data is well worth the cost & there are smarter ways to optimize.

Next, we need changes in the database to populate the message queue. When a successful write occurs in the database, we take the result, run it through some logic to determine which channels to notify, and transform the document to match what the channel expects. Or, we could just use RethinkDB, which does this internally and much more efficiently than I ever could in JavaScript. The end result is a socket connection subscribing to a RethinkDB changefeed, transforming the doc to match my protocol (roughly based off of DDP), and sending it down to the client. Sure, I’ll have to switch to RabbitMQ eventually, but not before I pop up a server in Europe or reach an estimated 200,000 concurrent connections. By that time, I’ll hire someone else to do it. From a boat.
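The transform step might look roughly like the sketch below. RethinkDB changefeed entries carry an old_val and a new_val; the added/changed/removed message shape is borrowed loosely from DDP and is an assumption, not the protocol running in production. The names toMessage and channel are hypothetical.

```javascript
// Sketch: turn one RethinkDB changefeed entry into a DDP-like message.
// The message shape and the names `toMessage` and `channel` are assumptions.
function toMessage(channel, change) {
  const { old_val: oldVal, new_val: newVal } = change;
  // Nothing to report if neither version of the document is present.
  if (!oldVal && !newVal) return null;
  if (!oldVal) {
    // Insert: there is no previous version, so send the whole document.
    return { type: 'added', channel, id: newVal.id, fields: newVal };
  }
  if (!newVal) {
    // Delete: only the previous version exists, so the id is enough.
    return { type: 'removed', channel, id: oldVal.id };
  }
  // Update: send only the top-level fields whose values changed.
  // The comparison is shallow, and fields dropped by the update are ignored here.
  const fields = {};
  for (const key of Object.keys(newVal)) {
    if (newVal[key] !== oldVal[key]) fields[key] = newVal[key];
  }
  return { type: 'changed', channel, id: newVal.id, fields };
}
```

Sending only the changed fields keeps the payload small, which is most of what a subscription buys you over polling in the first place.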

The Client Cache

So we’ve got documents efficiently streaming down to the client, but nowhere to put em. CmRDT solutions like swarm.js and scuttlebutt are the fastest, most robust solutions, but they only work for single fields. Relay and Apollo don’t handle subscriptions (yet?). Horizon by RethinkDB does, but doesn’t support GraphQL yet (and who knows to what extent they will). So what’s a guy to do?

A few months ago, I introduced a package called Cashay as an alternative to Relay. It uses Redux as a store and offers a simpler API with improved performance. I decided to make my own @live directive for it. A field marked as live doesn’t query the server; it subscribes to a channel. By nesting subscriptions, Cashay leverages the beauty of GraphQL by delivering a document to your view layer in exactly the format you want it. It is immediately clear how your data will look, even though it is arriving in pieces. Even better, there is no need to send a path like [‘getTeamById’, ‘teamMembers’, 3] from the server. The type information is derived from your GraphQL schema. The logic is built in.

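query {
  getTeamById(id: $id) {
    id
    name
    teamMembers @live {
      id
      preferredName
      connections @cached(type: "[Connection]") {
        socketId
      }
      projects @live {
        id
        content
        updatedAt
      }
    }
  }
}

// options
{
  resolveCached: {
    // pick the cached connections that belong to this team member
    connections: source => doc => source.id === doc.teamMemberId
  },
  sort: {
    // a sort comparator has to return a number, not a boolean
    projects: (a, b) => (a.updatedAt > b.updatedAt ? 1 : a.updatedAt < b.updatedAt ? -1 : 0)
  }
}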

By nesting the projects subscription, we can achieve n+1 subscriptions declaratively. That encourages more channels, which is ideal. Channels are cheap; payloads aren’t. Since Cashay caches these denormalized responses, they are only invalidated when a pertinent subscription receives a new doc. By passing in a sort (or filter) function, the transformation occurs before caching, which means a single subscription can serve many queries, and there is no penalty when a component re-renders. And that @cached you see? That isn’t even a part of the subscribed doc. It’s looking through all the active connections and cherry picking the ones associated with that user (1 user might be logged in from many tabs/browsers/computers/mobile devices. If that’s how you party, I ain’t gonna judge).
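To make those two options concrete, here is a rough sketch of what they amount to when applied by hand to plain arrays. This is not Cashay’s internal code; denormalizeMember and the shape of its arguments are hypothetical.

```javascript
// Sketch only: applies the resolveCached and sort options by hand.
// This is NOT Cashay's implementation; `denormalizeMember` is a hypothetical helper.
const options = {
  resolveCached: {
    // Curried predicate: given a team member, match that member's connections.
    connections: source => doc => source.id === doc.teamMemberId
  },
  sort: {
    // Oldest first; works for any updatedAt values that compare with < and >.
    projects: (a, b) => (a.updatedAt > b.updatedAt ? 1 : a.updatedAt < b.updatedAt ? -1 : 0)
  }
};

function denormalizeMember(member, allConnections, memberProjects) {
  return {
    ...member,
    // Cherry-pick this member's sockets out of every active connection.
    connections: allConnections.filter(options.resolveCached.connections(member)),
    // Copy before sorting so the source array is left untouched.
    projects: [...memberProjects].sort(options.sort.projects)
  };
}
```

Because the filter and sort run once, before the result is cached, a re-render can reuse the finished object instead of repeating the work.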

This stuff is fine for the basics, but what about some crazy amalgam of map-reduce logic that mixes local state into domain state? Write a function, memoize it, and plop it in your mapStateToProps just like you would plain old Redux state. Because that’s what it is.
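As a sketch of that idea, the selector below is memoized by hand so that mapStateToProps hands back the same array until one of its inputs changes. The state shape, memoizeLast, and selectVisibleProjects are all hypothetical; a library such as reselect does the same job.

```javascript
// Remember only the most recent call; recompute when any argument changes identity.
function memoizeLast(fn) {
  let lastArgs = null;
  let lastResult;
  return (...args) => {
    const same = lastArgs !== null &&
      lastArgs.length === args.length &&
      args.every((arg, i) => arg === lastArgs[i]);
    if (!same) {
      lastResult = fn(...args);
      lastArgs = args;
    }
    return lastResult;
  };
}

// Mixes domain state (projects) with local state (a search string).
const selectVisibleProjects = memoizeLast((projects, search) =>
  projects.filter(p => p.content.toLowerCase().includes(search.toLowerCase()))
);

// Hypothetical state shape: projects come from the cache, search from local UI state.
const mapStateToProps = state => ({
  projects: selectVisibleProjects(state.projects, state.ui.search)
});
```

Returning the identical array when nothing changed is what lets the view layer skip a re-render.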

And the stream of documents that you received? It gets normalized and merged into your previously queried and mutated data. After all, your view layer doesn’t care how the data got there, so why should subscription data be stored separately? This opens up a whole new world where queries and subscriptions can live in harmony. Offline.
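A minimal sketch of that merge, assuming the store keeps entities keyed by type and then by id. The store shape and the mergeDoc helper are hypothetical and are not taken from Cashay’s internals.

```javascript
// Merge one incoming document into a normalized { [type]: { [id]: doc } } store.
// Returns a new store object and leaves the previous one untouched (Redux-style).
function mergeDoc(entities, type, doc) {
  const byId = entities[type] || {};
  return {
    ...entities,
    [type]: {
      ...byId,
      // Shallow-merge so fields fetched by an earlier query survive a partial update.
      [doc.id]: { ...byId[doc.id], ...doc }
    }
  };
}
```

A document that arrives over a subscription lands in the same slot as one that arrived from a query or mutation, so anything reading that entity sees the latest version regardless of how it got there.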

Offline-first Subscriptions

Until now, subscriptions were regarded as valid until the connection was lost. If a client loses internet for a minute, you better forget all of your team members and start a new subscription to get them again. After all, who knows what occurred? A smarter pattern might be what I’ll call a subscription cursor. If the client received the last team member at 4:01pm and then lost connectivity for 5 minutes, she can start a new subscription for all team members where updatedAt > 4:01pm. In doing so, you could persist subscription data in your localStorage, and pick up right where you left off 2 days later. Even better, the user has something to look at before their queries even leave the client.
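A rough sketch of a subscription cursor follows. The storage key, the subscribe callback, the updatedAfter parameter, and the helper names are hypothetical, and updatedAt is assumed to be a number such as epoch milliseconds. The storage argument is anything with the getItem/setItem interface of localStorage.

```javascript
// The cursor is the newest updatedAt the client has already seen (0 if none).
function getCursor(docs) {
  return docs.reduce((latest, doc) => (doc.updatedAt > latest ? doc.updatedAt : latest), 0);
}

// Load persisted docs, then ask the server only for what changed since the cursor.
function resumeSubscription(storage, key, subscribe) {
  const cached = JSON.parse(storage.getItem(key) || '[]');
  const byId = new Map(cached.map(doc => [doc.id, doc]));
  const onDoc = doc => {
    // Newer versions replace older ones, and every change is persisted.
    byId.set(doc.id, doc);
    storage.setItem(key, JSON.stringify([...byId.values()]));
  };
  // `subscribe` is expected to stream docs where updatedAt > updatedAfter.
  // Deleted docs would need a tombstone or similar, which this sketch leaves out.
  subscribe({ updatedAfter: getCursor(cached) }, onDoc);
  // The cached docs are readable right away, before the server responds.
  return () => [...byId.values()];
}
```

In a browser this could be called as resumeSubscription(window.localStorage, 'teamMembers', subscribe), where subscribe wraps whatever socket client you use.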

The Future

This is only the beginning. At our little company Parabol, we use the latest tech stacks, and occasionally make a tool or 2 to help us along the way. If this stuff interests you, and you like the idea of earning equity in a young startup in your spare time, check out our Equity for Effort program. Keep your day job. Stay in school. You make PRs on our GitHub, you own a piece of the company. That simple. We’re a 1-year-old startup focused on breaking folks free from their 9–5 cubicle prison and we like playing with future business models even more than future web stacks (our lawyers hate us). Something more formal? How bout an internship: 20 hours/week, get paid cash/equity, and work from home (we live in Brooklyn, Dallas, and San Diego). Save the resume hogwash, show us the coolest thing you’ve made on GitHub. Here’s ours. How’s that for a job posting disguised as a tech article?

Thanks to Parabol, Inc., Jordan Husney, and Terry Acker.
