Feature Flags in Federated GraphQL

Kenny Mcgarvey
Engineering @Varo

--

Let’s define a couple of things before we jump into the fun stuff. In the world of GraphQL, “federated” means a single graph composed of multiple subgraphs (see Apollo Federation). In other words, it’s microservices for GraphQL, but your consumers don’t have to see the mess. Next, what are feature flags? They are a clever way to turn functionality on and off without redeploying code. New features can be built, deployed, and shipped to production while staying disabled until everything is ready.

At Varo, we use feature flags extensively across our mobile apps and web app to enable fast-paced development and smooth product rollout. We also try to keep business logic out of the front-end, which means GraphQL drives much of the UI behavior. The same feature flag hiding/showing a widget in the app also changes business logic in our GraphQL service. For a consistent user experience, our GraphQL layer needs to match whatever flags the app has enabled. The enabled features can vary across platforms, app versions, and time, so we can’t simply fetch and cache the feature flags when our services start. Let’s explore a couple of ways to integrate feature flags into a federated graph.

A simple approach

In the simplest setup, we have a federated graph with a single gateway (which composes everything into a single endpoint) and some arbitrary number of subgraph services. Each GraphQL service asks the feature flag service which features are enabled for a given request. The advantage of this approach is that it is easy to implement: every time you add a new service, you connect it to the feature flag service. Easy.

Before we dive into the downsides of this simplistic design, it is essential to note how most third-party feature flag services work. Typically they can compute the enabled features locally without a network request. Their SDK will download a config file and keep it up-to-date via polling. While that saves us the overhead of network latency every time we ask what features are allowed, it does mean that these config files could get out of sync across our services. Let’s imagine a GraphQL query that spans more than one of our services. What would happen if two services received different values for the same feature flag? It would likely lead to unexpected behavior and an unpleasant user experience.
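To make the out-of-sync problem concrete, here is a minimal sketch. It does not use any real vendor SDK: the FlagConfig shape, the SubgraphService class, and the new_dashboard flag are all made up for illustration, and refresh() simply stands in for a poll that picks up a newer config file.

```typescript
// Hypothetical shape of the config file an SDK might download and keep polling.
interface FlagConfig {
  version: number;
  enabled: Record<string, boolean>;
}

// Stand-in for a subgraph service holding its own locally cached config.
class SubgraphService {
  name: string;
  private config: FlagConfig;

  constructor(name: string, config: FlagConfig) {
    this.name = name;
    this.config = config;
  }

  // Simulates a poll that picks up a newer config file.
  refresh(latest: FlagConfig): void {
    this.config = latest;
  }

  // Evaluated locally, with no network request.
  isEnabled(flag: string): boolean {
    return this.config.enabled[flag] ?? false;
  }
}

const v1: FlagConfig = { version: 1, enabled: { new_dashboard: false } };
const v2: FlagConfig = { version: 2, enabled: { new_dashboard: true } };

const accounts = new SubgraphService("accounts", v1);
const payments = new SubgraphService("payments", v1);

// Only one service has polled since the flag was flipped.
accounts.refresh(v2);

// A query spanning both services now gets two different answers.
const answers = [accounts, payments].map((s) => s.isEnabled("new_dashboard"));
const inSync = answers.every((a) => a === answers[0]);
console.log(answers, inSync);
```

The window between the two polls is usually short, but any query that lands inside it sees a half-enabled feature.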

On top of that, having every GraphQL service depend on this third-party SDK isn’t ideal. Even if we ignore the dependency proliferation, every service still needs to talk to this external API over the internet. Personally, I think minimizing how exposed our microservices are to the outside world should be a priority.

Gateway to the rescue

A better option is to move the lookup into the gateway. When the gateway receives a GraphQL query, it fetches the enabled features for that request. Those features are then attached to each of the subgraph queries required to fulfill the original query. A single entry point for the feature flag service SDK and external API: check and check. Feature flags are “locked” within the context of a single GraphQL query: big check. Having logic in our GraphQL gateway isn’t great, but this is a cross-cutting concern, so it feels excusable.
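Here is a rough sketch of that “resolve once, attach everywhere” idea, written as plain TypeScript rather than against any gateway library. The names (ClientContext, SubgraphQuery, attachFlags, flakyResolver) are assumptions for illustration; in a real gateway the resolver would wrap the vendor SDK and the plan would come from the query planner.

```typescript
type FeatureFlags = Readonly<Record<string, boolean>>;

// Hypothetical per-request attributes the flag service might key on.
interface ClientContext {
  userId: string;
  platform: "ios" | "android" | "web";
  appVersion: string;
}

interface SubgraphQuery {
  subgraph: string;
  query: string;
}

interface SubgraphRequest extends SubgraphQuery {
  flags: FeatureFlags;
}

// Stand-in for the feature flag SDK; only the gateway would hold one.
type FlagResolver = (client: ClientContext) => Record<string, boolean>;

// Resolve flags exactly once, then attach the same frozen snapshot to
// every subgraph request needed to fulfill the original query.
function attachFlags(
  client: ClientContext,
  plan: SubgraphQuery[],
  resolveFlags: FlagResolver,
): SubgraphRequest[] {
  const flags: FeatureFlags = Object.freeze({ ...resolveFlags(client) });
  return plan.map((step) => ({ ...step, flags }));
}

// A deliberately unstable resolver: its answer flips on every call,
// mimicking a config change arriving in the middle of a query.
let sdkCalls = 0;
const flakyResolver: FlagResolver = () => {
  sdkCalls += 1;
  return { new_dashboard: sdkCalls % 2 === 1 };
};

const requests = attachFlags(
  { userId: "u1", platform: "ios", appVersion: "3.2.0" },
  [
    { subgraph: "accounts", query: "{ account { id } }" },
    { subgraph: "payments", query: "{ payments { id } }" },
  ],
  flakyResolver,
);

console.log(sdkCalls, requests.map((r) => r.flags["new_dashboard"]));
```

Because the resolver is called a single time, both subgraph requests carry the same snapshot even though the underlying answer would have changed on a second call.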

How is this data sent to each of the GraphQL services? One option is to use an HTTP header on every request, for example “x-feature-flags”: “product_1, product_2”. The recipient GraphQL service can parse it and resolve the query accordingly. Using headers works well until your list of feature flags grows or you need to attach variables to each feature. A slightly better alternative is to send the flags in the HTTP body via the “extensions” field of the GraphQL request. To add a bit of safety, you can define types to help with serialization and deserialization.
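Both options might look something like the sketch below. The FeatureFlag type, the featureFlags key inside extensions, and the helper names are assumptions made for this example, not a standard. The header helpers only carry flag names, while the extensions payload can also carry per-flag variables.

```typescript
// Hypothetical wire type shared by the gateway and the subgraph services.
interface FeatureFlag {
  name: string;
  variables?: Record<string, string>;
}

// Header form: names only, e.g. "product_1, product_2".
function toHeader(flags: FeatureFlag[]): string {
  return flags.map((f) => f.name).join(", ");
}

function fromHeader(value: string | undefined): string[] {
  return (value ?? "")
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Runtime guard so a malformed payload is dropped instead of trusted.
function isFeatureFlag(value: unknown): value is FeatureFlag {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  if (typeof candidate["name"] !== "string") return false;
  const variables = candidate["variables"];
  if (variables === undefined) return true;
  if (typeof variables !== "object" || variables === null) return false;
  const vars = variables as Record<string, unknown>;
  return Object.keys(vars).every((k) => typeof vars[k] === "string");
}

// Body form: read flags from the "extensions" field of a parsed request body.
function parseExtensions(body: unknown): FeatureFlag[] {
  if (typeof body !== "object" || body === null) return [];
  const extensions = (body as { extensions?: unknown }).extensions;
  if (typeof extensions !== "object" || extensions === null) return [];
  const flags = (extensions as { featureFlags?: unknown }).featureFlags;
  return Array.isArray(flags) ? flags.filter(isFeatureFlag) : [];
}

const outgoing: FeatureFlag[] = [
  { name: "product_1" },
  { name: "product_2", variables: { tier: "gold" } },
];

// Gateway side: serialize into the request body.
const wireBody = JSON.stringify({
  query: "{ me { id } }",
  extensions: { featureFlags: outgoing },
});

// Subgraph side: deserialize and validate.
const received = parseExtensions(JSON.parse(wireBody));
const header = toHeader(outgoing);
console.log(header, fromHeader(header), received);
```

The guard is what provides the “bit of safety”: anything that does not match the agreed shape is filtered out before a resolver ever sees it.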

Bonus: Let the client do it

You could take the above a step further and push this concern down to the app. The client could then send the feature flags via HTTP headers or in the body extensions. If you are OK with the overhead of each request containing all of the enabled features, this is a perfectly valid approach. One thing to keep in mind is that this contract sits outside of the GraphQL schema, so you will need to stay vigilant that neither the client apps nor GraphQL breaks this agreement between releases.
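One way to stay vigilant is to check the client payload at the edge before trusting it. The sketch below is only one possible guard, not something we are describing as our implementation: the version field, the KNOWN_FLAGS list, and checkContract are all hypothetical.

```typescript
// Hypothetical list of flags the GraphQL side currently understands.
const KNOWN_FLAGS = ["product_1", "product_2"] as const;
type KnownFlag = (typeof KNOWN_FLAGS)[number];

// Hypothetical version number for the out-of-schema contract.
const SUPPORTED_CONTRACT_VERSION = 1;

interface ClientFlagPayload {
  version: number;
  flags: string[];
}

interface ContractResult {
  versionOk: boolean;
  accepted: KnownFlag[];
  rejected: string[];
}

function isKnownFlag(name: string): name is KnownFlag {
  return KNOWN_FLAGS.some((known) => known === name);
}

// Accept only flags the server knows about, and only when the client
// speaks the contract version the server expects.
function checkContract(payload: ClientFlagPayload): ContractResult {
  const versionOk = payload.version === SUPPORTED_CONTRACT_VERSION;
  const accepted: KnownFlag[] = [];
  const rejected: string[] = [];
  for (const name of payload.flags) {
    if (versionOk && isKnownFlag(name)) {
      accepted.push(name);
    } else {
      rejected.push(name);
    }
  }
  return { versionOk, accepted, rejected };
}

const result = checkContract({ version: 1, flags: ["product_1", "product_9"] });
const stale = checkContract({ version: 0, flags: ["product_1"] });
console.log(result, stale);
```

Logging or alerting on the rejected list gives an early signal that an app release and a GraphQL release have drifted apart.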

Ultimately we implemented our feature flag logic at the GraphQL gateway level. It allows us to abstract it from our microservices, but not so much that we depend entirely on the client apps sending the data. Regardless of which approach you use, there’s one point to take away. Utilizing feature flags within GraphQL has given us a significant amount of flexibility and control in developing and releasing new features and enhancements.
