Our journey with GraphQL at Moonpig — Part 3

Published in Moonpig Tech Blog, Oct 30, 2020 (7 min read)

Why Federation

In this series of blog posts we share Moonpig’s journey to start using GraphQL. In part 1 we talked about why we decided to use GraphQL. Part 2 covered our experiences of the schema stitching paradigm. This part will explain our next steps as we venture into the world of GraphQL ‘Apollo Federation’.

The schema stitching gateway that we built using Apollo Server and graphql-tools was holding up well in production; a significant proportion of our website traffic was being served by our stitched GraphQL API, backed by multiple micrographs, with no major concerns. But… a requirement came in from our engineers building the basket API. They wanted to define a type in the schema whose fields are resolved by more than one micrograph. With our schema stitching gateway this would not be possible unless we allowed schema-specific resolver logic to be added to the gateway, something that we had decided against to avoid having everyone write and deploy code from the same source repositories. This called for a new solution. Maybe it was time for… Apollo Federation!

Apollo Federation, like schema stitching, merges GraphQL schemas and has microservice APIs resolve sets of types within the composed schema. But Apollo Federation takes this style of GraphQL to the next level by allowing a micrograph to extend a type from another micrograph by adding new fields.

As an example, we might have a product gallery on the website, displaying a list of product entities, each with a review score. The client would request the list of products from the GraphQL API and need to include the review score field for each product. A sample schema for this purpose:

type Product {
  id: ID
  title: String
  reviewScore: Number
}

type Query {
  products: [Product]
}

In the back end we could have a micrograph serving the product list (Product API) and another providing the review data (Review API).

With our schema-stitched gateway we could only supply this data by splitting the type in two and exposing a separate query for each part:

type Product {
  id: ID
  title: String
}

type ProductReview {
  productId: ID
  reviewScore: Number
}

type Query {
  products: [Product]
  productReviews(productIds: [ID]): [ProductReview]
}

The client would need to first send a request to the GraphQL gateway containing the ‘products’ query to get the product listing, then extract the set of product id fields from the response and send another request to the GraphQL gateway for the productReviews query, aggregating the responses client-side to create a full data set of products and review scores.
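To make the cost of that approach concrete, here is a minimal sketch of the client-side join, assuming the two responses have already been fetched. The helper name mergeProductsWithReviews and the sample data are hypothetical, not taken from our code base.

```javascript
// Hypothetical client-side helper: joins the results of the `products` and
// `productReviews` queries into one list, matching on the product id.
function mergeProductsWithReviews(products, productReviews) {
  // Index review scores by productId so each lookup is cheap.
  const scoreByProductId = new Map(
    productReviews.map((review) => [review.productId, review.reviewScore])
  );

  // Attach the score to each product; null when no review came back.
  return products.map((product) => ({
    ...product,
    reviewScore: scoreByProductId.has(product.id)
      ? scoreByProductId.get(product.id)
      : null,
  }));
}

// Sample data shaped like the two responses (values are made up).
const products = [
  { id: '1', title: 'Birthday card' },
  { id: '2', title: 'Flowers' },
];
const productReviews = [{ productId: '1', reviewScore: 4.5 }];

const gallery = mergeProductsWithReviews(products, productReviews);
console.log(gallery);
```

Every client that wants the combined view has to carry logic like this, on top of the extra network round trip.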

The Apollo Federation specification provides new schema elements that would allow the Review API to extend the Product API schema, with the federation gateway being able to use the schema elements to generate a query plan to execute queries without needing to add any schema-specific resolver code to the gateway.

The Product API schema now specifies a key field for the product type, which the gateway will use to resolve any of the fields served by other micrographs:

type Product @key(fields: "id") {
  id: ID
  title: String
}

The Review API can extend the product type with extra fields using the extend keyword:

extend type Product @key(fields: "id") {
  id: ID @external
  reviewScore: Number
}

In our example scenario the client can now fetch products and review scores from the GraphQL gateway in a single request. The gateway first requests the product list from the Product API, gathers the id values, sends them on to the Review API and then combines the data into a single response.
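The flow can be approximated in a few lines. This is only an illustrative simulation under stated assumptions: productApi and reviewApi are in-memory stand-ins for the two micrographs, and the real gateway derives these steps from the schema as a query plan rather than from hand-written code. The one detail taken from the federation specification is that the second service receives "representations" (the type name plus the key fields) and returns its fields in the same order.

```javascript
// Stand-ins for the two micrographs; in reality the gateway calls them over HTTP.
const productApi = {
  // Plays the role of resolving `products { id title }`.
  async products() {
    return [
      { id: '1', title: 'Birthday card' },
      { id: '2', title: 'Flowers' },
    ];
  },
};

const reviewApi = {
  // Plays the role of the federation `_entities` field: it receives
  // representations and returns the fields this service owns for each one.
  async entities(representations) {
    const scores = { '1': 4.5, '2': 3.8 }; // made-up data
    return representations.map((rep) => ({
      reviewScore: scores[rep.id] ?? null,
    }));
  },
};

// Roughly what happens for the query `products { id title reviewScore }`.
async function fetchProductsWithScores() {
  const products = await productApi.products();
  // Build one representation per product from its @key field.
  const representations = products.map((p) => ({
    __typename: 'Product',
    id: p.id,
  }));
  const extensions = await reviewApi.entities(representations);
  // Results line up with the representations by position.
  return products.map((p, i) => ({ ...p, ...extensions[i] }));
}

fetchProductsWithScores().then((result) => console.log(result));
```

None of this logic lives in our gateway code; the point of the sketch is only to show the order of the calls.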

Type extension has simplified our GraphQL schema, the client-server communication flow, and both the client-side and server-side code, while keeping the gateway 100% schema-agnostic. A great win for GraphQL microservices!

Moving to federation

Apollo provides an npm package called Apollo Gateway, an add-on to Apollo Server, which handles the federation schema composition and query plan generation. The simplest usage is to supply Apollo Gateway with a list of services and it will query each one for the schema over HTTP:

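const { ApolloServer } = require('apollo-server');
const { ApolloGateway } = require('@apollo/gateway');

// The gateway fetches each service's schema over HTTP when it starts.
const gateway = new ApolloGateway({
  serviceList: [
    { name: 'accounts', url: 'http://localhost:4001' },
    { name: 'products', url: 'http://localhost:4002' },
    { name: 'reviews', url: 'http://localhost:4003' },
  ],
});
const server = new ApolloServer({ gateway });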

Apollo also provides a feature called ‘Managed Federation’, which uses Apollo Studio as a registry for the micrograph schemas. The micrographs ‘push’ their schema to the registry at deployment time using Apollo’s Node.js CLI, and the Apollo Gateway fetches it from Apollo’s Google Storage at runtime using a supplied API key. The Apollo platform supports different variants of the schema (e.g. staging and production), created by including a custom variant name in the push request along with a URL that specifies where the federated service can receive requests at runtime. Each combination of federated schema, federated service name and URL is known as a ‘service definition’, and these service definitions are stored as JSON files at different locations in Google Storage.

Because we are building our server-side processes on AWS Lambda, we decided that we did not want the gateway to fetch the schema from anywhere at runtime, because this would add latency to requests and introduce resiliency risks. We did have runtime schema introspection in the stitching gateway, but Apollo Gateway encapsulates more of the runtime schema management, which leaves us with less control over it, and because federated schemas can depend on each other, a failure to fetch one schema can invalidate another. Instead, we designed a solution that polls the service definition files in Apollo’s Google Storage at a regular interval to fetch the federated service definitions for each environment. When the schema has changed, we trigger our build and deployment pipeline, which rebuilds the GraphQL gateway Lambda deployment package with the new service definitions bundled inside and runs a Terraform deployment to apply the change to the Lambda function. On startup, the federation gateway Lambda reads the schema definitions from a local JSON file, uses the core ‘graphql’ package to parse them into GraphQL objects and supplies them to the Apollo Gateway instance using the localServiceList configuration:

const gateway = new ApolloGateway({
  localServiceList: serviceDefinitions.map((sd) => {
    return {
      name: sd.name,
      url: sd.url,
      typeDefs: parse(sd.sdl),
    };
  }),
});
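The ‘has the schema changed?’ check in the polling step could look something like the sketch below. It is an assumption about one reasonable way to do it, not a description of our actual pipeline: the helper names fingerprint and schemaHasChanged are hypothetical, and the only things taken from the snippet above are the name, url and sdl fields of a service definition.

```javascript
const crypto = require('crypto');

// Hypothetical helper: a stable fingerprint for a set of service definitions.
// Sorting by name means the order of the files does not affect the result.
function fingerprint(serviceDefinitions) {
  const normalised = serviceDefinitions
    .map(({ name, url, sdl }) => ({ name, url, sdl }))
    .sort((a, b) => a.name.localeCompare(b.name));
  return crypto
    .createHash('sha256')
    .update(JSON.stringify(normalised))
    .digest('hex');
}

// Hypothetical check run on each poll: rebuild only when something differs.
function schemaHasChanged(deployed, latest) {
  return fingerprint(deployed) !== fingerprint(latest);
}

// Made-up definitions for illustration.
const deployed = [
  { name: 'products', url: 'http://localhost:4002', sdl: 'type Product { id: ID }' },
  { name: 'reviews', url: 'http://localhost:4003', sdl: 'type Review { id: ID }' },
];
const reordered = [deployed[1], deployed[0]];
const updated = [
  deployed[0],
  { ...deployed[1], sdl: 'type Review { id: ID score: Float }' },
];

console.log(schemaHasChanged(deployed, reordered));
console.log(schemaHasChanged(deployed, updated));
```

Comparing fingerprints rather than whole files keeps the poller cheap and avoids triggering a deployment when nothing meaningful has changed.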

Custom transport

Apollo Gateway does not allow us to use the same stack of ‘link’ components for communicating with the micrographs that we used in our schema stitching gateway. For adding custom behaviour to the outgoing requests we were able to build a custom RemoteGraphQLDataSource implementation, which has useful request lifecycle hooks.

The willSendRequest hook lets you modify your gateway’s requests to the implementing service before they’re sent.
The didReceiveResponse hook lets you modify the implementing service’s responses before the gateway passes them along to the requesting client.

We used the willSendRequest hook to customise the outgoing HTTP requests with additional headers and an HTTP request timeout value. The didReceiveResponse hook allowed us to capture the micrograph response cache control headers like we did using a custom link in the schema stitching gateway.
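A minimal sketch of such a data source follows. To keep it runnable on its own it declares a stand-in base class; in a real gateway the class would extend the RemoteGraphQLDataSource exported by @apollo/gateway. The x-correlation-id header and the context fields are hypothetical, chosen only for illustration, and the request timeout is left out.

```javascript
// Stand-in base class so the sketch runs without dependencies. In a real gateway:
//   const { RemoteGraphQLDataSource } = require('@apollo/gateway');
class RemoteGraphQLDataSource {
  constructor(config) {
    Object.assign(this, config);
  }
}

class CustomRemoteGraphQLDataSource extends RemoteGraphQLDataSource {
  // Called before each request is sent to a micrograph.
  willSendRequest({ request, context }) {
    // Hypothetical header: forward a correlation id held on the context.
    if (context.correlationId) {
      request.http.headers.set('x-correlation-id', context.correlationId);
    }
  }

  // Called with each micrograph response; the response must be returned.
  didReceiveResponse({ response, context }) {
    const cacheControl = response.http.headers.get('cache-control');
    if (cacheControl) {
      // Hypothetical: collect the headers so the gateway can work out an
      // overall cache policy for the client response later.
      context.cacheControlHeaders = context.cacheControlHeaders || [];
      context.cacheControlHeaders.push(cacheControl);
    }
    return response;
  }
}

// Exercise the hooks with minimal fakes; Map offers the get/set calls used above.
const dataSource = new CustomRemoteGraphQLDataSource({ url: 'http://localhost:4002' });
const context = { correlationId: 'abc-123' };
const request = { http: { headers: new Map() } };
const response = {
  http: { headers: new Map([['cache-control', 'max-age=60']]) },
};

dataSource.willSendRequest({ request, context });
const returned = dataSource.didReceiveResponse({ response, request, context });
console.log(request.http.headers.get('x-correlation-id'), context.cacheControlHeaders);
```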

Apollo Gateway’s constructor accepts a buildService function, which we use to supply our custom data source for each service when the gateway first loads.

const gateway = new ApolloGateway({
  …
  // Called once per service; returns the data source used to reach it.
  buildService: (service) =>
    new CustomRemoteGraphQLDataSource(service),
});

The release

After some functional testing to ensure that we had not broken the GraphQL API, and some performance comparisons with the stitching gateway, we released the new federation gateway to our US region by replacing the stitched gateway Lambda function with the new federation one. Our monitoring lets us observe the impact of the new gateway on a variety of metrics. After observing over the weekend, we realised that a caching layer between the GraphQL gateway and a micrograph was no longer working correctly: its cache hit rate had dropped to zero.

We rolled back to the stitched gateway to avoid being overwhelmed by a sudden large increase in requests to the micrograph.

A subsequent investigation uncovered a difference in how Apollo Gateway uses HTTP for Automatic Persisted Queries (APQ). Our schema stitching gateway had used apollo-link-persisted-queries to make APQ requests from the server side, and it supported using the HTTP GET method. In Apollo Gateway the APQ client functionality was now built in, and the GET verb was no longer supported. We created an issue and pull request on Apollo’s GitHub repository (#183 Persisted queries no longer compatible with CDN) for GET support in APQ to be added, and in the meantime made a quick fix in our own solution.
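To show why the verb matters to a cache, here is a sketch of how an APQ request can be expressed as a GET URL. The persistedQuery extension with a version and a sha256Hash is the standard APQ format; the function name buildPersistedQueryUrl and the endpoint are hypothetical, and this is not the fix we shipped.

```javascript
const crypto = require('crypto');

// Hypothetical helper: encode an APQ request as a GET URL. The query text is
// replaced by its SHA-256 hash, so the URL stays short and is identical for
// identical query and variables, which lets an HTTP cache or CDN key on it.
function buildPersistedQueryUrl(endpoint, query, variables = {}) {
  const sha256Hash = crypto.createHash('sha256').update(query).digest('hex');
  const params = new URLSearchParams({
    variables: JSON.stringify(variables),
    extensions: JSON.stringify({
      persistedQuery: { version: 1, sha256Hash },
    }),
  });
  return `${endpoint}?${params.toString()}`;
}

const query = '{ products { id title } }';
const url = buildPersistedQueryUrl('http://localhost:4002/graphql', query);
console.log(url);
```

When the same operations are sent as POST requests instead, a cache keyed on the URL typically never sees a cacheable request, which is consistent with the hit rate dropping to zero.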

A second release of the federation gateway saw the caching issue resolved and the new federation gateway complete, ready for our engineers to make use of those great Apollo Federation features.

Written by Glen Thomas