Federated GraphQL @ Walmart

Published in Walmart Global Tech Blog, Dec 31, 2020

This post describes our motivation at Walmart Customer Experience for adopting Federated GraphQL, along with some of the use cases and patterns we saw while migrating REST-based orchestrators to Federated GraphQL. It assumes you are familiar with GraphQL semantics in general but may be new to the Federation specification.

Introduction

Federated GraphQL is a mechanism that lets distributed GraphQL services resolve nodes of a single graph. A central gateway analyses the incoming GQL query and determines which GQL services can satisfy it. Clients send the query to the gateway, which follows a scatter-gather pattern to orchestrate across the service providers registered with it. At Walmart we went with the Apollo framework for both the Federated GraphQL gateway and the federated GraphQL services. The services are built using apollo-server, apollo-gateway and fastify.

Motivation

Domain Model Consistency

Our customer journey consists of pages that are powered by BFF (Backend for Frontend) orchestrators. The journey is divided into multiple pages, with each page having its own orchestrator to serve its client's needs. These are REST-based orchestrators written in either Java or NodeJS. Most orchestrators came about either because of org structures or because client teams needed custom view models. For instance, the product information on a POV Carousel is provided by the Personalisation Service talking to the Product Service, while the search results on the search page are powered by another orchestrator that also talks to the Product Service. Common shared data models were not enforced, which resulted in a bespoke implementation of the product tile on each page on the client.

Developer Ergonomics

We wanted developers to have a consistent model to work with that is always up to date and easily available. In GraphQL, the schema with its documented fields and types is the de facto API contract. Additionally, the GraphQL Playground allows developers to try out queries before integration. Today, developing a horizontal concern such as adding an item to a wishlist means touching every service that orchestrates with the Product Service so that each also adds the favourited item to its model. With Federated GraphQL, the List team can extend the product schema and add the isFavourite field to it, reducing the coordination required among teams.

Performance

Performance is a first-class concern for the front-end applications and the orchestrators that power them. With the federation model, the gateway is in the same WAN as the services, so the network overhead of the gateway fanning out calls to services is between 20 and 40 ms. The gateway does not do any computationally heavy work and is mostly I/O bound. Additionally, the gateway optimises its own network paths by minimising the number of calls to each service: if N fields of the graph are satisfied by, say, the Product Service, only one call is made to the Product Service. This works well in most cases, barring a special use case that is discussed later in this post.
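The call-minimising behaviour described above can be pictured with a small sketch. This is not the Apollo gateway's actual planner, only an illustration of the idea under stated assumptions: the `fieldOwner` map, the service names and the `planFetches` helper are all hypothetical.

```typescript
// Hypothetical ownership map: which service resolves which field of Product.
const fieldOwner: Record<string, string> = {
  productId: "product-service",
  description: "product-service",
  url: "product-service",
  reviews: "reviews-service",
  canAddToCart: "cart-service",
};

// Group the requested fields by owning service, so that each service
// receives a single call carrying every field it is responsible for.
function planFetches(requestedFields: string[]): Map<string, string[]> {
  const plan = new Map<string, string[]>();
  for (const field of requestedFields) {
    const service = fieldOwner[field];
    if (service === undefined) {
      throw new Error(`No service owns field "${field}"`);
    }
    const fields = plan.get(service) ?? [];
    fields.push(field);
    plan.set(service, fields);
  }
  return plan;
}

// Four requested fields collapse into two service calls.
const plan = planFetches(["productId", "description", "url", "reviews"]);
```

Here three of the four fields belong to the hypothetical product-service, so the plan contains one entry for it and one for reviews-service rather than four separate calls.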

Measuring Success against Goals

Given that we were moving to a completely new platform, it was important to ensure that the decisions we made could be measured with data and improved going forward. Some of the measures of success we put in place were the following:

Client payload size reduction: we found around 60% improvement in payload sizes for the same functionality
Number of overlapping calls to the same domain service: 100% improvement, as all calls went via a single service or the gateway
Reusability of domain entities from FE & BE: 100% of schemas were extended for additional functionality. Additionally, GQL ensures that there is only one resolver for a type.

Learnings with Federated Graphs

While GraphQL solves a lot of our existing pain points, Federated GQL is not a silver bullet for sharing a distributed graph. Some use cases did not quite fit the GraphQL pattern and still needed additional backend orchestration. We started with GQL at the edge and moved down towards the domain services, bringing them under federation only where it made sense to do so. Here are some of the use cases that did not quite fit the bill.

Shared Entities

Shared entities are models that cut across multiple domains. An example is View Modules, which are shared across the Home Page, Search Page, Item Page and so on. A Module is a building block of the view and is semantically equivalent to a div or a ViewModel. Views are built from Modules that are configured in our Content Management System. A Layout drives which Modules go to which portions of the viewport. In the case of Layouts and Modules, we had a number of domain-driven backend systems that wanted to drive the final Layout and Modules on a client page. For instance, the Modules on the Search Page are driven by ML models and query context, while the Home Page view is driven by the Personalisation backend, which decides which Modules are relevant based on the customer context. Both of these backends want the final say on which Layout and Modules are sent to the requesting client. The client requests for the Home Page and Search Page look something like this:

# Types in ViewModule Service
interface ModuleConfig {
  configs: JSON
}
type ViewModule {
  name: String!
  type: String!
  version: Int!
  configs: ModuleConfig!
}
type P13NModuleConfig implements ModuleConfig {
  configs: JSON
  recommendations: [ProductInfo]
  metaData: MetaData
}
type BannerModuleConfig implements ModuleConfig {
  configs: JSON
  bannerImgUrl: String
  bannerText: String
}
type ContentLayout {
  layout: JSON
  modules: [ViewModule]
}
extend type Query {
  contentLayout: ContentLayout
}

# Query
query HomeLayout {
  contentLayout {
    layout
    modules {
      name
      configs {
        ... on BannerModuleConfig {
          bannerImgUrl
        }
        ... on P13NModuleConfig {
          recommendations {
            productId
            description
            url
          }
        }
      }
    }
  }
}

# Search ViewModel Types
type GuidedNavConfig implements ModuleConfig {
  configs: JSON
  guidedNavigation: [NavigationItems] # comes from Search Response
}

# SearchView Query
query SearchView {
  searchView {
    layout
    modules {
      name
      configs {
        ... on GuidedNavConfig {
          guidedNavigation {
            navigationItems {
              itemName
              link
            }
          }
        }
      }
    }
  }
}

A straightforward GQL schema would first get the modules and then enrich them from the federated services. This approach does not work for search, however, because the modules and the search results are closely tied together. Querying for the search results is the long pole of the orchestration, and we did not want to add latency by first fetching the modules and only then enriching them with search results.

Our initial version of federation resulted in a separation based on domains, so the ViewModels for Search were driven from the Search Service (whose primary responsibility is to serve search results for a given query). In Search, clients need both the modules on the page and the results of the search query, and a number of modules on the Search Page depend on the search results (such as facets, filters or guided navigation). We needed to orchestrate the search view in the CMS and the search results in parallel and then stitch the view together from the results. This requirement created the following issues:

Backend entry to the ViewModel service using schema stitching techniques such as delegateToSchema
Different queries from client for a Search View vs any other view to facilitate the request going to search

To overcome these issues and make the code more maintainable, we made the ViewModel Service a mini orchestrator that talks to the Personalisation Service, the Content Management System and the Search backend. The next section explains why this pattern (different domains controlling the aggregate result) is hard to model in GQL.
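As a rough illustration of the mini orchestrator idea, the sketch below starts the CMS and search calls together and stitches search data into the modules that depend on it. It is not our production code: `fetchModulesFromCms`, `fetchSearchResults`, `resolveSearchView` and the stubbed data are all hypothetical stand-ins for real backend clients.

```typescript
// All names below are hypothetical stand-ins for real backend clients.
interface ViewModule {
  name: string;
  configs: Record<string, unknown>;
}
interface SearchResult {
  items: string[];
  guidedNavigation: string[];
}

// Stubbed CMS call: returns the modules configured for the page.
async function fetchModulesFromCms(pageId: string): Promise<ViewModule[]> {
  return [
    { name: "GuidedNav", configs: {} },
    { name: "Banner", configs: { pageId } },
  ];
}

// Stubbed search call: in practice this is the slow part of the orchestration.
async function fetchSearchResults(query: string): Promise<SearchResult> {
  return { items: [`${query}-1`, `${query}-2`], guidedNavigation: ["brand", "price"] };
}

// Start both calls together, then stitch search data into the dependent module.
async function resolveSearchView(pageId: string, query: string): Promise<ViewModule[]> {
  const [modules, results] = await Promise.all([
    fetchModulesFromCms(pageId),
    fetchSearchResults(query),
  ]);
  return modules.map((m) =>
    m.name === "GuidedNav"
      ? { ...m, configs: { ...m.configs, guidedNavigation: results.guidedNavigation } }
      : m,
  );
}
```

Because both promises are created before either is awaited, the total wait is roughly that of the slower call rather than the sum of the two.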

Federation Directives `@extends` & `@requires`

@extends

Before we dive in, let us touch upon what this directive does. @extends is a GraphQL directive specified at the type level that indicates that the entity is a remote entity being referenced in the current service. A remote entity, in equivalent REST terms, is an entity that you can fetch from a service. In that sense, the entity also needs to specify the contract for HOW to fetch it. This is where we specify the key fields on the remote entity that define the contract with the remote service for fetching that entity. A Module entity extended from the ViewModule service can look like the following. To fetch the Module, one needs to provide the moduleId, which is marked @external.

extend type ViewModule @key(fields: "moduleId") {
  moduleId: String! @external
}

There are two reasons to specify a dependency on this module:

When you need to decorate the module to add functionality to it
When you use a remote entity as a composed field in a type

Decorating an extended entity

extend type Product @key(fields: "productId") {
  productId: String @external # populated by the Product Service
  canAddToCart: Boolean # decorated by CartService on Product entity
}

Extended entity as input for another field

# The service extending this entity provides the productId;
# the Product Service then resolves the Product through its __resolveReference function.
extend type Product @key(fields: "productId") {
  productId: String @external
}
type Cart {
  items: [Product]
}
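To make the hand-off between the two services concrete, here is a minimal sketch of the resolver maps involved, written as plain objects without any Apollo packages. `__resolveReference` is the Apollo Federation convention for the owning service's reference resolver; everything else (the `catalogue` data, the `productIds` shape of the cart and the resolver bodies) is assumed for illustration.

```typescript
// A reference to a remote entity: the type name plus its @key field.
interface ProductRef {
  __typename: "Product";
  productId: string;
}
interface Product {
  productId: string;
  url: string;
}

// Hypothetical in-memory catalogue standing in for the Product Service's data source.
const catalogue = new Map<string, Product>([
  ["p1", { productId: "p1", url: "/ip/p1" }],
]);

// Cart Service side: Cart.items returns only references (the key, nothing else).
const cartResolvers = {
  Cart: {
    items: (cart: { productIds: string[] }): ProductRef[] =>
      cart.productIds.map((productId): ProductRef => ({ __typename: "Product", productId })),
  },
};

// Product Service side: turns a reference back into a full Product.
const productResolvers = {
  Product: {
    __resolveReference: (ref: ProductRef): Product | undefined =>
      catalogue.get(ref.productId),
  },
};

// What the gateway does conceptually: collect references, then ask the owner to resolve them.
const refs = cartResolvers.Cart.items({ productIds: ["p1"] });
const resolved = refs.map((ref) => productResolvers.Product.__resolveReference(ref));
```

The Cart Service never needs to know how a Product is stored; it only commits to supplying the key declared in @key.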

@requires

@requires is used to specify fields that are needed internally for business logic. The required field might not be queried by the client, but the current service needs it to resolve the field on which the directive is defined. For example, if we need to tag specific modules as top modules, topModules is a field you can add to the View, with a @requires directive that fetches all the modules before filtering them. Clients will only ask for topModules and not modules. Note that modules is not part of the remote entity's key, but it is one of its fields.

extend type View @key(fields: "pageId") {
  pageId: String @external
  modules: [Module] @external # needed by the field below
  topModules: [Module] @requires(fields: "modules { moduleId description config { configJSON } }")
}

If Module is a complex field composed of other types, the @requires directive needs to destructure the complex object to specify the exact fields needed. This can get out of hand very quickly given the amount of boilerplate extensions that have to be written.
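On the resolver side, the effect of @requires is that the extending service receives the required fields on the parent object and can use them in its own logic. The sketch below shows a possible topModules resolver as a plain function. The rule for what counts as a top module (`placement === "top"` inside configJSON) is purely an assumption for illustration, as are the interface and variable names.

```typescript
interface Module {
  moduleId: string;
  description: string;
  config: { configJSON: Record<string, unknown> };
}

// Shape of the parent object handed to this service:
// the @key field plus the fields listed in @requires.
interface ViewRepresentation {
  pageId: string;
  modules: Module[];
}

const viewResolvers = {
  View: {
    // Hypothetical rule: a module is "top" when its config carries placement === "top".
    topModules: (view: ViewRepresentation): Module[] =>
      view.modules.filter((m) => m.config.configJSON["placement"] === "top"),
  },
};

// Example parent object, as it might arrive with the required fields filled in.
const homeView: ViewRepresentation = {
  pageId: "home",
  modules: [
    { moduleId: "m1", description: "Hero banner", config: { configJSON: { placement: "top" } } },
    { moduleId: "m2", description: "Footer links", config: { configJSON: { placement: "bottom" } } },
  ],
};
const top = viewResolvers.View.topModules(homeView);
```

Every field the resolver touches (moduleId, description, config.configJSON) has to be spelled out in the @requires selection, which is where the boilerplate comes from.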

Unions & Interfaces

Unions cannot be extended from a remote service. They can however be packaged and shared across different services that will need the same signature.

Query Plan execution

The Gateway analyses the queries and then prepares the orchestration path. To do this, it analyses the dependencies between the services as specified by @extends and @requires. All fetch calls to the dependencies are then “hoisted” so that they happen before the dependent service is called. This is the most obvious and generally the best way to ensure that services depending on other parts of the graph have those dependencies resolved before they are called. Where it gets tricky is when, for performance reasons (say, a service has a high response-time SLA), you want to start the call to your service BEFORE its dependencies are needed, so that they execute in parallel.

To illustrate, a Product has a fulfilment attribute that comes from the Fulfilment Service and takes 400ms to return. We also have a reviews attribute that extends the Product schema and is populated by the Reviews Service, which needs the productId from the Product Service. Because the parent resolver and the fulfilment resolver are both in the Product Service, and the query plan optimises for service hops, the Gateway waits for all of the attributes from the Product Service before making the call to the Reviews Service. So although reviews does not need fulfilment info, it has to wait for the long tail of Product resolution before the call to reviews can happen.

# Review Service
extend type Product @key(fields: "productId") {
  productId: String @external
  reviews: [Review]
}
# Product Service
type Product @key(fields: "productId") {
  productId: String
  url: String
  fulfilmentInfo: FulfilmentInfo
}

The above indicates that having too many dependencies in a schema increases the boilerplate one has to write in federated services, and that the dependencies affect the query execution plan. Hence the hard problem of modelling a performance-optimised view model for the Search View.
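The latency cost of the Product and Reviews example can be worked through with simple arithmetic. Only the 400ms fulfilment figure comes from the example above; the 50ms for the core product fields and the 150ms for reviews are assumed numbers chosen for illustration.

```typescript
const productCoreMs = 50; // assumed: time to resolve productId and url
const fulfilmentMs = 400; // from the example above
const reviewsMs = 150; // assumed: time for the Reviews Service call

// Plan chosen by the gateway: a single Product Service fetch, which is only as
// fast as its slowest field, followed by the Reviews Service fetch.
const productFetchMs = Math.max(productCoreMs, fulfilmentMs);
const hoistedPlanMs = productFetchMs + reviewsMs;

// Hypothetical ideal: reviews starts as soon as productId is known and
// runs alongside the fulfilment lookup.
const overlappedPlanMs = Math.max(fulfilmentMs, productCoreMs + reviewsMs);

const savedMs = hoistedPlanMs - overlappedPlanMs;
```

With these numbers the hoisted plan takes 550ms and the overlapped one 400ms, so the reviews call adds its full 150ms to the response even though it never uses fulfilment data.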

Patterns of Service Integration

To quickly identify the GQL pattern being used by a given service, we found it useful to give each of these patterns a name. They are listed here.

Atomic Entity

The simplest integration, where a service has no dependencies on other parts of the graph.

Mini Orchestrator

Here the GQL service orchestrates directly with other services because it has complex upstream dependencies for resolving its schema.

Aggregated Entity

Aggregated entities reference remote entities and aggregate them into their own types, typically because they need to pre-process data before resolving to the remote entities.

Extended Entity

An extended entity decorates a remote type by adding fields to it.

Conclusion

Federated GraphQL works well for separation of concerns and team autonomy. However, some use cases have to be modelled with the above considerations in mind, such as shared entities and dependencies between nodes of the graph, which can lead to inefficiencies in both performance and schema maintenance.

Thanks to Naga Malepati for contributing and reviewing this article.