GraphQL at Redbox

Part II — Connecting/Migrating Backend Services

Marko Lazić
Redbox Tech Blog
12 min read · Jun 14, 2021

A machine with many wires crossed and connected; similar to a telephone operator
Photo by John Barkiple on Unsplash

Redbox, like many other companies, is on a digital transformation journey: modernizing legacy systems and moving them out of its data center and into the cloud. In addition to application modernization and migration, Redbox is continuing to build out and enhance its new streaming platform, Redbox On Demand.

Tackling both legacy migrations and new additions/changes at the same time is difficult and sometimes results in a perfect storm of chaos. In addition, we had to adapt and ensure our platform was stable and able to accommodate a large increase in traffic due to the pandemic. Aside from all the technical challenges we faced, we (along with the rest of the world) had to fundamentally change how we communicate, work together, and support one another during the global pandemic of 2020. Talk about a stressful migration!

At Redbox, we aim to avoid “lifting-and-shifting” to the cloud; this means taking the time to modernize legacy applications to be “cloud native” wherever it is possible and makes sense. That being said, as other companies undergoing a digital transformation know (or will painfully learn), you can’t simply forklift all code in a single go. Every team and system has its own set of problems and restrictions, resulting in many parallel tracks and conflicting timelines. To make everything come together, we needed to find a solution that enabled us to integrate with both our existing backend services and our new APIs.

Enter GraphQL

In a previous article, we talked about why we chose GraphQL. It was seen as an excellent long-term approach for our platform; one that would give our teams far more flexible ways to ask for data as well as give us a more unified/standard approach to security, metadata, and performance to enable different teams to produce different APIs.

As we began implementing GraphQL, we realized more benefits as well! One of the largest benefits of moving to GraphQL was that it enabled a migration path for both APIs and backend services. This enabled us to slowly migrate and unify our APIs irrespective of where they resided, in some cases before teams could modernize their code bases.

Our GraphQL layer was initially built to replace parts of our On Demand backend APIs, exposing several domains such as product search and detail pages, an overview of owned products, playback streams, and watching progress. As a second phase, we were able to expose additional domains like profiles, the loyalty system, and even some aspects of our kiosks!

A GraphQL schema can hold both public and private information. This enabled parts of our schema to be publicly offered for querying information like movie metadata or pricing, while other more sensitive parts are restricted to specific profiles/users.

Adopting GraphQL enabled us to query data easily no matter where it was hosted! We are able to integrate with data served from new Kubernetes clusters in GCP using GKE, servers behind an ELB in AWS, and even legacy applications found in the data center! GraphQL resolvers can also use data storage solutions like Google Cloud Firestore directly and can even use event infrastructure like Google Cloud Pub/Sub.

Backend

In our backend, we leverage ChilliCream’s Hot Chocolate GraphQL server for the .NET Core platform: https://chillicream.com/docs/hotchocolate

Not only has it proven to be stable and extensible, but it is also well maintained, with an excellent community on their Slack! There are other GraphQL servers for .NET Core (as well as for Java, Python, Go), but we recommend you check this team out. Many aspects of this article are language agnostic and can be applied independently of your preferred development language, but our examples will be in .NET Core.

Implementation with Hot Chocolate

Generally, implementation with Hot Chocolate is based on defining the main aspects: schema, models, and resolvers.

The schema is the definition of the outward-facing contract intended for client use, written in the GraphQL schema definition language.

GraphQL Types (not to be mistaken for regular model classes) are an alternative definition of the same thing. While in some server technologies it is possible to work without writing graph types (and only have them as an internal runtime representation of the schema), we feel that this has its limitations. In Hot Chocolate, the full power of schema definition is available when the schema is defined through graph types directly, in code.

Code Elements

GraphQL Types are used to describe the schema in code.

They are special classes inheriting GraphQL base classes and are defined per logical object in the schema such as product, reel, or account. A GraphQL type defines fields as they would be exposed to the clients, with defined names, types, and ways of resolving.

Additional restrictions or validations can also be stated. A graph type is declared as a wrapper around a model.

In the example below, ReelType is a type wrapping a model Reel. At the top, it defines basic metadata like name and description. After that comes a sequence of field definitions, each containing the name, the type, and the resolver. The type can be either a simple type or an object. Resolvers are specified in the .Field() call. The name of a GraphQL field is not necessarily the same as the name of the model property behind it.

Example of the Graph Type for a Reel

The model for this graph type looks like any other API model, with no specific additions, decorations, or attributes needed.

Model for the Reel type

This is the model for the object returned when a reel is queried: a response model. Models are also used for requests and input arguments. In the previous graph type, the items field is decorated with the expected paging data.

Resolvers can be specified in a simple, inline fashion, just pointing to a field of the model (p => p.Id). Alternatively, they can be specified in a more complex way, delegating the job to a resolver method and class (p => p.GetReelItems(…)).
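The difference between inline and delegated resolvers can be illustrated with a small, language-agnostic sketch. It is written in Python and is not Hot Chocolate code; `ReelResolvers`, `build_reel_fields`, `resolve`, and the sample ids are all hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Reel:
    # Plain model: no GraphQL-specific decorations or attributes.
    id: str
    title: str


class ReelResolvers:
    """Hypothetical resolver class; a real one would get clients or repositories injected."""

    def __init__(self, items_by_reel: Dict[str, List[str]]):
        self._items_by_reel = items_by_reel

    def get_reel_items(self, parent: Reel) -> List[str]:
        # The parent model is passed in so the lookup can be keyed on its fields.
        return self._items_by_reel.get(parent.id, [])


def build_reel_fields(resolvers: ReelResolvers) -> Dict[str, Callable[[Reel], Any]]:
    # Maps the field name exposed to clients to the resolver that produces its value.
    return {
        "id": lambda p: p.id,               # inline resolver pointing at a model attribute
        "name": lambda p: p.title,          # exposed name differs from the model attribute
        "items": resolvers.get_reel_items,  # delegated to a resolver method on a class
    }


def resolve(parent: Reel, fields: Dict[str, Callable[[Reel], Any]], selection: List[str]) -> Dict[str, Any]:
    # Only the fields the client selected are resolved.
    return {name: fields[name](parent) for name in selection}
```

Because resolvers run only for selected fields, a client that does not ask for `items` never triggers the more expensive delegated lookup.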

If a resolver is implemented as a separate method in a separate class, one can use dependency injection to supply a parent model, repositories, HTTP clients, configuration, or sub-resolvers.

A resolver method, in the case of reel items, can accept a parent model object that can be used for linking or logic based on some fields. It needs to return the proper type: the model for the field, either as a single object or as an IEnumerable. The body of this method is where the code for calling external services, a database, or similar would go.

Resolver Implementation for the Reel

Loaders are a way to optimize the resolution of a large graph of child objects.

Since GraphQL is very flexible in the way the consumer may request data, it introduces an “n+1 problem” into the GraphQL server, which impacts performance. This is a known problem with REST services, where a client needs to call an API to get a parent and then call an API n times to get the children, one by one.

Loaders can solve these problems by reducing the total number of calls. If the results of a query have a structure similar to the one in the next picture, the execution could benefit from using a loader for the child objects:

An example of a complex graph of objects that loaders can get efficiently

A loader enables the execution to get all the objects at the same level of the result hierarchy in one go. This could be a batch call to a dependent REST service, a multi-row result from a database, or a batch call to Redis. It is a Hot Chocolate construct: given a special delegate, it makes sure that the delegate is called only once for multiple parents, after the ids of all the children have been gathered.

Basically, the batch DataLoader collects requests for entities per hierarchy (processing) level and sends them as a batch request to the data source. Moreover, the DataLoader caches the retrieved entries within the scope of the request. Part of the code that utilizes this is shown below.

Example of the loader implementation for reel items

Loaders enable us to issue a batch call to the dependency we’re getting children from. This means that the need for batch access propagates to the dependency. If it’s a REST service, it would need to have a separate batch-enabled API for this — one that accepts a list of ids and returns a list of objects. This type of contract is usually a violation of the REST conventions, as it does not fit with the central REST idea of a resource. If it is implemented as a GET call, it would need a way to accept a list of ids in the URL, as a combined key in the route or in query parameters. If the dependency is a database, it would need to be able to return a batch of child elements efficiently in a single call.
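The batching and request-scoped caching described above can be sketched in a few lines. This is a simplified illustration in Python using asyncio, not Hot Chocolate's DataLoader; `BatchLoader` and its batch function contract (a list of keys in, a dictionary keyed by id out) are assumptions made for the example.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Hashable, List

BatchFn = Callable[[List[Hashable]], Awaitable[Dict[Hashable, Any]]]


class BatchLoader:
    """Collects keys requested in the same event-loop tick and fetches them in one call."""

    def __init__(self, batch_fn: BatchFn):
        self._batch_fn = batch_fn
        self._cache: Dict[Hashable, asyncio.Future] = {}  # lives as long as the loader (one request)
        self._pending: List[Hashable] = []
        self._task = None

    def load(self, key: Hashable) -> asyncio.Future:
        if key in self._cache:
            return self._cache[key]  # already requested in this scope
        loop = asyncio.get_running_loop()
        future = loop.create_future()
        self._cache[key] = future
        if not self._pending:
            # Defer the dispatch so sibling resolvers can register their keys first.
            loop.call_soon(self._schedule)
        self._pending.append(key)
        return future

    def _schedule(self) -> None:
        self._task = asyncio.ensure_future(self._dispatch())

    async def _dispatch(self) -> None:
        keys, self._pending = self._pending, []
        try:
            results = await self._batch_fn(keys)  # one call for the whole level
        except Exception as exc:
            for key in keys:
                self._cache.pop(key).set_exception(exc)
            return
        for key in keys:
            self._cache[key].set_result(results.get(key))
```

A loader instance would be created per request, so the cache never outlives the request and different users never share cached entries.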

In our GraphQL backend, we use this for most types of queries, including products, pricing, reels, stores, etc.

When implementing this, one has to be aware of some possible pitfalls. The dependency has to be able to return a batch of objects in a reasonable time; otherwise, these calls are strong candidates for timeouts. This means that the internal implementation needs to be such that the batch call actually saves time, instead of just multiplying the duration of a single call.

Another possible problem is the response size for large batches, since the payload transfer can also impact response time. This needs to be watched and can probably be solved by splitting the batches on the GraphQL loader side.

Also, when accepting a batch request, a GET API implementation could run into URL length limitations.
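One way to guard against both oversized responses and URL length limits is to split a batch before it is sent. The sketch below is a hypothetical Python helper, not part of any library; the two limits are parameters you would tune to the dependency, and the size estimate assumes ids are sent as a comma-separated list.

```python
from typing import Iterator, List, Sequence


def split_batch(ids: Sequence[str], max_items: int, max_query_chars: int) -> Iterator[List[str]]:
    """Yield chunks that respect an item limit and a rough query-string budget."""
    chunk: List[str] = []
    chars = 0
    for item_id in ids:
        cost = len(item_id) + 1  # +1 for the separating comma
        # Close the current chunk when adding this id would exceed either limit.
        if chunk and (len(chunk) >= max_items or chars + cost > max_query_chars):
            yield chunk
            chunk, chars = [], 0
        chunk.append(item_id)
        chars += cost
    if chunk:
        yield chunk
```

Each chunk then becomes its own call to the dependency, and the loader merges the partial results before handing them back to the resolvers.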

Request and Response Structure

A request to a GraphQL server is written in GraphQL-specific syntax, somewhat similar to JSON. In general, the request body contains a query or a mutation. If it references variables, their values need to be supplied in a separate section. In the example below, $id is a reference to a variable and its value is specified separately. It could have been specified inline too, but that is not considered good practice.

On the client side, the query text can be a static resource, with no need to do string replacement or worry about the format of the parameters. In the backend, it enables GraphQL to understand that it is the same query even though different values are sent, and to do some optimizations based on that. It also enables metrics to be emitted properly.
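As a rough illustration of keeping the query text static, the Python sketch below builds the JSON body commonly used for GraphQL over HTTP, with `query` and `variables` as separate members. The `reel` field, its `name` sub-field, and the `GetReel` operation name are hypothetical and stand in for whatever the real schema exposes.

```python
import json

# Static query text: it never changes between calls, only the variables do.
REEL_QUERY = """
query GetReel($id: ID!) {
  reel(id: $id) {
    id
    name
  }
}
"""


def build_request_body(query: str, variables: dict) -> str:
    # Variables travel next to the query instead of being spliced into its text.
    return json.dumps({"query": query, "variables": variables})
```

Two requests for different reels then differ only in the `variables` member, which is what lets the server recognize them as the same query.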

Headers are not part of the GraphQL specification. The authorization header mentioned earlier is to be sent in the standard HTTP way.

The response is returned as standard JSON, always with HTTP status 200 OK.

Some parts of the JSON response structure are GraphQL specific. The two main sections are data and errors, where errors is optional. Data is the root of any response returned, regardless of whether it’s a scalar or an object. It has the same structure as requested in the query.

If there are errors resolving some fields, they will be stated in the errors section. The corresponding fields will be returned as JSON null. It is possible for GraphQL to be able to return only parts of the requested graph, so a GraphQL response can contain both a partial result and errors.

Some of the challenges we faced here were about the improper use of these sections on the client side.

Clients cannot only test whether they got a value in the data section; the errors section must be analyzed too.

Also, authentication and authorization errors, which are normally returned as 401 and 403 under the HTTP standard, are returned as errors in the 200 OK response. This clearly goes against HTTP conventions, but it is a necessary tradeoff of the GraphQL design.
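A client-side check along these lines might look like the following Python sketch. The `path` and `extensions` members of an error come from the GraphQL specification, but the error codes used here are placeholders: the actual codes depend on the server and should be treated as an assumption.

```python
from typing import Any, Dict, FrozenSet

# Placeholder codes; check what your server actually puts in extensions.code.
DEFAULT_AUTH_CODES = frozenset({"AUTH_NOT_AUTHENTICATED", "AUTH_NOT_AUTHORIZED"})


def inspect_response(payload: Dict[str, Any],
                     auth_codes: FrozenSet[str] = DEFAULT_AUTH_CODES) -> Dict[str, Any]:
    data = payload.get("data")
    errors = payload.get("errors") or []
    # Each error may carry the path of the field that failed to resolve.
    failed_paths = [".".join(str(part) for part in err.get("path", [])) for err in errors]
    # A 200 OK can still hide what would have been a 401 or 403 in a REST API.
    needs_auth = any((err.get("extensions") or {}).get("code") in auth_codes for err in errors)
    return {
        "data": data,
        "failed_paths": failed_paths,
        "needs_auth": needs_auth,
        "complete": data is not None and not errors,
    }
```

The point of the `complete` flag is that a non-empty `data` section alone does not prove the whole request succeeded.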

Execution strategy

GraphQL executes a query based on an execution plan, which is constructed by evaluating the query. As a consequence, it always executes a parent’s resolver before calling the resolvers for its children, but children can be executed in parallel or sequentially, depending on resources. Code must be written so that it does not assume any order between siblings. A parent object can be injected into a child object’s resolver, as in our example above, where Reel is inserted into the resolver for Product objects.
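The ordering guarantee can be illustrated with a small Python asyncio sketch: the parent resolver always completes first, while sibling resolvers run concurrently and may finish in any order. The resolver names and the data they return are made up for the example and do not reflect the actual execution engine.

```python
import asyncio
from typing import Any, Dict, List


async def resolve_reel(reel_id: str, log: List[str]) -> Dict[str, Any]:
    log.append("reel")
    return {"id": reel_id, "product_ids": ["p1", "p2"]}


async def resolve_product(parent: Dict[str, Any], product_id: str, log: List[str]) -> Dict[str, Any]:
    await asyncio.sleep(0)  # yield control, as a real I/O call would
    log.append(f"product:{product_id}")
    # The parent object is available to the child resolver.
    return {"id": product_id, "reel_id": parent["id"]}


async def execute(reel_id: str, log: List[str]) -> Dict[str, Any]:
    # The parent is resolved before any child resolver is started.
    reel = await resolve_reel(reel_id, log)
    # Siblings run concurrently; their completion order is not guaranteed.
    products = await asyncio.gather(
        *(resolve_product(reel, pid, log) for pid in reel["product_ids"])
    )
    return {"reel": {"id": reel["id"], "products": list(products)}}
```

Note that `asyncio.gather` returns results in the order the children were requested even if they finish out of order, which is why resolver code should rely on the returned values rather than on side effects happening in sequence.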

Security and Compliance

As previously mentioned, parts of the schema can be made public and other parts private; only accessible with a valid, user-specific access token. For example, generic data such as products, prices, and reels are all sets of data that can be accessed even if the end-user is not logged in. However, data like user transactions, purchased titles, or profile settings only make sense in the context of a user and thus are specific to every user.

Our GraphQL server uses a standard Authorization header with a Bearer token value. It applies authorization constructs to specific fields and is able to determine whether the request can be resolved. It is possible for a client to query public parts of a schema together with private, user-specific data. In this case, the client can get a partial response back, with the public data returned and without the private data, until the user is authenticated and authorized. Until then, the response will contain a set of authorization errors telling the client which fields produced errors.

With various compliance and regulatory requirements, implementing a common GraphQL layer can be a challenge. These are things you need to plan for and consider when creating your own GraphQL implementation!
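Field-level authorization that yields a partial response can be sketched as follows. This is a deliberately simplified Python model, not how Hot Chocolate implements authorization; the `FIELDS` table, the field names, and the error message are assumptions made for the example.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Resolver = Callable[[Optional[str]], Any]

# Field name -> (requires an authenticated user?, resolver)
FIELDS: Dict[str, Tuple[bool, Resolver]] = {
    "products": (False, lambda user: ["p1", "p2"]),         # public data
    "purchases": (True, lambda user: [f"{user}:order-1"]),  # user-specific data
}


def execute(selection: List[str], user: Optional[str]) -> Dict[str, Any]:
    data: Dict[str, Any] = {}
    errors: List[Dict[str, Any]] = []
    for name in selection:
        requires_user, resolver = FIELDS[name]
        if requires_user and user is None:
            # The private field resolves to null and is reported in errors.
            data[name] = None
            errors.append({"message": "Not authorized", "path": [name]})
        else:
            data[name] = resolver(user)
    response: Dict[str, Any] = {"data": data}
    if errors:
        response["errors"] = errors
    return response
```

An anonymous caller asking for both fields still receives the public `products` list, together with an error entry naming `purchases`.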

Architecture, Design, and Deployment

Hot Chocolate’s GraphQL over HTTP implementation is an overlay over a standard ASP.NET Core application. It is added as a middleware into the ASP.NET Core middleware pipeline.

GraphQL is a service that can be seen as “monolithic” in nature, as a direct consequence of its aggregating role. It covers all the domains that GraphQL exposes and, as a result, it can be resource-intensive at times. In that sense, it is natural to have all graph types, models, resolvers, and loaders in one application. That being said, as your platform grows and scales, it can be broken down into separate libraries.

As mentioned, GraphQL can be resource-intensive and, as a result, needs to scale. To scale effectively, it needs to be stateless: any state must be distributed. We leverage Redis for some of our complex caching requirements, and in some cases leverage in-memory caching.

Benefits

With GraphQL in place for both our On Demand and Physical platforms, we have already reaped several of the benefits of GraphQL.

With contracts where clients are able to pick and choose, our mobile, TV, and web clients each execute slightly different requests against GraphQL and have already changed their requests a few times. This makes client development more agile and lets product and marketing run fast, short experiments with the UX. All of that happens without the GraphQL contracts being changed, and always with the minimal response sizes needed.

With GraphQL in place, the backend is evolving, and several internal dependencies have been changed as part of the technological transformation. One example is the advancements around personalization, which enabled us to break the service down into small microservices focusing on things such as Loyalty and Profiles. This is a great example of hiding implementation details: all of these changes were executed in a way that did not impact clients.

Come work with us

Migrating and building a cloud native tech stack is tough! We are always looking for more talented engineers to help us build and grow! If you’re interested in having a lot of autonomy and opportunity to solve complex problems at scale, take a gander at our Engineering job postings!

Marko Lazić
Redbox Tech Blog

I have been curious about how things work all my life. Today, I use the same attitude in software development to make things good and then make them better.