GraphQL Cache Normalization with Cachier

Jonathan Chen
Dec 4, 2022


What is Cache Normalization?

Cache normalization is the process of eliminating duplicate data from the cache. Every piece of data is stored only once, which keeps the cache as space-efficient as possible.

How does Cachier Normalize your Cache?

Cachier’s Normalized Cache is a middleware function for your Express server that intercepts all GraphQL queries and automatically normalizes the returned data before storing it in the cache. This allows partial and superset retrievals from the cache, which can greatly reduce the number of network requests to your GraphQL API.
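
To make the idea of a partial retrieval concrete, here is a minimal sketch. It is not Cachier’s actual code: the helper name readFields and the cache shape are assumptions for illustration. It returns the requested fields only when all of them are already cached, and signals a miss otherwise.

```javascript
// Hypothetical helper, not part of Cachier's API: returns the requested
// fields from a cached entry, or null if any of them is missing.
function readFields(cache, key, fields) {
  const entry = cache[key];
  if (entry === undefined) return null; // nothing cached under this key
  const result = {};
  for (const field of fields) {
    if (!(field in entry)) return null; // miss: the GraphQL API must be queried
    result[field] = entry[field];
  }
  return result;
}

// Illustrative cache contents (assumed shape).
const cache = {
  company: { __typename: 'Info', ceo: 'Elon Musk', coo: 'Gwynne Shotwell', employees: 7000 },
};

// A later query asking for a subset of the cached fields is a hit.
const partial = readFields(cache, 'company', ['ceo', 'employees']);
// A query asking for a field that was never cached is a miss.
const miss = readFields(cache, 'company', ['ceo', 'founded']);
console.log(partial, miss);
```

Because each entry is stored once under a predictable key, a smaller query can be answered from the cache without another network request.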

A Deeper Dive Into Cachier’s Normalizing Algorithm

First, to set the stage, here is an example query made to the SpaceX API:

fetch('/CachierNormalizedCache', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Accept: 'application/json',
  },
  body: JSON.stringify({
    query: `{
      dragons {
        launch_payload_mass {
          kg
          lb
        }
        name
      }
      company {
        ceo
        coo
        cto
        employees
      }
      roadster {
        speed_kph
        speed_mph
      }
    }
    `,
    uniques: { dragons: 'name' },
  }),
})

Notice the uniques option in the body of the request. Every sub-query that returns a list needs a unique identifier for its list items. In this example, dragons is the list and name is the unique identifier for its list items, which are of type Dragon.

When Cachier receives a query, it first checks whether the data is already in the cache. If it isn’t, Cachier uses GraphQL introspection to add the __typename field (if not already queried for) to every subquery. The __typename field is essential for Cachier to generate a unique key in the cache.
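
The __typename step can be sketched as follows. This is a deliberately naive illustration, not Cachier’s implementation: the function name addTypenames is hypothetical, and the sketch assumes a plain query with no arguments, string literals, fragments, or directives, and one in which __typename is not already selected.

```javascript
// Naive sketch: insert __typename at the start of every nested selection set.
// Assumption: the query contains no arguments, strings, fragments, or
// directives, and does not already select __typename anywhere.
function addTypenames(query) {
  let depth = 0;
  let out = '';
  for (const ch of query) {
    out += ch;
    if (ch === '{') {
      depth += 1;
      // Depth 1 is the root selection set; only deeper ones get __typename.
      if (depth > 1) out += ' __typename';
    } else if (ch === '}') {
      depth -= 1;
    }
  }
  return out;
}

const withTypenames = addTypenames('{ company { ceo coo } roadster { speed_kph } }');
console.log(withTypenames);
// prints "{ company { __typename ceo coo } roadster { __typename speed_kph } }"
```

A production version would more likely walk a parsed query document than scan characters, so that arguments and already-present __typename fields are handled correctly.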

Next, Cachier parses the incoming query to create a map that describes how the returned data should be stored. Here is how the Cachier map looks for the SpaceX query above:

{
  "typesArr": [
    "dragons",
    "company",
    "roadster"
  ],
  "fieldsArr": [
    [
      "__typename",
      {
        "launch_payload_mass": [
          "__typename",
          "kg",
          "lb"
        ]
      },
      "name"
    ],
    [
      "__typename",
      "ceo",
      "coo",
      "cto",
      "employees"
    ],
    [
      "__typename",
      "speed_kph",
      "speed_mph"
    ]
  ]
}

The map contains two keys with array values: typesArr, which contains the types of the main subqueries, and fieldsArr, which is an array of arrays. The indexes of typesArr and fieldsArr correspond to one another; for instance, fieldsArr[0] contains the fields for typesArr[0]. If a field of a query type is itself a nested query (Cachier checks nestings recursively to account for arbitrarily nested queries), it appears as an object whose key is the nested query’s type and whose value is an array of its fields (fieldsArr[0][1] is an example of this).
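
A map of this shape could be built along the following lines. This is a sketch under stated assumptions rather than Cachier’s parser: the function names tokenize, parseFields, and buildMap are hypothetical, and the query is assumed to already include __typename and to use no arguments, aliases, fragments, or directives.

```javascript
// Splits a simple query into braces and field names.
function tokenize(query) {
  return query.match(/[{}]|[A-Za-z_][A-Za-z0-9_]*/g) || [];
}

// Parses the fields of one selection set, starting at tokens[pos] (the token
// right after '{'). Returns the fields and the index just past the '}'.
function parseFields(tokens, pos) {
  const fields = [];
  while (pos < tokens.length && tokens[pos] !== '}') {
    const name = tokens[pos];
    if (tokens[pos + 1] === '{') {
      // Nested query: recurse and store it as { name: [its fields] }.
      const nested = parseFields(tokens, pos + 2);
      fields.push({ [name]: nested.fields });
      pos = nested.next;
    } else {
      fields.push(name);
      pos += 1;
    }
  }
  return { fields, next: pos + 1 }; // skip the closing '}'
}

function buildMap(query) {
  const tokens = tokenize(query);
  const root = parseFields(tokens, 1); // tokens[0] is the outermost '{'
  const typesArr = [];
  const fieldsArr = [];
  for (const field of root.fields) {
    // Sketch only: top-level fields without a subquery are skipped.
    if (typeof field === 'string') continue;
    const [type] = Object.keys(field);
    typesArr.push(type);
    fieldsArr.push(field[type]);
  }
  return { typesArr, fieldsArr };
}

const map = buildMap(`{
  dragons { __typename launch_payload_mass { __typename kg lb } name }
  roadster { __typename speed_kph speed_mph }
}`);
console.log(JSON.stringify(map, null, 2));
```

The recursion in parseFields is what lets nested queries such as launch_payload_mass appear as objects inside their parent’s field array, at any depth.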

Next, the map is used to organize the requested data and generate unique keys in the cache.

Here is how the cache looks:

{
  "dragons": [
    "dragon:Dragon 2",
    "dragon:Dragon 1",
    5003.373208001256
  ],
  "dragon:Dragon 2": {
    "__typename": "Dragon",
    "launch_payload_mass": {
      "__typename": "Mass",
      "kg": 6000,
      "lb": 13228
    },
    "name": "Dragon 2",
    "__CachierCacheDate": 5003.355625001714
  },
  "dragon:Dragon 1": {
    "__typename": "Dragon",
    "launch_payload_mass": {
      "__typename": "Mass",
      "kg": 6000,
      "lb": 13228
    },
    "name": "Dragon 1",
    "__CachierCacheDate": 5003.371750000864
  },
  "company": {
    "__typename": "Info",
    "__CachierCacheDate": 5003.400667000562,
    "ceo": "Elon Musk",
    "coo": "Gwynne Shotwell",
    "cto": "Elon Musk",
    "employees": 7000
  },
  "roadster": {
    "__typename": "Roadster",
    "__CachierCacheDate": 5003.745292000473,
    "speed_kph": 109278.66672000001,
    "speed_mph": 67902.59441847312
  }
}

(You can ignore the __CachierCacheDate keys and the last element of the dragons array for now. These values are used by Cachier’s eviction policy, which we can cover in a later article.)

As you can see, the cache stores the dragons list as an array of references to its list items, “dragon:Dragon 2” and “dragon:Dragon 1”. The unique keys for the list items are generated by combining each item’s __typename with its unique identifier, which is name in this case. By storing lists this way, the cache preserves the order of the data in each list without storing duplicate data.
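
The storage scheme described above can be sketched as follows. The function name normalize is hypothetical and this is not Cachier’s code; lowercasing __typename is an assumption made only so the keys match the “dragon:Dragon 1” form shown in the cache, and the eviction metadata is left out.

```javascript
// Sketch of normalizing a response into a flat cache.
function normalize(data, uniques) {
  const cache = {};
  for (const [queryName, value] of Object.entries(data)) {
    if (Array.isArray(value)) {
      const idField = uniques[queryName]; // e.g. 'name' for dragons
      // Store each item once under "<typename>:<id>" and keep only the
      // ordered list of references under the query name.
      cache[queryName] = value.map((item) => {
        // Assumption: the key prefix is the lowercased __typename.
        const key = `${item.__typename.toLowerCase()}:${item[idField]}`;
        cache[key] = item;
        return key;
      });
    } else {
      cache[queryName] = value; // single objects are stored under their name
    }
  }
  return cache;
}

// Trimmed-down response in the shape of the SpaceX example.
const response = {
  dragons: [
    { __typename: 'Dragon', name: 'Dragon 2' },
    { __typename: 'Dragon', name: 'Dragon 1' },
  ],
  roadster: { __typename: 'Roadster', speed_kph: 109278.66672000001 },
};

const normalized = normalize(response, { dragons: 'name' });
console.log(normalized.dragons); // [ 'dragon:Dragon 2', 'dragon:Dragon 1' ]
```

Since the list holds only keys, a dragon that shows up in several lists is still stored once, and updating its entry updates it everywhere it is referenced.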

Hopefully this article has helped you better understand Cachier’s Normalized Cache and how beneficial normalizing your cache can be for not only GraphQL caching but all types of caching. Normalizing your cache will allow you to handle partial queries, cache updates, and increase storage efficiency.

Support Cachier’s community with feedback, stars, or contributions to the open-source project here.

Our Team

Andy Zheng — LinkedIn | GitHub
Dhruv Thota — LinkedIn | GitHub
Jonathan Chen — LinkedIn | GitHub
Kaju Sarkar — LinkedIn | GitHub
Roman Darker — LinkedIn | GitHub
