The tradeoffs of a replicated cache

By Simon Fry

At Skyscanner we cache the results when a traveller searches for flight prices. This lets future travellers get results more quickly if they search with a similar query. Caching is a simple and effective solution when the service and the cache are located in the same place. However, once we start serving requests from multiple data centres, it becomes more complicated.

Skyscanner has a number of data centres across the globe. This allows us to provide a highly available service to our travellers, and to give as many users as possible a geographically close instance of our service for a faster experience. We route user requests to the nearest data centre, meaning that each data centre handles a different workload. For instance, a data centre in Europe handles more queries for flights between two European cities than a data centre in Asia. However, a data centre in Asia still handles some of these queries. How should we optimise our cached data to ensure the best experience for all our travellers?

The two simplest strategies are ‘share everything’ or ‘share nothing’.

When sharing everything, each data centre sends any new cache data it gets to all of the other data centres. This minimises the number of prices we need to retrieve from our partners, and means users everywhere get the fastest possible results. However, our caches are large, and sending large data sets takes time and can be expensive. If everything we share gets used, the effort is well spent, but if we share cached data which isn’t used, the wasted effort and resources can be significant.

The opposite is sharing nothing. Here, each data centre is a silo: if a user searches for a flight, it doesn’t matter that another user did exactly the same thing in another data centre minutes before. A fresh set of prices is retrieved and then cached locally.

We were able to analyse how frequently a key generated in one region was also requested in the other regions within the lifetime of that cache entry. The results are shown in the tables below.

[Table: Ratio of quotes generated in one data centre being used in another data centre over a sample of days]

[Table: Ratio of quotes generated in one data centre being used in another data centre, condensed into geographic regions, over a sample of days]
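
As a rough illustration of how a ratio like this could be computed, here is a minimal sketch in Python. It is not our actual analysis code: the event format, the region names and the `cross_region_usage` function are all assumptions made for the example. It takes a log of "generate" and "request" events and, for each pair of regions, works out what fraction of the entries generated in one region were requested in the other before the entry's TTL ran out.

```python
from collections import defaultdict


def cross_region_usage(events, ttl):
    """Return {(origin, other): ratio} for a log of cache events.

    events: iterable of (timestamp, region, key, kind) tuples, where kind is
            "generate" (a region fetched fresh prices and cached them) or
            "request" (a region was asked for that key).
    ttl:    lifetime of a cache entry, in the same units as the timestamps.

    Hypothetical format and simplification: only the most recent generation
    of each key is tracked.
    """
    generated = defaultdict(int)  # origin region -> number of entries generated
    used = defaultdict(set)       # (origin, other region) -> ids of entries used
    live = {}                     # key -> (entry id, origin region, created at)

    for entry_id, (ts, region, key, kind) in enumerate(sorted(events)):
        if kind == "generate":
            generated[region] += 1
            live[key] = (entry_id, region, ts)
        elif key in live:
            eid, origin, created = live[key]
            # Count each entry at most once per remote region, and only
            # while the entry would still be alive in the cache.
            if region != origin and ts - created <= ttl:
                used[(origin, region)].add(eid)

    return {pair: len(ids) / generated[pair[0]] for pair, ids in used.items()}
```

A high ratio for a pair of regions suggests that sharing between them pays off, while a low ratio means most of the data sent would never be read.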

The tables show that for data centres which are geographically close, the overlap in the quotes used is high. For data centres which are geographically distant, the overlap is lower, but non-zero. This led us to implement a compromise strategy between sharing everything and sharing nothing. We want to share cached data if it’s going to be useful, but otherwise minimise the data sent between our caches.

To do this, when a cache stores a new item, it sends a message to all of the other caches saying that it has data for the corresponding key, but doesn’t send the data itself. The remote caches store this message, with a time to live (TTL), as if it were a normal cache entry. If a traveller in a remote region hits the cache and finds one of these messages, the cache knows it can request the data from the cache that originally stored it, and give that to the traveller. This doesn’t sound particularly fast, but it is still faster in most cases than getting the data fresh from an airline. The remote region can then store the data itself, so if a third user requests the same thing, the item is available locally. In the case where the data is never requested, the message just expires, and no data transfer is wasted.
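
The flow described above can be sketched in a few lines of Python. This is an in-memory toy rather than our production cache: the `RegionalCache` class, its method names and the choice to keep the original expiry time on a copied entry are all assumptions made for illustration, and real caches would talk over the network instead of calling each other directly.

```python
import time


class RegionalCache:
    """Toy cache for one data centre (hypothetical, for illustration only).

    An entry is either real data or a pointer saying which peer holds it.
    """

    def __init__(self, name, ttl=300.0, clock=time.monotonic):
        self.name = name
        self.ttl = ttl
        self.clock = clock
        self.peers = []    # other RegionalCache instances
        self.entries = {}  # key -> (kind, payload, expires_at)

    def _live(self, key):
        """Return the entry for key if it has not expired, else None."""
        entry = self.entries.get(key)
        if entry is not None and entry[2] <= self.clock():
            del self.entries[key]
            return None
        return entry

    def put(self, key, value):
        """Store fresh data locally and tell peers that we have it."""
        expires_at = self.clock() + self.ttl
        self.entries[key] = ("data", value, expires_at)
        for peer in self.peers:
            peer.receive_pointer(key, self, expires_at)

    def receive_pointer(self, key, owner, expires_at):
        """Record that `owner` holds data for key; only the key is sent."""
        existing = self._live(key)
        if existing is None or existing[0] == "pointer":
            self.entries[key] = ("pointer", owner, expires_at)

    def get_local_data(self, key):
        """Serve a peer's request: real data only, never a pointer."""
        entry = self._live(key)
        return entry[1] if entry and entry[0] == "data" else None

    def get(self, key):
        """Return cached data, following a pointer if needed, else None."""
        entry = self._live(key)
        if entry is None:
            return None  # true miss: the caller fetches from a partner
        kind, payload, expires_at = entry
        if kind == "data":
            return payload
        value = payload.get_local_data(key)  # payload is the owning peer
        if value is None:
            del self.entries[key]
            return None
        # Keep a local copy so the next request is served locally.
        # Assumption: the copy keeps the original expiry time.
        self.entries[key] = ("data", value, expires_at)
        return value
```

In this sketch, a `put` in one region costs each peer only a small pointer entry. The full payload crosses regions once, on the first `get` that finds the pointer, and never at all if the pointer expires unread.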

Using this strategy, we pushed our cache hit rate from 30–35% with no sharing to 50%, whilst both saving hundreds of gigabytes a day from being wastefully transferred and allowing us to have smaller, more fully utilised cache clusters.

About the author

My name is Simon Fry, a Software Engineer working in the Hyperdrive Squad in London. We work on Skyscanner’s flight product, making it easier and quicker than ever to find you the right flights and get them booked. Outside of work I love being a tourist in new places, and seeing what new experiences (and foods) the world has to offer.
