Working with a cache: problems and solutions

Victor Pryazhnikov
Bumble Tech

--

Hello all! My name is Victor Pryazhnikov and I’m part of the server team at Badoo. Our team develops and supports the internal API for our client applications, and on the server side caching data is something we deal with every day.

Here’s an interesting opinion when it comes to programming:

“There are only two hard things in Computer Science: cache invalidation and naming things.”
— Phil Karlton

I won’t contest that invalidation is complicated, but it seems to me that caching is pretty tricky on its own, even aside from the matter of invalidation. There are lots of things to consider before starting to use a cache. In this article I am going to explore some of the problems which can be encountered when working with a cache in a large-scale system.

I’ll cover the problems that can arise when sharing cached data between multiple servers, with parallel data updates, with a ‘cold start’, and when working with the system during downtime. I will also describe possible ways of resolving these problems and provide links to materials where these subjects are covered in more detail. I am not going to tell you what a cache is in principle, nor will I go into the details of implementing specific systems.

For the purposes of this article, I am assuming that the system under consideration consists of an application, a database and a cache for data. Any other source may be used instead of a database (for example, a microservice or external API).

Splitting cached data between several servers

If you want to use caching in a really large system, you need a distributed cache: the cached data has to be shared between the available servers. This is essential for several reasons:

  • The amount of data may be very large, and the memory of a single server may not be enough to hold it all;
  • Data may be requested very often, and a single server may not have enough capacity to process all these requests;
  • Your objective is to make caching more reliable. If you only have one caching server and it crashes, the whole system is left without a cache, which sharply increases the load on the database.

The most obvious way to split data effectively is to calculate the server number pseudo-randomly, based on the caching key.

There are various algorithms for achieving this. The simplest one involves calculating the server number by finding the remainder from an integer division: dividing the numeric representation of the key (for example, CRC32) by the number of caching servers:

$server_index = crc32($cache_key) % count($servers_list);

This approach is called modulo hashing: the server is chosen as the remainder of a division. CRC32 is used here just as an example; you can take any other hash function whose result is a number no smaller than the number of servers and whose values are close to uniformly distributed.

This method is easy both to understand and to implement, and it distributes data between servers fairly evenly, but it has one serious drawback: when the number of servers changes (due to technical problems or because new ones are added), a significant part of the cache is lost, because the remainder of the division changes for most keys.

Below is a short script I have created to illustrate this.

Here a million unique keys are generated, spread across five servers using CRC32 hash and modulo operation. I simulate a situation where one of the servers crashes and data is redistributed among the four remaining servers.
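A rough sketch of this kind of emulation (this is not the exact script, just an illustration of the idea; the real script and data files are on GitHub, as mentioned below) might look like this:

// Count how many keys map to a different server
// when the number of shards goes from five to four.
$total_keys = 1000000;
$moved_keys = 0;

for ($i = 0; $i < $total_keys; $i++) {
    $key = 'user_' . $i;
    if (crc32($key) % 5 !== crc32($key) % 4) {
        $moved_keys++;
    }
}

printf("Moved keys: %.1f%%\n", 100 * $moved_keys / $total_keys);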

As a consequence of this ‘downtime’, about 80% of the keys end up mapped to a different server, so the previously cached values become unreachable and requests for them result in cache misses:

Total keys count: 1,000,000
Shards count range: 4, 5

Worse still, 80% is not the upper limit: as the number of servers increases, the percentage of the cache that is lost keeps growing. The only exception is a change by an integer factor (from two servers to four, from nine to three, etc.); in those cases the losses are lower than usual, but still no less than 50% of the cached keys:

I have uploaded the emulation script, the data files and the Jupyter notebook which generated this heatmap to GitHub.

Consistent hashing is a data split algorithm that doesn’t have this problem. The main idea of this algorithm is very simple: there is additional mapping of keys to slots, the number of slots significantly exceeding the number of servers (there may be thousands of them or even more). In turn, the slots themselves are distributed between the servers in some way.

When the number of servers changes, the number of slots does not change, but the distribution of the slots between the servers does:

  • If one of the servers crashes, all the slots which related to that server are distributed among the remaining servers;
  • If a new server is added, some of the slots from existing servers are transferred to the new server.

The idea of consistent hashing is conventionally visualised as a ring with dots around the rim: the dots represent slots or the boundaries between ranges of slots (when there is a very large number of slots). Here is a simple example of redistribution in a situation where the number of slots is small (60) and the slots are initially distributed across four servers:

The picture representing initial distribution shows all the slots pertaining to a given server arranged together. In reality, however, this is not a necessary condition; they can be arranged in all sorts of ways.

The main advantage of this approach over the previous one is that each server corresponds not to a single value but to a whole range of values, and when the number of servers changes, a much smaller proportion of the keys is redistributed between them (k / N, where k is the overall number of keys and N is the number of servers).

Returning to the scenario I used to demonstrate the drawback of the modulo operation: in the same situation, where one of five servers crashes and the keys from the crashed server are redistributed among the remaining ones, we don’t lose 80% of the cache but only 20%. If all the data was initially in the cache and all of it will be requested again, this difference means that with consistent hashing we will see four times fewer requests to the database.
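To make the slot-based idea more concrete, here is a deliberately simplified sketch. The class name and the naive slot-to-server assignment are my own illustration; real implementations map slots onto a hash ring, usually with virtual nodes:

class SimpleSlottedSharding
{
    const SLOTS_COUNT = 4096;

    /** @var array slot index => server index */
    private $slot_to_server = [];

    public function __construct(int $servers_count)
    {
        // Naive initial assignment: slots are spread evenly across servers.
        for ($slot = 0; $slot < self::SLOTS_COUNT; $slot++) {
            $this->slot_to_server[$slot] = $slot % $servers_count;
        }
    }

    public function getServerIndex(string $cache_key) : int
    {
        // The key -> slot mapping never changes when servers are added or removed.
        $slot = crc32($cache_key) % self::SLOTS_COUNT;
        return $this->slot_to_server[$slot];
    }

    public function removeServer(int $dead_server, array $alive_servers) : void
    {
        // Only the slots of the dead server are reassigned; all other keys stay put.
        foreach ($this->slot_to_server as $slot => $server) {
            if ($server === $dead_server) {
                $this->slot_to_server[$slot] = $alive_servers[$slot % count($alive_servers)];
            }
        }
    }
}

The key-to-slot mapping here never changes; only the slot-to-server table does, which is what limits the losses to roughly k / N keys.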

A production-ready implementation of this algorithm is more involved than the sketch above, and I won’t include one in this article, but if you are interested there are plenty of implementations in all sorts of languages on GitHub.

Along with consistent hashing there are other ways of resolving this problem (for example, rendezvous hashing), but they are used far less often.

Regardless of which algorithm is chosen, choosing a server based on the hash of the cache key may not work well. The data in a cache is usually not homogenous; it is a large collection of diverse data: cached values occupy different amounts of memory, are requested with different frequency, and have different generation times, update frequencies and lifetimes. With hashing you cannot control where a particular key ends up, and as a result some servers may receive an excess of both cached data volume and requests to that data, producing a completely different load on different caching servers.

To remedy this, the keys need to be ‘smeared’ in such a way that the non-homogenous data is spread more or less evenly across the servers. To do this, the server should be selected not from the cache key but from some other parameter, to which one of the approaches described above is applied. Which parameter that should be depends on your data model, so I can’t name it here.

In our case, almost all the cached data relates to a single user, so we use the User ID as the sharding parameter for data in the cache. As a result, we have been able to distribute the data more or less evenly. In addition, this gives us a bonus at Badoo: we can use multi_get to load several different keys with information about the user at once (we use this for pre-loading frequently used data about the current user). If the location of each key were determined dynamically from the key itself, this wouldn’t be possible, because there would be no guarantee that all the requested keys live on the same server.
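As a rough illustration of that bonus (this uses the PHP Memcached extension; the key names and the structure of $servers_list are only an example):

// All of one user's keys live on the same server because the server is chosen
// by User ID, so they can be fetched in a single round trip.
$user_id = 42;
$server = $servers_list[crc32((string)$user_id) % count($servers_list)];

$memcached = new Memcached();
$memcached->addServer($server['host'], $server['port']);

$values = $memcached->getMulti([
    "contacts_count_{$user_id}",
    "unread_messages_count_{$user_id}",
    "last_visit_ts_{$user_id}",
]); // one network round trip instead of three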

See also:

Parallel requests to update data

Have a look at this section of code:

public function getContactsCountCached(int $user_id) : ?int
{
    $contacts_count = \Contacts\Cache::getContactsCount($user_id);
    if ($contacts_count !== false) {
        return $contacts_count;
    }

    $contacts_count = $this->getContactsCount($user_id);
    if (is_null($contacts_count)) {
        return null;
    }

    \Contacts\Cache::setContactsCount($user_id, $contacts_count);

    return $contacts_count;
}

What happens if the requested data is absent from the cache? Judging from the code, a mechanism launches to fetch the missing data from the source. As long as the code is executed by a single process, everything will be fine: the data is loaded, placed in the cache, and the next request takes it from there. However, with several parallel processes things turn out quite differently: the data won’t be loaded just once, but several times.

This is what it will look like, approximately:

When the request starts being processed in process 2, there isn’t yet any data in the cache, but the data is already being read from the database in process 1. In this example such concurrent loading isn’t a major problem, because there are only two requests, but the number of requests could, of course, be far higher.

The number of parallel loads depends on the number of parallel users and the time required to load the same data.

Let’s say that you have some functionality which uses the cache and receives 200 requests per second. If it takes 50 ms to load the data, then during the time one load is in progress you will receive 50 / (1000 / 200) = 10 requests.

In other words, in the absence of cached data, one process begins loading it and, while that load is in progress, another nine requests come in. These incoming requests won’t see data in the cache and will also start to load it.

This problem is referred to as a ‘hit-miss storm’, ‘dog-pile effect’ or a ‘cache stampede’. There are several ways of resolving this:

Locking the re-computation/loading operation before it starts

The idea here is that, when data is absent from the cache, the process that wants to load it takes a lock, which stops other processes running in parallel from doing the same thing. In the case of memcached, the simplest way to lock is to add a lock key (using the add command, which only succeeds if the key does not already exist) on the same caching server where the cached data itself is stored.

In this case the data gets updated in one process only, but you do need to decide what to do with the processes that see a cache miss but fail to acquire the lock. They might return an error or a default value, or perhaps wait for a certain time and then attempt to obtain the data again.

It’s also important to choose the length of the lock carefully. Naturally, it needs to be long enough for the data to be loaded from the source and saved to the cache; if it’s too short, another parallel process may initiate a second data load. On the other hand, if it’s too long and the process holding the lock ‘dies’ without having written data to the cache and without releasing the lock, other processes will likewise be unable to obtain this data until the lock expires.
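Here is a rough sketch of what this could look like for the getContactsCountCached example above. The $this->memcached property, the lock key format and the LOCK_TTL_SECONDS constant are assumptions of mine:

public function getContactsCountCachedWithLock(int $user_id) : ?int
{
    $contacts_count = \Contacts\Cache::getContactsCount($user_id);
    if ($contacts_count !== false) {
        return $contacts_count;
    }

    // add() succeeds only if the lock key does not exist yet,
    // so only one process will load the data from the database.
    $lock_key = "contacts_count_lock_{$user_id}";
    if (!$this->memcached->add($lock_key, 1, self::LOCK_TTL_SECONDS)) {
        // Somebody else is already loading the data: return a fallback
        // (here null; it could also be a default value, or a retry after a pause).
        return null;
    }

    $contacts_count = $this->getContactsCount($user_id);
    if (!is_null($contacts_count)) {
        \Contacts\Cache::setContactsCount($user_id, $contacts_count);
    }
    $this->memcached->delete($lock_key);

    return $contacts_count;
}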

Moving updates to the background

The basic idea behind this approach is to separate reading data from the cache and writing data to it into different processes. Online processes only read data from the cache; loading and writing it happens in a separate background process. This option makes parallel data updates impossible.

This approach comes with additional ‘overheads’: you need to create and monitor a separate script that writes the data to the cache, and to keep the lifetime of the cached data in sync with the schedule on which that script runs.

We use this approach at Badoo, for example, for the counter that records the overall number of users — we will return to this example later.
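A minimal sketch of what such a background script might look like for a counter of this kind (the key name, the TTL, the cron interval and the data-source function are placeholders):

// Run from cron, e.g. once a minute. The TTL is deliberately several times
// longer than the update interval, so readers never see the key expire
// as long as the script keeps running.
const USERS_COUNT_KEY = 'total_users_count';
const USERS_COUNT_TTL = 300; // seconds

$memcached = new Memcached();
$memcached->addServer('127.0.0.1', 11211);

$users_count = fetchTotalUsersCountFromDatabase(); // placeholder for the real query
$memcached->set(USERS_COUNT_KEY, $users_count, USERS_COUNT_TTL);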

Probabilistic updating methods

The main idea here is that the data in the cache is updated not only when it is absent, but also, with a certain probability, when it is still present. This allows the data to be refreshed before the cached value ‘expires’ and gets requested by all processes simultaneously.

For this mechanism to work properly, the probability of re-computation needs to be low at the start of the cached value’s lifetime and to increase gradually towards its end. This can be achieved with the XFetch algorithm, which uses an exponential distribution. Its implementation looks something like this:

function xFetch($key, $ttl, $beta = 1)
{
    // cacheRead(), recomputeValue() and cacheWrite() are placeholders for
    // your own cache and data-source access functions.
    [$value, $delta, $expiry] = cacheRead($key);

    // Uniform random value in (0, 1]; its logarithm is <= 0, so the check below
    // fires early with a probability that grows as $expiry approaches.
    $rand = mt_rand(1, mt_getrandmax()) / mt_getrandmax();
    if (!$value || (time() - $delta * $beta * log($rand)) > $expiry) {
        $start = time();
        $value = recomputeValue($key);
        $delta = time() - $start;
        $expiry = time() + $ttl;
        cacheWrite($key, [$value, $delta, $expiry], $ttl);
    }

    return $value;
}

In this example $ttl is the lifetime of the value in the cache; $delta is the time it took to generate the cached value; $expiry is the time until which the value in the cache remains valid; and $beta is a tuning parameter of the algorithm. By changing it you can influence the probability of re-computation (the bigger the parameter, the more probable re-computation is on each request). You can read a detailed description of this algorithm in the white paper “Optimal Probabilistic Cache Stampede Prevention”; a link to it is provided at the end of this section.

You need to bear in mind that these kinds of probabilistic techniques do not eliminate the possibility of parallel updates; they simply make them less likely. To eliminate them completely, several techniques need to be combined (for example, locking before updating).

See also:

‘Cold starts’ and ‘warming up’ the cache

It should be noted that the problem of mass data reloading caused by data being absent from the cache may be triggered not only by a large number of updates of one particular key, but also by a large number of simultaneous updates of different keys. For example, this can occur when you roll out a popular new feature which uses caching with a fixed cache lifetime.

In such cases, immediately following the roll-out of the new functionality, a lot of data starts loading (the first sign of the problem), after which it ends up in the cache, and for some time all is fine. However, once the cache lifetime expires, all the data starts loading again, which creates an increased load on the database.

While this isn’t an issue you can eliminate altogether, you can ‘smear’ the data loading over time and thereby avoid a very high number of parallel requests to the database. This can be achieved in several ways:

  • ‘Switching on’ the new functionality gradually. This requires a special mechanism, but the simplest option when rolling out new functionality is to switch it on for a small share of users at first and to increase that share gradually. In this scenario there is no immediate surge of updates, and the cache ‘warms up’ as the number of users grows.
  • Setting different lifetimes for different elements of the data. This mechanism can only be used if the system is able to cope with the spike that occurs when the functionality is fully rolled out. Its distinctive feature is that, when data is written to the cache, each element gets its own individual lifetime, so the wave of updates evens out much faster because subsequent updates are spread out over time. The simplest way of implementing this is to multiply the cache lifetime by a random coefficient:
public function getNewSnapshotTTL()
{
    $random_factor = rand(950, 1050) / 1000;

    return intval($this->getSnapshotTTL() * $random_factor);
}

If for some reason you don’t want to use a random number, it can be replaced by a pseudo-random value obtained by applying a hash function to some part of the data (for example, the User ID).
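For example, a factor derived from the User ID instead of rand() might look like the sketch below; the method name is hypothetical and the 0.95 to 1.05 range simply mirrors the snippet above:

public function getNewSnapshotTTLForUser(int $user_id)
{
    // crc32 of the User ID gives a stable pseudo-random factor between 0.95 and 1.05.
    $random_factor = (950 + crc32((string)$user_id) % 101) / 1000;

    return intval($this->getSnapshotTTL() * $random_factor);
}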

Example

I’ve written a short script which emulates the situation with a cache which hasn’t been ‘warmed up’.

In this script I recreate a situation in which, when a request arrives, the user’s data is loaded and cached (if it is not already in the cache). Of course, the example is artificial, but it does show the difference in how the system behaves.

This is a graph showing the number of hits/misses in a situation with cache lifetimes which are fixed (fixed_cache_misses_count) and varied (random_cache_misses_count), respectively:

You can see that in both cases the spikes in load are very noticeable at the start, but with the pseudo-randomised lifetime they even out much faster.

‘Hot’ keys

Cached data is not homogenous, and some of it may be requested very often. In this case problems may be caused not by parallel updates but by the sheer number of reads of a single key. An example of such a key in our project is the counter of the overall number of users:

That’s how many people are already on board — join us!

This counter is one of the most popular keys, and with the usual approach all requests for it go to a single server (since it is just one key, not many keys of the same type). Under that load, the behaviour of the server in question may degrade, slowing down work with the other keys stored on it as well.

We resolve this problem by writing the data not to a single caching server but to several at once. This reduces the number of reads of the key on each server several-fold, but it makes updates more complicated, as does the code for selecting a server, because a separate mechanism is needed for this key.

At Badoo we have resolved this problem by writing the data to all caching servers at once. As a result, when reading we can use the general server selection mechanism for cached data: the code uses the ordinary sharding scheme based on User ID, and the reading side does not need to know anything special about this ‘hot’ key. This works for us because we have a relatively small number of servers (about ten per datacenter).

If there were a lot more caching servers, this approach might not be the most convenient: it just doesn’t make sense to duplicate the same data hundreds of times. Instead, the key could be duplicated not across all the servers but only across some of them. This option, however, requires a little more work.

If you select the server based on the cache key, you can add a limited number of pseudo-random suffixes to it (turning total_users_count into something like total_users_count_1, total_users_count_2 and so on). This approach is used by Etsy, for example.

If you use an explicit sharding parameter, simply pass various pseudo-random values as that parameter.

The main problem with both approaches is ensuring that different values do indeed end up on different caching servers.
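As a rough sketch of the first option (key-based server selection with pseudo-random suffixes; the number of copies and the helper names are my own):

const HOT_KEY_COPIES = 10;

// Writing: update every copy so that all of them hold the same value.
function setTotalUsersCount(Memcached $memcached, int $count, int $ttl) : void
{
    for ($i = 1; $i <= HOT_KEY_COPIES; $i++) {
        $memcached->set("total_users_count_{$i}", $count, $ttl);
    }
}

// Reading: pick one copy at random, which spreads the reads across several
// servers, provided the suffixed keys really do land on different servers.
function getTotalUsersCount(Memcached $memcached)
{
    return $memcached->get('total_users_count_' . mt_rand(1, HOT_KEY_COPIES));
}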

See also:

Downtime handling

No system is ever going to be 100% reliable, so you need to anticipate how it will behave in the case of downtime. Downtimes can occur when working with the cache itself and also when working with the database.

In the previous sections I have already talked about handling downtime when working with the cache itself. The only thing I would add is that it is worth implementing graceful degradation (the ability to switch off some non-critical features of the system in production). This is useful when the system is unable to cope with a spike in load.

In the case of database downtime combined with an empty cache, we may find ourselves in a cache stampede situation, which I also mentioned earlier. You can get out of it using the approaches already described, or by writing a deliberately incorrect value to the cache with a short lifetime. In this case the system can determine that the source is unavailable and stop trying to request the data for a while.
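A rough sketch of that last idea, again based on the contacts counter example (the SOURCE_IS_DOWN_MARKER constant, the extra TTL argument to setContactsCount and the assumption that getContactsCount throws on failure are all mine):

public function getContactsCountCachedSafe(int $user_id) : ?int
{
    $contacts_count = \Contacts\Cache::getContactsCount($user_id);
    if ($contacts_count === self::SOURCE_IS_DOWN_MARKER) {
        // The database was recently unavailable: don't hammer it again just yet.
        return null;
    }
    if ($contacts_count !== false) {
        return $contacts_count;
    }

    try {
        $contacts_count = $this->getContactsCount($user_id);
    } catch (\Throwable $e) {
        // Remember the failure for a short time (assumed TTL argument in seconds).
        \Contacts\Cache::setContactsCount($user_id, self::SOURCE_IS_DOWN_MARKER, 30);
        return null;
    }

    if (!is_null($contacts_count)) {
        \Contacts\Cache::setContactsCount($user_id, $contacts_count);
    }

    return $contacts_count;
}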

Conclusion

In this article I’ve covered the main problems that can be encountered when working with a cache. I’m sure there are many others, and we could continue this conversation for a long time yet. But I do hope that after reading my article your cache will be more effective.
