3 major problems and solutions in the caching world

Today's IO devices are far from able to satisfy the massive read and write requests of Internet applications, so a cache is introduced: the high-speed read/write performance of memory absorbs the bulk of the query requests. However, memory is a precious resource, and storing the full dataset in memory is clearly impractical. The usual design therefore combines memory and IO devices: memory holds only the hotspot data, while the IO devices hold the full dataset. Cache design involves many subtleties, and a poor design can have serious consequences. This article introduces the three major problems commonly encountered when using caches and gives a solution for each.

1. Cache penetration

In most Internet applications, the read path works as follows:

  1. When the business system initiates a query request, it first checks whether the data exists in the cache;
  2. If the data is in the cache, it is returned directly;
  3. If the data is not in the cache, the database is queried, the result is written back to the cache, and the data is returned (a sketch of this flow follows the list).
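
A minimal, self-contained sketch of this cache-aside read path. The `Cache`, `Database`, and `ReadPath` names are hypothetical stand-ins (for, say, Redis and MySQL clients), not any specific library's API:

```java
interface Cache {
    String get(String key);
    void set(String key, String value, int ttlSeconds);
}

interface Database {
    String query(String key);
}

class ReadPath {
    private final Cache cache;
    private final Database db;

    ReadPath(Cache cache, Database db) {
        this.cache = cache;
        this.db = db;
    }

    String query(String key) {
        String value = cache.get(key);   // 1. check the cache first
        if (value != null) {
            return value;                // 2. cache hit: return directly
        }
        value = db.query(key);           // 3. cache miss: query the database
        if (value != null) {
            cache.set(key, value, 3600); // write back so the next read is a hit
        }
        return value;
    }
}
```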

After understanding the above process, let’s talk about cache penetration.

1.1 What is cache penetration?

The data the business system wants to query simply does not exist! When the business system initiates the query, it follows the flow above: the lookup first goes to the cache, and because the data is not there, it then goes to the database. Since the data does not exist at all, the database returns null as well. This is cache penetration.

To sum up: when the business system accesses data that does not exist at all, we call it cache penetration.

1.2 The hazard of cache penetration

If a massive number of query requests target data that does not exist, all of those requests fall through to the database, database pressure increases dramatically, and the system may crash. (Keep in mind that the most vulnerable part of a typical business system is its IO; a little extra pressure can make it collapse, so we must find ways to protect it.)

1.3 Why does cache penetration occur?

There are many reasons for cache penetration, which are generally as follows:

  1. Malicious attacks deliberately construct a large number of requests for non-existent data. Since this data is not in the cache, the massive requests all fall on the database, which may crash it.
  2. Code logic errors. This one is the programmer's fault, nothing more to say: such errors must be avoided during development!

1.4 Cache penetration solution

Here are two ways to prevent cache penetration.

1.4.1 Cache empty data

Cache penetration occurs because the cache holds no keys for this non-existent data, so every such request goes straight to the database.

We can therefore slightly modify the business system's code: when a database query returns an empty result, store that key in the cache anyway (with a null marker). When another query for the key arrives, the cache returns null directly without touching the database, as sketched below.
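
A sketch of this variant, reusing the hypothetical `Cache` and `Database` interfaces from the earlier example. The `NULL_MARKER` sentinel and the short 60-second TTL for empty results are illustrative assumptions:

```java
class ReadPathCachingNulls {
    private static final String NULL_MARKER = "__NULL__"; // sentinel for "no data"
    private final Cache cache;
    private final Database db;

    ReadPathCachingNulls(Cache cache, Database db) {
        this.cache = cache;
        this.db = db;
    }

    String query(String key) {
        String value = cache.get(key);
        if (value != null) {
            // a cached miss short-circuits here without touching the database
            return NULL_MARKER.equals(value) ? null : value;
        }
        value = db.query(key);
        if (value == null) {
            cache.set(key, NULL_MARKER, 60); // cache the miss, but only briefly
            return null;
        }
        cache.set(key, value, 3600);
        return value;
    }
}
```

Giving the null entries a much shorter TTL than real data limits the memory they waste and bounds how stale the cache can be if the key is later created in the database.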

1.4.2 BloomFilter

The second way to avoid cache penetration is to use BloomFilter.

It places a barrier in front of the cache: a BloomFilter that records all the keys currently present in the database.

When the business system receives a query request, it first asks the BloomFilter whether the key exists. If the filter says the key is absent, the data definitely does not exist in the database, so there is no need to check the cache or the database; null is returned directly. If the filter says the key may exist (a Bloom filter can report false positives but never false negatives), the normal flow continues: query the cache first, and on a cache miss, query the database.
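
A sketch using Guava's `BloomFilter` on top of the hypothetical `Cache` and `Database` interfaces from above. The sizing parameters (one million expected keys, 1% false-positive rate) and the surrounding class are illustrative assumptions:

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;

class GuardedReadPath {
    private final Cache cache;
    private final Database db;

    // Sized for an assumed 1,000,000 keys with a 1% false-positive rate.
    private final BloomFilter<String> existingKeys = BloomFilter.create(
            Funnels.stringFunnel(StandardCharsets.UTF_8), 1_000_000, 0.01);

    GuardedReadPath(Cache cache, Database db) {
        this.cache = cache;
        this.db = db;
    }

    // Call once per key when loading or creating database records.
    void register(String key) {
        existingKeys.put(key);
    }

    String query(String key) {
        if (!existingKeys.mightContain(key)) {
            return null;                 // definitely absent: skip cache and database
        }
        String value = cache.get(key);   // possibly present: normal cache-aside flow
        return value != null ? value : db.query(key);
    }
}
```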

1.4.3 Comparison of the two schemes

Both of these solutions can solve the problem of cache penetration, but the usage scenarios are different.

For malicious attacks, the queried keys are usually all different and extremely numerous. In this case the first scheme falls short: it would have to cache the keys of all that empty data, yet because the attack keys differ from one another and any given key is typically requested only once, the cached null entries are never hit again and do nothing to protect the database. Therefore, for scenarios where the keys of the empty data vary widely and the probability of a repeated request for the same key is low, the second scheme should be chosen. For scenarios where the set of empty-data keys is limited and repeated requests for the same key are likely, the first scheme should be chosen.

2. Cache avalanche

2.1 What is a cache avalanche?

As you can see from the above, the cache actually protects the database: it absorbs large numbers of query requests on behalf of the vulnerable database.

If the cache goes down for some reason, the massive volume of query requests it was blocking will rush at the database like a pack of mad dogs. If the database cannot withstand this enormous pressure, it will collapse.

This is the cache avalanche.

2.2 How to avoid cache avalanches?

2.2.1 Using a cache cluster to ensure high availability of the cache

That is, take preventive measures before an avalanche occurs so that it never happens. PS: distributed high availability is not the focus of today's discussion; follow-up articles on high availability are planned, so stay tuned.

2.2.2 Using Hystrix

Hystrix is an open-source "anti-avalanche tool" that limits the damage after an avalanche through circuit breaking, degradation, and rate limiting.

Hystrix is a Java class library built on the command pattern: each service has its own handler, and every request to the service passes through that handler. The handler records the service's current request failure rate. Once the failure rate reaches a preset threshold, Hystrix rejects all subsequent requests for that service and returns a default result; this is the so-called "circuit breaking" (fusing). After a period of time, Hystrix lets a portion of the requests through to the service again and re-measures the failure rate. If the failure rate is now back within the preset bounds, the switch is fully reopened and traffic flows normally; if the failure rate is still high, all requests for the service continue to be rejected. This is the so-called "rate limiting". For the rejected requests, Hystrix directly returns a default result, which is known as "degradation".
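
A sketch of how this looks with Hystrix's command pattern. `QueryCommand`, the `CacheQuery` group key, and the `doQuery` helper are our illustrative names; failures thrown from `run()` feed the failure-rate statistics, and once the circuit opens, `getFallback()` supplies the default (degraded) result without `run()` being invoked at all:

```java
import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;

public class QueryCommand extends HystrixCommand<String> {
    private final String key;

    public QueryCommand(String key) {
        super(HystrixCommandGroupKey.Factory.asKey("CacheQuery"));
        this.key = key;
    }

    @Override
    protected String run() throws Exception {
        return doQuery(key); // the normal cache/database lookup
    }

    @Override
    protected String getFallback() {
        return null; // default result returned for rejected or failed requests
    }

    private String doQuery(String key) throws Exception {
        // hypothetical stand-in for the real lookup logic
        throw new UnsupportedOperationException("wire in the real query here");
    }
}

// Usage: String value = new QueryCommand("user:42").execute();
```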

3. Concentrated expiration of hotspot data

3.1 What is concentrated hotspot data expiration?

We usually set an expiration time for cached entries. After the expiration time passes, the data is deleted from the cache, which to some extent keeps the cached data fresh.

However, for hotspot data that receives a very high volume of requests, the moment its expiration time passes, a flood of requests falls on the database, which may crash it. The process is as follows:

Suppose a piece of hotspot data expires. When the next query request for it [req-1] arrives, it goes to the database. However, between the moment that request is sent to the database and the moment the result is written back into the cache, the data is still missing from the cache, so every query request arriving in this window also falls on the database, putting enormous pressure on it. In addition, once these queries complete, each of them redundantly updates the cache.

3.2 Solution

3.2.1 Mutex

We can use the locking mechanism provided by the cache itself. When the first database query request is initiated, it locks the key in the cache. Other query requests reaching the cache for that key cannot read the value and block, waiting. After the first request finishes the database query and writes the result back into the cache, it releases the lock; the blocked query requests can then read the value directly from the cache.

In this way, when a hotspot key expires, only the first query request actually reaches the database; all other query requests block, which protects the database. However, because of the mutex, those other requests block and wait, so the throughput of the system drops. Whether this trade-off is acceptable must be weighed against the actual business requirements.
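
A sketch of the mutex approach. `setIfAbsent` and `delete` are assumed additions to the hypothetical `Cache` interface, standing in for an atomic "SET key value NX EX ttl" and "DEL key" in Redis; the 10-second lock TTL and 50 ms polling interval are illustrative, not tuned values:

```java
interface LockingCache extends Cache {
    boolean setIfAbsent(String key, String value, int ttlSeconds);
    void delete(String key);
}

class MutexReadPath {
    private final LockingCache cache;
    private final Database db;

    MutexReadPath(LockingCache cache, Database db) {
        this.cache = cache;
        this.db = db;
    }

    String query(String key) throws InterruptedException {
        while (true) {
            String value = cache.get(key);
            if (value != null) {
                return value;                          // rebuilt by us or someone else
            }
            String lockKey = "lock:" + key;
            if (cache.setIfAbsent(lockKey, "1", 10)) { // only one request wins the lock
                try {
                    value = db.query(key);             // the sole request that hits the DB
                    if (value != null) {
                        cache.set(key, value, 3600);
                    }
                    return value;
                } finally {
                    cache.delete(lockKey);             // release the mutex
                }
            }
            Thread.sleep(50); // the losers wait briefly, then re-check the cache
        }
    }
}
```

The lock's own TTL matters: if the winning request dies before releasing the mutex, the TTL ensures the lock eventually expires instead of blocking everyone forever.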

Mutex locks can avoid the database being crushed by the expiration of a single hotspot key. In real businesses, however, there are often scenarios where a whole batch of hotspot data expires at the same moment. How do we prevent database overload in that scenario?

3.2.2 Setting different expiration times

When storing this batch of data in the cache, we can stagger the cache expiration times to avoid simultaneous expiration: for example, add a random offset to a common base TTL so the expiration moments are spread out.
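
A sketch of the jittered-TTL idea, reusing the hypothetical `Cache` interface; the one-hour base TTL and ten-minute jitter window are illustrative choices:

```java
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

class StaggeredWarmUp {
    private final Cache cache;

    StaggeredWarmUp(Cache cache) {
        this.cache = cache;
    }

    // Base TTL of one hour plus up to ten minutes of random jitter, so a batch
    // of hotspot keys written together does not expire at the same moment.
    void warmUp(Map<String, String> hotspotData) {
        int baseTtlSeconds = 3600;
        for (Map.Entry<String, String> entry : hotspotData.entrySet()) {
            int jitter = ThreadLocalRandom.current().nextInt(600);
            cache.set(entry.getKey(), entry.getValue(), baseTtlSeconds + jitter);
        }
    }
}
```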