Redis maximum memory Eviction

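When Redis memory usage (used_memory) exceeds the maximum memory value (maxmemory), Redis runs eviction to remove keys so that new keys can be written. Each time, Redis first picks a batch of keys whose size is set by maxmemory-samples (default 3), then applies the eviction policy, maxmemory-policy, to that batch to decide which keys to remove, freeing memory for the incoming writes. The advantage is high performance; the drawback is that the choice is only locally optimal within the sample, not necessarily globally optimal.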

Jerry’s Notes
Mar 20, 2022

Q: When is eviction triggered?

Evictions occur when cache memory is overfilled or exceeds the maxmemory setting for the cache, causing the engine to select keys to evict in order to manage its memory. The keys chosen depend on the eviction policy you select. By default, Amazon ElastiCache for Redis sets the volatile-lru eviction policy for your Redis cluster. When this policy is selected, the least recently used keys that have an expiration (TTL) value set are evicted. Other eviction policies are available and can be applied through the configurable maxmemory-policy parameter.

Q: How is the eviction policy executed?

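1. Redis receives a command that would add new data.
2. Redis checks memory usage to see whether it exceeds the maximum memory limit (used_memory > maxmemory).
3. If the limit is exceeded, Redis applies the eviction policy (maxmemory-policy) to evict keys, and then executes the command.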
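
The three steps above can be sketched as a toy model in Python. This is only an illustration, not Redis source code: the `ToyCache` class, its byte accounting, and the choice of evicting the oldest-accessed key among a random sample are all assumptions made for the example.

```python
import random


class ToyCache:
    """Toy model of the write path: check memory, evict, then execute."""

    def __init__(self, maxmemory, samples=3, seed=0):
        self.maxmemory = maxmemory   # byte limit (hypothetical accounting)
        self.samples = samples       # keys inspected per eviction round
        self.data = {}               # key -> value
        self.last_access = {}        # key -> logical clock of last access
        self.clock = 0
        self.evicted = 0
        self.rng = random.Random(seed)

    def used_memory(self):
        # Assumption: memory use is just the summed length of keys and values.
        return sum(len(k) + len(v) for k, v in self.data.items())

    def _evict_one(self):
        # Sample a few keys and evict the one with the oldest access time.
        size = min(self.samples, len(self.data))
        candidates = self.rng.sample(list(self.data), size)
        victim = min(candidates, key=self.last_access.__getitem__)
        del self.data[victim]
        del self.last_access[victim]
        self.evicted += 1

    def set(self, key, value):
        # Steps 1-2: a write arrives, so compare usage against the limit.
        # Step 3: evict until usage is back under the limit, then run the command.
        while self.data and self.used_memory() > self.maxmemory:
            self._evict_one()
        self.clock += 1
        self.data[key] = value
        self.last_access[key] = self.clock


cache = ToyCache(maxmemory=20, samples=3)
for i in range(10):
    cache.set(f"k{i}", "vvvv")
print(cache.evicted, sorted(cache.data))
```

Because the check happens before the command runs, usage in this model can sit slightly above the limit right after a write, until the next write triggers another round of eviction.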

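maxmemory-policy: memory eviction policies

■ noeviction: once memory reaches the configured maximum, every operation that needs to allocate memory (such as SET or LPUSH) returns an error; read-only operations such as GET still work.

■ allkeys-lru: evict among all keys using the LRU algorithm.

■ volatile-lru: evict among keys that have an expiration set, using the LRU algorithm. (This is the ElastiCache default policy!)

■ allkeys-random: evict randomly among all keys.

■ volatile-random: evict randomly among keys that have an expiration set.

■ volatile-ttl: evict among keys that have an expiration set, based on their expiration time; the sooner a key expires, the sooner it is evicted.

■ volatile-lfu: evict among keys that have an expiration set, using the LFU algorithm.

■ allkeys-lfu: evict among all keys using the LFU algorithm.

!!! With the volatile-lru, volatile-lfu, volatile-random, and volatile-ttl policies, if no key is eligible for eviction, Redis returns an error just as it does under noeviction.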
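
As a rough illustration of the note above, the following sketch models which keys each policy family is allowed to evict. The function names and the `ttl_by_key` dictionary are hypothetical and exist only for this example; the sketch does not model how a victim is then chosen among the candidates.

```python
def eviction_candidates(policy, ttl_by_key):
    """Return the keys a policy may evict.

    ttl_by_key maps key -> remaining TTL in seconds, or None when no TTL is set.
    """
    if policy == "noeviction":
        return []
    if policy.startswith("allkeys-"):
        return list(ttl_by_key)
    if policy.startswith("volatile-"):
        return [k for k, ttl in ttl_by_key.items() if ttl is not None]
    raise ValueError(f"unknown policy: {policy}")


def can_accept_write(policy, ttl_by_key):
    # Assumes memory is already full: with no candidates to evict,
    # a write that needs memory is rejected with an OOM error.
    return len(eviction_candidates(policy, ttl_by_key)) > 0


keys = {"session:1": 30, "config": None}
print(eviction_candidates("volatile-lru", keys))    # only the key with a TTL
print(can_accept_write("volatile-ttl", {"config": None}))
```

In the second call no key has a TTL, so a volatile policy has nothing to evict and behaves like noeviction.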

ElastiCache Redis metrics

https://docs.aws.amazon.com/zh_cn/AmazonElastiCache/latest/red-ug/CacheMetrics.Redis.html

Evictions: the number of keys evicted because of the maxmemory limit. This is derived from the evicted_keys statistic in Redis INFO.
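
Since evicted_keys in INFO is a counter that accumulates from server start, one way to turn it into a rate is to compare two readings. The helper below is a minimal sketch; the function name and the restart handling are assumptions for illustration.

```python
def eviction_rate(prev_evicted, curr_evicted, interval_seconds):
    """Evictions per second between two readings of the evicted_keys counter."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = curr_evicted - prev_evicted
    if delta < 0:
        # The counter went backwards, e.g. after a restart; count from zero.
        delta = curr_evicted
    return delta / interval_seconds


print(eviction_rate(100, 160, 60))
```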

LRU (Least Recently Used): uses access timestamps to evict the keys that have gone unaccessed the longest.

LRU stands for Least Recently Used. Put simply, if a piece of data has not been used for a while, it is also unlikely to be used in the near future, so it can be evicted. By recording the last access time of each cached entry, the entries that have gone unaccessed the longest are evicted first.

The Redis LRU algorithm is not an exact implementation. This means that Redis is not able to pick the best candidate for eviction, that is, the key that was accessed furthest in the past. Instead, it runs an approximation of the LRU algorithm by sampling a small number of keys and evicting the best one (the one with the oldest access time) among the sampled keys. What is important about the Redis LRU algorithm is that you can tune its precision by changing the number of samples checked for every eviction. This is controlled by the maxmemory-samples configuration directive.

LFU (Least Frequently Used): uses access frequency to evict the keys that are used least.

The LFU algorithm is an eviction policy added in Redis 4.0. Its full name is Least Frequently Used, and its core idea is to evict keys based on how frequently they have been accessed recently: rarely accessed keys are evicted first, while frequently accessed keys are kept. By recording the access frequency of each cached entry over a recent period, the entries with low access frequency are evicted first.

Starting with Redis 4.0, a new Least Frequently Used eviction mode is available. This mode may work better (provide a better hits/misses ratio) in certain cases, since with LFU Redis tries to track the access frequency of items, so that the ones used rarely are evicted while the ones used often have a higher chance of remaining in memory.

With LRU, an item that was accessed recently but is actually almost never requested will not get evicted, so the risk is evicting a key that has a higher chance of being requested in the future. LFU does not have this problem, and in general should adapt better to different access patterns.
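
The difference can be seen on a tiny access log. The sketch below uses exact timestamps and exact counts, which is a simplification for illustration; Redis itself approximates both. The function name `pick_victims` is hypothetical.

```python
from collections import Counter


def pick_victims(access_log):
    """Return (lru_victim, lfu_victim) for a list of key accesses in time order."""
    last_seen = {key: t for t, key in enumerate(access_log)}
    counts = Counter(access_log)
    lru_victim = min(last_seen, key=last_seen.get)   # oldest last access
    lfu_victim = min(counts, key=counts.get)         # fewest accesses
    return lru_victim, lfu_victim


# "hot" is requested constantly; "rare" is requested once, but most recently.
log = ["hot", "hot", "hot", "hot", "hot", "rare"]
print(pick_victims(log))  # ('hot', 'rare')
```

LRU would evict the popular key because the rarely used one happens to be the most recent access, whereas LFU evicts the rarely used key.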

Q: What is the difference between allkeys-xxx and volatile-xxx?

volatile-xxx policies evict only keys that have a TTL set, while allkeys-xxx policies consider every key.

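Q: How can I check how many expired keys are still held in memory, and what proportion they represent?

In Redis 4.0 and later, the output of the Redis INFO command includes a field named expired_stale_perc, which helps you judge how many already-expired keys are still held in memory.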

# redis-cli -c -h expire-test.xxxxx.ng.0001.apne1.cache.amazonaws.com info|grep expire
expired_keys:8963571
expired_stale_perc:17.91
expired_time_cap_reached_count:1289
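
If you want to read these fields from a script rather than with grep, the INFO text can be parsed line by line. This is a minimal sketch; `parse_info` is a hypothetical helper, and the sample text is copied from the output above.

```python
def parse_info(text):
    """Parse 'field:value' lines from INFO output into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Stats".
        if not line or line.startswith("#") or ":" not in line:
            continue
        name, value = line.split(":", 1)
        fields[name] = value
    return fields


sample = """expired_keys:8963571
expired_stale_perc:17.91
expired_time_cap_reached_count:1289"""

stats = parse_info(sample)
stale = float(stats["expired_stale_perc"])
print(f"expired_stale_perc = {stale:.2f}")
```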

Q: Can eviction cause performance degradation?

Yes. Redis must delete old keys before new keys can be written, and the impact is more pronounced when operations on large keys drive up client output buffer (COB) usage.

If maxmemory is reached while there are many write requests, serving them can be delayed, because eviction also consumes CPU time and Redis is single-threaded. At that point, each additional write request can trigger more eviction operations.

If maxmemory is reached while the normal-client COB is large, because of a high number of read requests or requests for large data, eviction can also run heavily and consume a lot of CPU time, degrading performance.

Q: How do I resolve the error message “OOM command not allowed when used memory > ‘maxmemory’” for an Amazon ElastiCache Redis cluster node?

https://aws.amazon.com/premiumsupport/knowledge-center/oom-command-not-allowed-redis/

• Set a TTL value for keys on your node.

• Update the parameter group to use a different maxmemory-policy parameter.

• Delete some existing keys manually to free up memory.

• Choose a larger node type.
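
For the first option, one possible approach is to walk the keyspace and give a TTL to every key that lacks one. The sketch below is an assumption-laden illustration: `add_missing_ttls` is a hypothetical helper, and `FakeClient` is an in-memory stand-in so the example runs without a server. It only relies on the `scan_iter`, `ttl`, and `expire` methods, which a redis-py client also provides.

```python
class FakeClient:
    """In-memory stand-in for a Redis client so the sketch runs without a server."""

    def __init__(self, ttls):
        self.ttls = dict(ttls)           # key -> TTL in seconds, -1 means no expiry

    def scan_iter(self, match="*", count=None):
        return iter(list(self.ttls))     # pattern matching is ignored in this stand-in

    def ttl(self, key):
        return self.ttls.get(key, -2)    # -2 mirrors Redis for a missing key

    def expire(self, key, seconds):
        self.ttls[key] = seconds
        return True


def add_missing_ttls(client, ttl_seconds, pattern="*"):
    """Give every matching key without an expiry a TTL; return how many changed."""
    changed = 0
    for key in client.scan_iter(match=pattern, count=100):
        # TTL returns -1 for a key that exists but has no expiry.
        if client.ttl(key) == -1:
            client.expire(key, ttl_seconds)
            changed += 1
    return changed


client = FakeClient({"session:1": 120, "cart:7": -1, "cart:8": -1})
print(add_missing_ttls(client, 3600))  # 2
```

Against a real node you would pass a redis-py client instead of the stand-in, and pick a TTL that suits your data; the 3600 seconds here is arbitrary.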

Q: What are the pros and cons of raising maxmemory-samples from 3 to 10?

maxmemory-samples 3: when Redis runs eviction, it randomly picks N keys, where N is the maxmemory-samples value, and applies maxmemory-policy to those keys to choose what to evict.

!!! The benefit of raising maxmemory-samples is that eviction results come closer to what maxmemory-policy would ideally produce (a more accurate policy outcome). However, because more keys have to be sampled and compared, CPU load increases, and with it the likelihood of higher operation latency.

!!! Keep in mind that Redis is a single-threaded service when you adjust this parameter.

You can raise the sample size to 10, at the cost of some additional CPU usage, to approximate true LRU more closely, and then check whether this changes your cache miss rate. Experimenting in production with different sample sizes is simple using the CONFIG SET maxmemory-samples <count> command.
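
To get a feel for the accuracy side of this trade-off, the following simulation compares how close a sampled pick lands to the true LRU victim for sample sizes 3 and 10. It is a simplified model, not a benchmark of Redis: the function names, key count, and number of rounds are assumptions, and CPU cost is not measured.

```python
import random


def sampled_lru_rank(num_keys, samples, rng):
    """Pick a victim by sampling; return its rank in true LRU order (0 = ideal)."""
    # Key i was last accessed at time i, so key 0 is the true LRU victim.
    candidates = rng.sample(range(num_keys), samples)
    return min(candidates)


def average_rank(num_keys, samples, rounds=2000, seed=42):
    """Average rank of the chosen victim over many independent eviction rounds."""
    rng = random.Random(seed)
    total = sum(sampled_lru_rank(num_keys, samples, rng) for _ in range(rounds))
    return total / rounds


for n in (3, 10):
    print(f"samples={n}: average rank of evicted key = {average_rank(10_000, n):.0f}")
```

A lower average rank means the evicted key is closer to the one exact LRU would have chosen; the larger sample should come out noticeably lower.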

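Q: Why might SWAP usage be high even though no eviction is occurring?

Eviction is triggered when memory usage exceeds maxmemory, but the value compared against maxmemory does not include the replication output buffer or the AOF buffer. So when the node runs short of memory, SWAP usage can increase even though eviction has not started.

Q: Out-of-memory error: (error) OOM?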

> set key1 value1
(error) OOM command not allowed when used memory > 'maxmemory'.
> info memory
# Memory
used_memory:436490824
used_memory_human:416.27M <----
maxmemory:436469760
maxmemory_human:416.25M <----
maxmemory_policy:volatile-lru
### Recommendation ###
1. Please upgrade the cache node type to get more memory for your workload.
2. Choose a different maxmemory-policy setting.
3. Delete existing keys to free up memory.
4. Set a TTL value for your keys on the Redis node.
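
The comparison in the output above can be automated with a small check. This is a minimal sketch using the values shown; `memory_pressure` is a hypothetical helper that takes a dictionary shaped like the memory section of INFO (redis-py's `info("memory")` returns such a dictionary).

```python
def memory_pressure(info):
    """Return used_memory / maxmemory, or None when maxmemory is 0 (no limit)."""
    used = int(info["used_memory"])
    limit = int(info["maxmemory"])
    if limit == 0:
        return None
    return used / limit


# Values copied from the INFO output above.
info = {
    "used_memory": 436490824,
    "maxmemory": 436469760,
    "maxmemory_policy": "volatile-lru",
}
ratio = memory_pressure(info)
if ratio is not None and ratio > 1:
    print(f"over the limit: {ratio:.4%} of maxmemory, policy {info['maxmemory_policy']}")
```

A ratio above 1 combined with a volatile policy and no keys carrying a TTL is the situation in which writes fail with the OOM error.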

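Q: How can the handling of expired keys be optimized?

!!! Proactively reducing the number of expired keys lowers the chance of eviction. The benefit is that expired keys can be cleared during off-peak hours, which in turn lowers memory usage.

a. You can speed up the reclamation of expired keys by running the Redis SCAN command on a schedule (for example, from a cron job). SCAN walks all the keys in Redis, so keys that have already expired get deleted along the way. Running SCAN raises system load, so run it during low-load periods or reduce how much each run scans.

Python example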

import redis
from itertools import zip_longest  # Python 3 name for izip_longest

r = redis.StrictRedis(host='xxx1.apne1.cache.amazonaws.com', port=6379, db=0)

# Iterate over an iterable in batches of size n (the last batch is padded with None).
def batcher(iterable, n):
    args = [iter(iterable)] * n
    return zip_longest(*args)

# Walk the whole keyspace with SCAN, ten keys per batch.
for keybatch in batcher(r.scan_iter('*'), 10):
    print(keybatch)

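b. Use cluster mode to spread keys across different shards. The shards can reclaim expired keys in parallel, which is more efficient than a single shard.
c. Lengthen the expiration time of keys, which can increase the number of keys reclaimed within a given period.
d. Upgrade Redis to version 6.0. Version 6.0 improved the expired-key reclamation mechanism, so expired keys are reclaimed more efficiently. On ElastiCache you can use the active-expire-effort parameter to make reclamation more aggressive.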
