ElastiCache Redis Data Tiering: Why Is SSD Utilization So Low?

Published in What’s next? · 15 min read · Jul 24, 2023

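ElastiCache Redis data tiering offers a new price-performance option for Redis workloads by using lower-cost solid-state drives (SSDs) in each cluster node. The architecture comes with several caveats, though. This post introduces ElastiCache Redis data tiering and walks through the likely causes of low SSD utilization.

What is ElastiCache Redis data tiering?

Data tiering offers a new price-performance option for Redis workloads by using lower-cost solid-state drives (SSDs) in each cluster node. It is a good fit for workloads that regularly access up to 20% of their overall dataset, and for applications that can tolerate additional latency when accessing data on SSD.
On a cluster with data tiering, ElastiCache monitors the last access time of every item it stores. When available memory (DRAM) is fully consumed, ElastiCache uses a least recently used (LRU) algorithm to automatically move infrequently accessed items from memory to SSD. When data on SSD is accessed later, ElastiCache automatically and asynchronously moves it back to memory before processing the request. If your workload regularly accesses only a subset of its data, data tiering is a cost-effective way to scale capacity.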

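The following best practices are recommended:

Data tiering is a good fit for workloads that regularly access up to 20% of their overall dataset, and for applications that can tolerate additional latency when accessing data on SSD.
To make use of the SSD capacity on data-tiering nodes, values should be larger than keys. When items move between DRAM and SSD, keys always remain in memory and only the values are moved to the SSD tier.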
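The behavior described above (keys pinned in memory, least recently used values spilled to SSD, values promoted back on access) can be sketched with a toy model. This is only an illustration of the idea, not ElastiCache's implementation; the `TieredStore` class and its `dram_value_slots` capacity are hypothetical names made up for this example.

```python
from collections import OrderedDict


class TieredStore:
    """Toy model of data tiering: keys stay in DRAM, cold values spill to SSD."""

    def __init__(self, dram_value_slots):
        # Hypothetical capacity: how many values fit in DRAM at once.
        self.dram_value_slots = dram_value_slots
        self.dram = OrderedDict()  # key -> value, oldest access first
        self.ssd = {}              # key -> value for values spilled to SSD

    def keys_in_memory(self):
        # Every key is tracked in memory, wherever its value lives.
        return set(self.dram) | set(self.ssd)

    def set(self, key, value):
        self.ssd.pop(key, None)
        self.dram[key] = value
        self.dram.move_to_end(key)
        self._spill()

    def get(self, key):
        if key in self.ssd:
            # Accessing a value on SSD promotes it back into DRAM.
            self.dram[key] = self.ssd.pop(key)
        self.dram.move_to_end(key)
        self._spill()
        return self.dram[key]

    def _spill(self):
        # Move least recently used values to SSD until DRAM fits again.
        while len(self.dram) > self.dram_value_slots:
            cold_key, cold_value = self.dram.popitem(last=False)
            self.ssd[cold_key] = cold_value


store = TieredStore(dram_value_slots=2)
for name in ("a", "b", "c"):
    store.set(name, name.upper())
print(sorted(store.ssd))   # "a" was least recently used, so its value spilled
store.get("a")             # promotes "a" again, which pushes "b" out
print(sorted(store.ssd))
```

Note that `keys_in_memory()` always returns all three keys: in this model, as in the documentation quoted above, moving a value to SSD never frees the memory used by its key.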

First, we need to know how much memory and how much SSD space each node type provides for storing data.

Example: cache.r6gd.xlarge
https://aws.amazon.com/elasticache/pricing/?nc1=h_ls
Memory: 26.32 GiB
SSD: 99.33 GiB

The problem I ran into:

Why do writes fail with OOM when only 11 GiB of SSD is in use?
Doesn't cache.r6gd.xlarge come with 99.33 GiB of SSD? Why can't I write?

$ redis-cli -h xx.cache.amazonaws.com set key111 data2222
(error) OOM command not allowed when used memory > 'maxmemory'.

I reproduced the same behavior with the load test below. Why?

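### Sample code ###
$ cat redis-feed-data.py
import logging
import random
import string

import redis


def id_generator(size=6, chars=string.ascii_uppercase + string.digits):
    """Return a random string of uppercase letters and digits."""
    return ''.join(random.choice(chars) for _ in range(size))


logging.basicConfig(level=logging.INFO)

# Share one connection pool so the write loop does not reconnect per command.
pool = redis.ConnectionPool(host='xxx.cache.amazonaws.com', port=6379, db=0)
r = redis.Redis(connection_pool=pool)

if r.ping():
    logging.info("Connected to Redis")
    while True:
        key_id = id_generator(20)     # 20-character key
        key_value = id_generator(40)  # 40-character value
        # Uncomment to log each pair (see the sample output below).
        #logging.info(key_id)
        #logging.info(key_value)
        # Variant without a TTL:
        #r.set(key_id, key_value)
        # px is the TTL in milliseconds: 8,640,000 ms is 2.4 hours.
        r.set(key_id, key_value, px=8640000)
        #r.get(key_id)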

### key/value ###
INFO:root:30L7UZ5EKOEVZAVNZ9HK
INFO:root:ZOK48SU62N8U7R5BWVTBEPK55NHTIFC8QE0D0LOM

### Run it several times ###
[ec2-user@ip-10-0-201-63 ~]$ python3 redis-feed-data.py  > /dev/null 2>&1 &
[5] 9713

!!!
Remember to launch several copies in the background, and be sure to use a connection pool.
Otherwise, filling a cache.r6gd.xlarge with data takes a very long time.

cache.r6gd.xlarge 
Memory: 26.32 GiB <--- !!!
SSD: 99.33 GiB <--- !!!

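Investigating the cause

Hypothesis 1: Keys are written without a TTL, so the test keys are never moved to SSD.

When a client stores keys in a data-tiering Redis cluster, they are placed in memory first. Once available memory is exhausted, ElastiCache uses Redis's native LRU (least recently used) algorithm to move less frequently used items to SSD, which makes room for newly written keys. The documentation below is provided for reference.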

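[+] Data tiering: https://docs.aws.amazon.com/zh_cn/AmazonElastiCache/latest/red-ug/data-tiering.html

On clusters with data tiering enabled, ElastiCache monitors the last access time of every item the cluster stores. When available memory (DRAM) is exhausted, ElastiCache uses a least recently used (LRU) algorithm to automatically move infrequently accessed items from memory to SSD. When data on SSD is accessed later, ElastiCache automatically and asynchronously moves it back to memory before processing the request. If your workload regularly accesses only a subset of its data, data tiering is a cost-effective way to scale capacity.
The maxmemory-policy parameter in the Redis cluster parameter group defaults to volatile-lru.
maxmemory-policy: volatile-lru ← determines which keys Redis evicts once memory usage reaches 100%.
According to the official Redis documentation, volatile-lru only evicts keys that have the "expire field" set. In other words, it only acts on keys that were given a TTL.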

[+] Key eviction | Redis:
https://redis.io/docs/reference/eviction/

volatile-lru: Removes least recently used keys with the expire field set to true.

Solution:

Consider changing the default volatile-lru to allkeys-lru, or set a TTL every time you write a key.

[+] Key eviction | Redis:
https://redis.io/docs/reference/eviction/

noeviction: New values aren’t saved when memory limit is reached. When a database uses replication, this applies to the primary database
allkeys-lru: Keeps most recently used keys; removes least recently used (LRU) keys ← recommended instead.
allkeys-lfu: Keeps frequently used keys; removes least frequently used (LFU) keys
volatile-lru: Removes least recently used keys with the expire field set to true. ← ElastiCache Redis default.
volatile-lfu: Removes least frequently used keys with the expire field set to true.
allkeys-random: Randomly removes keys to make space for the new data added.
volatile-random: Randomly removes keys with expire field set to true.
volatile-ttl: Removes keys with expire field set to true and the shortest remaining time-to-live (TTL) value.

!!! Data tiering supports only three maxmemory policies: volatile-lru, allkeys-lru, and noeviction.
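The difference between the two LRU policies is easiest to see on a small example. The sketch below models only the candidate selection described in the Redis eviction documentation quoted above; it is a simplified illustration, and the `eviction_candidates` function and the sample key names are hypothetical.

```python
SUPPORTED_POLICIES = {"volatile-lru", "allkeys-lru", "noeviction"}


def eviction_candidates(entries, policy):
    """Return the keys a policy may act on, least recently used first.

    entries maps key -> (last_access_time, ttl_seconds or None).
    """
    if policy not in SUPPORTED_POLICIES:
        raise ValueError(f"{policy} is not supported with data tiering")
    if policy == "noeviction":
        return []
    if policy == "volatile-lru":
        # Only keys that carry a TTL are considered.
        pool = {k: v for k, v in entries.items() if v[1] is not None}
    else:  # allkeys-lru
        pool = entries
    return sorted(pool, key=lambda k: pool[k][0])


entries = {
    "session:1": (100, 3600),   # has a TTL
    "profile:1": (50, None),    # no TTL
    "cart:1": (75, None),       # no TTL
}
print(eviction_candidates(entries, "volatile-lru"))  # ['session:1']
print(eviction_candidates(entries, "allkeys-lru"))   # ['profile:1', 'cart:1', 'session:1']
```

Under volatile-lru, the two keys without a TTL are never candidates, no matter how long ago they were last accessed.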

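Hypothesis 2: Values larger than 128 MiB are not moved to SSD either.

If the values you store, whether in a load test or in your production workload, are larger than 128 MiB, they are not moved to SSD. This is another possible reason for running out of memory after the load test: none of the test data was ever moved to SSD.

[+] Data tiering, limitations:
https://docs.aws.amazon.com/zh_cn/AmazonElastiCache/latest/red-ug/data-tiering.html#data-tiering-prerequisites

* Items larger than 128 MiB are "not" moved to SSD.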

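Solution:

Address it at the application level: break the data up and split it into multiple smaller items. !!!! This also helps you avoid the big-key problem.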
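One way to split a large value, sketched under assumptions: the `:part:N` key naming, the 64 MiB default chunk size, and the helper names `split_value` and `join_value` are all made up for illustration, and the application is responsible for writing and reading every chunk.

```python
MAX_TIERABLE_BYTES = 128 * 1024 * 1024  # 128 MiB limit quoted above


def split_value(key, data, chunk_bytes=64 * 1024 * 1024):
    """Split one large value into numbered chunks, each within the limit."""
    if not 0 < chunk_bytes <= MAX_TIERABLE_BYTES:
        raise ValueError("chunk_bytes must be between 1 byte and 128 MiB")
    return {
        f"{key}:part:{index}": data[offset:offset + chunk_bytes]
        for index, offset in enumerate(range(0, len(data), chunk_bytes))
    }


def join_value(key, chunks):
    """Reassemble the original value from its numbered chunks."""
    return b"".join(chunks[f"{key}:part:{i}"] for i in range(len(chunks)))


# Tiny sizes so the example runs instantly; real chunks would be megabytes.
chunks = split_value("video:42", b"x" * 10, chunk_bytes=4)
print(sorted(chunks))  # ['video:42:part:0', 'video:42:part:1', 'video:42:part:2']
```

Each entry in `chunks` would then be written as its own Redis key, so every stored value stays small enough to be moved to SSD.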

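Hypothesis 3: The keys are relatively large and the values are small.

The ElastiCache Redis data tiering documentation states clearly that when memory runs out, an LRU algorithm automatically moves data to SSD. It states just as clearly that "keys themselves always remain in memory" when data tiering is in use. This means the whole key-plus-value item is not moved to SSD; only the value is. So even when the SSD still has free space, writes can fail because the keys and values together have already filled memory. This is another possible reason for running out of memory after the load test.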

Key = AAA
Value = BBBBBBBB

[+] Data tiering: https://docs.aws.amazon.com/zh_cn/AmazonElastiCache/latest/red-ug/data-tiering.html

ElastiCache monitors the last access time of every item the cluster stores. When available memory (DRAM) is exhausted, ElastiCache uses a least recently used (LRU) algorithm to automatically move infrequently accessed items from memory to SSD.
Note that when using data tiering, keys themselves always remain in memory, while the LRU governs the placement of values on memory vs. disk. In general, we recommend that your key sizes are smaller than your value sizes when using data tiering. ← !!!

This is the most likely cause of the problem below.

Why do writes fail with OOM when only 11 GiB of SSD is in use?
Doesn't cache.r6gd.xlarge come with 99.33 GiB of SSD? Why can't I write?
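A rough back-of-the-envelope model makes the effect concrete. The sketch below assumes that every value ends up on SSD and that DRAM holds only keys plus a fixed per-key overhead. The 80-byte overhead is a made-up illustrative number, not a measured ElastiCache figure, and `estimate_ssd_usage` is a hypothetical helper.

```python
GIB = 1024 ** 3


def estimate_ssd_usage(dram_bytes, ssd_bytes, key_bytes, value_bytes,
                       per_key_overhead_bytes):
    """Estimate SSD use at the point where keys alone fill DRAM.

    Simplification: all values are assumed to sit on SSD, and DRAM holds
    only keys plus a fixed per-key overhead (an assumed figure).
    """
    max_keys = dram_bytes // (key_bytes + per_key_overhead_bytes)
    ssd_used = min(ssd_bytes, max_keys * value_bytes)
    return max_keys, ssd_used


dram = int(26.32 * GIB)  # cache.r6gd.xlarge memory
ssd = int(99.33 * GIB)   # cache.r6gd.xlarge SSD

# Load-test shape from this post: 20-byte keys, 40-byte values.
keys, used = estimate_ssd_usage(dram, ssd, 20, 40, 80)
print(f"{keys:,} keys fit in DRAM, {used / GIB:.2f} GiB of SSD used")

# Same keys with 1 MiB values: the SSD becomes the limiting tier instead.
keys, used = estimate_ssd_usage(dram, ssd, 20, 1024 * 1024, 80)
print(f"{used / GIB:.2f} GiB of SSD used")
```

Whatever the real overhead is, the shape of the result holds: with values only twice the size of the keys, DRAM fills with keys long before the SSD fills with values.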

Q: Why can the key string itself only live in memory?

Consider that Redis exists to provide high-speed, low-latency access. If every request for a key had to search across both memory and SSD to find where that key's value lives, could it still be fast? The key string serves as the index, so it has to stay in memory.

Q: How large can a single key or value be?

[+] Redis Strings | Redis:
https://redis.io/docs/data-types/strings/

Limits By default, a single Redis string can be a maximum of 512 MB.
Since Redis keys are strings, when we use the string type as a value too, we are mapping a string to another string.
Values can be strings (including binary data) of every kind, for instance you can store a jpeg image inside a value. A value can’t be bigger than 512 MB.

!!! So a key string can be at most 512 MB, and a value can also be at most 512 MB. Put simply, a data item is made up of a key plus a value, but the two limits apply separately.

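Investigating the cause: is the 99.33 GiB of SSD there but out of reach?

The conclusion is that it can be used. When the items you store have a small key name and a large value, and each value is under 128 MiB, you can store the maximum amount of data.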
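The size rules discussed so far can be collected into one small check. This is a sketch under assumptions: it treats 512 MB as 512 MiB, applies the 128 MiB tiering limit to the value size alone (the documentation speaks of "items", so how the size is measured is an assumption here), and `classify_item` is a hypothetical helper name.

```python
MIB = 1024 * 1024
MAX_STRING_BYTES = 512 * MIB     # Redis string limit quoted above
MAX_TIERABLE_BYTES = 128 * MIB   # data tiering limit quoted above


def classify_item(key_bytes, value_bytes):
    """Describe how a key/value pair of the given sizes would be treated."""
    if key_bytes > MAX_STRING_BYTES or value_bytes > MAX_STRING_BYTES:
        return "rejected: exceeds the 512 MB string limit"
    if value_bytes > MAX_TIERABLE_BYTES:
        return "memory only: larger than 128 MiB, never moved to SSD"
    if key_bytes >= value_bytes:
        return "tierable, but the key is not smaller than the value"
    return "tierable: small key, larger value"


print(classify_item(10, 111 * MIB))  # the 10-byte key / 111 MiB file case
print(classify_item(20, 40))         # the 20/40-byte load-test case
print(classify_item(10, 129 * MIB))
```

Only the first shape, a short key with a large value under the limit, lets the SSD tier absorb most of the data.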

Load-test method:

First create a large file, making sure it is smaller than 128 MiB.
Write it to Redis and set a TTL.

$ du -sh 111M
111M 111M

$ cat redis-feed-large-file.sh

#!/bin/bash
host="r13306446021.5t9cps.ng.0001.usw2.cache.amazonaws.com"
port="6379"

if [ -f 111M ]; then
    # Random 10-character key name.
    key=$(tr -dc 'a-zA-Z0-9' < /dev/urandom | fold -w 10 | head -n 1)
    # -x reads the last argument (the field value) from stdin.
    redis-cli -h "$host" -p "$port" -x HSET "$key" image_binary < 111M
    # EXPIRE takes seconds, so 86400000 is 1000 days.
    redis-cli -h "$host" -p "$port" expire "$key" 86400000
    redis-cli -h "$host" -p "$port" info memory | grep used_memory_human
fi

$ sh redis-feed-large-file.sh

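ElastiCache Redis data tiering has a number of hard limits. For example, you can bring data in but cannot export it to S3, and you cannot convert back to a node type without data tiering. The most important limitations are listed below for reference.

[+] Data tiering, limitations: https://docs.aws.amazon.com/zh_tw/AmazonElastiCache/latest/red-ug/data-tiering.html#data-tiering-prerequisites

Data tiering has the following limitations:

You can only use data tiering on clusters that are part of a replication group.
The node type must be from the r6gd family, which is available in the following regions: us-east-2, us-east-1, us-west-2, us-west-1, eu-west-1, eu-central-1, eu-north-1, eu-west-3, ap-northeast-1, ap-southeast-1, ap-southeast-2, ap-south-1, ca-central-1, and sa-east-1.
You must use the Redis 6.2 engine or later.
You cannot restore a backup of an r6gd cluster into another cluster unless that cluster also uses r6gd.
You cannot export a backup to Amazon S3 for data-tiering clusters.
Online migration is not supported for clusters running on the r6gd node type.
Scaling from a data-tiering cluster (for example, a cluster using an r6gd node type) to a cluster without data tiering (for example, a cluster using an r6g node type) is not supported. For more information, see Scaling ElastiCache for Redis clusters.
Auto scaling is not supported for clusters using data tiering. For more information, see Auto Scaling ElastiCache for Redis clusters.
Data tiering supports only the volatile-lru, allkeys-lru, and noeviction maxmemory policies.
Forkless save is not supported. For more information, see How synchronization and backup are implemented.
Items larger than 128 MiB are not moved to SSD.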
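Several of these limitations can be checked before a cluster is created. The sketch below encodes four of them as plain Python; the `tiering_violations` function and its parameters are hypothetical, and it assumes the engine version is given as a numeric string such as "6.2" or "7.0".

```python
def tiering_violations(node_type, engine_version, maxmemory_policy,
                       in_replication_group):
    """Return the limitations a planned data-tiering cluster would violate."""
    problems = []
    if not node_type.startswith("cache.r6gd."):
        problems.append("node type must be from the r6gd family")
    # Assumes a numeric version string; pad "7" to (7, 0).
    parts = [int(p) for p in engine_version.split(".")[:2]]
    version = tuple(parts + [0] * (2 - len(parts)))
    if version < (6, 2):
        problems.append("engine must be Redis 6.2 or later")
    if maxmemory_policy not in {"volatile-lru", "allkeys-lru", "noeviction"}:
        problems.append("unsupported maxmemory policy")
    if not in_replication_group:
        problems.append("cluster must be part of a replication group")
    return problems


print(tiering_violations("cache.r6gd.xlarge", "6.2", "allkeys-lru", True))  # []
print(tiering_violations("cache.r6g.xlarge", "6.0", "volatile-ttl", False))
```

An empty list means none of the four checked limitations is violated; the remaining limitations (backup, export, migration, scaling) concern operations rather than configuration and are not covered here.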

Further reading (References)