Redis KEYS vs SCAN?

What is the difference between the KEYS and SCAN commands? This post covers how they differ, the problems each can cause, and recommendations for using them.

Jerry’s Notes · Published in What’s next? · Apr 3, 2022

KEYS command:

https://redis.io/commands/KEYS

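The official Redis documentation points out that KEYS has O(N) time complexity: it walks every key in the database to find the ones that match the pattern, and because Redis executes commands one at a time, every other command has to wait until it finishes. For this reason, KEYS is not recommended in production.
Use SCAN instead. A complete SCAN iteration is also O(N), but the work is split into many short calls driven by a cursor, so the server is never blocked for the whole traversal at once. SCAN also accepts a COUNT argument, a hint for how much work each call should do (not a hard limit on the number of results), and, like KEYS, it supports pattern matching with MATCH.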
Time complexity: O(N) with N being the number of keys in the database, under the assumption that the key names in the database and the given pattern have limited length. Returns all keys matching pattern.
While the time complexity for this operation is O(N), the constant times are fairly low. For example, Redis running on an entry level laptop can scan a 1 million key database in 40 milliseconds.
Warning: consider KEYS as a command that should only be used in production environments with extreme care. It may ruin performance when it is executed against large databases. This command is intended for debugging and special operations, such as changing your keyspace layout. Don't use KEYS in your regular application code. If you're looking for a way to find keys in a subset of your keyspace, consider using SCAN or sets.

SCAN command:

https://redis.io/commands/scan

SCAN is a cursor-based iterator. This means that at every call of the command, the server returns an updated cursor that the user needs to use as the cursor argument in the next call. An iteration starts when the cursor is set to 0, and terminates when the cursor returned by the server is 0.
SCAN cursor [MATCH pattern] [COUNT count]
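■ cursor: the cursor. Use 0 to start an iteration, then pass back the cursor returned by the previous call.
■ pattern: the glob-style pattern that returned elements must match.
■ count: a hint for how many elements to examine per call; the default is 10. Note: if count is set very large, a single call behaves much like KEYS.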
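The cursor contract described above (start at 0, pass back the returned cursor, stop when the server returns 0) can be sketched in Python, the language of the redis-py examples later in this post. This is a minimal sketch, assuming `client` is a redis-py style object whose `scan(cursor=..., match=..., count=...)` returns `(next_cursor, keys)`; the name `scan_keys` and the default `count=100` are illustrative choices. redis-py's own `scan_iter()` already wraps the same loop.

```python
from typing import Any, Iterator, Optional


def scan_keys(client: Any, pattern: Optional[str] = None, count: int = 100) -> Iterator[Any]:
    """Yield every key matching `pattern` by following the SCAN cursor protocol.

    Assumes `client` behaves like a redis-py client: scan(cursor=..., match=...,
    count=...) returns a (next_cursor, keys) pair. `count` is only a hint.
    """
    cursor = 0  # an iteration always starts with cursor 0
    while True:
        # Each call examines roughly `count` entries and returns one batch
        cursor, keys = client.scan(cursor=cursor, match=pattern, count=count)
        yield from keys
        # The server signals the end of the full iteration by returning cursor 0
        if int(cursor) == 0:
            break
```

With a connected `redis.Redis` instance `r`, `for key in scan_keys(r, "user:*"): ...` visits the keys batch by batch. The same key may be yielded more than once, so deduplicate if that matters.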

redis 127.0.0.1:6379> sadd myset 1 2 3 foo foobar feelsgood
(integer) 6
redis 127.0.0.1:6379> sscan myset 0 match f*
1) "0"
2) 1) "foo"
   2) "feelsgood"
   3) "foobar"
redis 127.0.0.1:6379>
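The example above uses SSCAN, the set variant of SCAN, which walks the members of a single set instead of the whole keyspace. A minimal redis-py sketch, assuming a client that provides `sscan_iter(name, match=..., count=...)`; `set_members_matching` is a hypothetical helper name. The members are collected into a Python set because commands in the SCAN family may return an element more than once.

```python
from typing import Any, List


def set_members_matching(client: Any, key: str, pattern: str, count: int = 100) -> List[Any]:
    """Return the members of set `key` that match `pattern`, using SSCAN.

    Assumes a redis-py style client: sscan_iter(name, match=..., count=...)
    runs the SSCAN cursor loop and yields members one by one.
    """
    # Deduplicate first (an element can be reported twice), then sort
    # so the result is stable regardless of server iteration order.
    return sorted(set(client.sscan_iter(key, match=pattern, count=count)))
```

Against the `myset` created above, `set_members_matching(r, "myset", "f*")` would return the three `f*` members in sorted order (as bytes unless the client uses `decode_responses=True`).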

Tip: in your code you can use SCAN and its related commands (SSCAN, HSCAN, ZSCAN) to analyze keys and collect statistics about them, and also to query data.
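As a sketch of key analysis with SCAN, the hypothetical helper below counts keys per prefix (the text before the first `:`), which can show which part of an application owns most of the keyspace. It assumes a redis-py style client with `scan_iter(match=..., count=...)`; the separator and `count=200` are illustrative. Counts are approximate if keys are created or deleted while the scan runs, or if SCAN reports a key twice.

```python
from collections import Counter
from typing import Any


def count_keys_by_prefix(client: Any, match: str = "*", sep: str = ":", count: int = 200) -> Counter:
    """Count keys per prefix (the text before the first `sep`) using SCAN.

    Assumes a redis-py style client with scan_iter(match=..., count=...).
    Keys may arrive as bytes (the redis-py default) or as str
    (decode_responses=True); both are handled.
    """
    stats: Counter = Counter()
    for key in client.scan_iter(match=match, count=count):
        if isinstance(key, bytes):
            key = key.decode("utf-8", errors="replace")
        # Keys without the separator are grouped under a single bucket
        prefix = key.split(sep, 1)[0] if sep in key else "(no prefix)"
        stats[prefix] += 1
    return stats
```

`count_keys_by_prefix(r).most_common(10)` would then list the ten largest prefixes.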

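SCAN, like the other incremental iteration commands, gives the following guarantee for a full iteration: every element that is present in the dataset from the start to the end of the iteration is returned at some point. In other words, if an element exists in the dataset for the whole traversal, SCAN will return it to the user in one of the calls.

127.0.0.1:6379> scan 0     # use 0 as the cursor to start a new iteration
1) "17"                    # cursor returned by the first call
2)  1) "key:12"
    2) "key:8"
    3) "key:4"
    4) "key:14"
    5) "key:16"
    6) "key:17"
    7) "key:15"
    8) "key:10"
    9) "key:3"
   10) "key:7"
   11) "key:1"
127.0.0.1:6379> scan 17    # continue the iteration with cursor 17 from the first call
1) "0"                     # cursor 0: the full iteration is complete
2) 1) "key:5"
   2) "key:18"
   3) "key:0"
   4) "key:2"
   5) "key:19"
   6) "key:13"
   7) "key:6"
   8) "key:9"
   9) "key:11"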
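The guarantee above has a counterpart in the same documentation: a full iteration may return a given element more than once, and handling duplicates is left to the application. A minimal sketch, assuming a redis-py style `scan_iter()`; the helper name `collect_unique_keys` is hypothetical. It keeps the distinct keys and also reports how many duplicates were seen.

```python
from typing import Any, Optional, Set, Tuple


def collect_unique_keys(client: Any, match: Optional[str] = None,
                        count: int = 100) -> Tuple[Set[Any], int]:
    """Run one full SCAN iteration; return (distinct keys, duplicate count).

    Every key that exists for the whole iteration ends up in the set.
    Keys created or deleted while the scan runs may or may not appear.
    """
    unique: Set[Any] = set()
    duplicates = 0
    for key in client.scan_iter(match=match, count=count):
        if key in unique:
            duplicates += 1  # SCAN may legitimately report a key twice
        else:
            unique.add(key)
    return unique, duplicates
```

Holding every key in a client-side set costs memory proportional to the number of matches, so this suits targeted patterns better than a scan of a very large keyspace.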

What if SCAN is used with a high COUNT value?

Then the result is similar to KEYS, and it can easily cause problems or performance bottlenecks: with a high COUNT, each call examines a large amount of data, so the Redis engine is stuck on that one call.

A SLOWLOG entry for such a call (argument list only):

> slowlog get
        1) "scan"
        2) "49992503"
        3) "MATCH"
        4) "*"
        5) "COUNT"
        6) "10000"

Redis executes commands on a single thread. One of the reasons SCAN was introduced is to let you go through all the keys a few steps at a time, without blocking the server for long. However, the higher the COUNT, the longer each call blocks. A COUNT of 10,000 makes each call noticeably slower and blocks the Redis engine for longer, which can cause slowness and client timeouts. In other words, a very high COUNT has effects similar to running KEYS.
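To see this COUNT effect on your own data, you could time each SCAN call for different COUNT values. This sketch assumes a redis-py style client with `scan(cursor=..., match=..., count=...)` and should be pointed at a non-production copy, since a large COUNT is exactly what causes the blocking; `scan_call_latencies` is a hypothetical name. The measured time is the client-side round trip, so it includes network latency and only approximates how long the server was blocked.

```python
import time
from typing import Any, List, Optional


def scan_call_latencies(client: Any, match: Optional[str] = None,
                        count: int = 100) -> List[float]:
    """Run one full SCAN iteration and return each call's round-trip time in ms."""
    latencies: List[float] = []
    cursor = 0
    while True:
        start = time.perf_counter()
        cursor, _keys = client.scan(cursor=cursor, match=match, count=count)
        latencies.append((time.perf_counter() - start) * 1000.0)
        if int(cursor) == 0:  # iteration complete
            return latencies
```

Comparing `max(scan_call_latencies(r, count=100))` with `max(scan_call_latencies(r, count=10000))` on the same data gives a rough picture of how much longer each individual call blocks as COUNT grows.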

What problems can the KEYS command cause?

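KEYS also gets slower as the number of keys grows. It returns every matching key to the client in a single reply, which commonly takes up a large amount of the "normal client output buffer" (COB), and COB usage is counted in BytesUsedForCache. If many clients run KEYS at the same time, each one needs its own extra COB memory, which can exhaust the available memory: FreeableMemory approaches 0 and SwapUsage starts to grow. If Redis then becomes too busy to answer health checks, a failover from the primary to a replica may be triggered, which in turn triggers a full synchronization.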
ElastiCache metrics to watch:
* EngineCPUUtilization
* BytesUsedForCache
* FreeableMemory
* SwapUsage
* SaveInProgress
* ReplicationLag
* ReplicationBytes
* IsMaster
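Alongside these metrics, one way to see which connections are holding large output buffers is the `omem` field of CLIENT LIST. This is a hedged sketch, assuming a redis-py client whose `client_list()` returns one dict per connection with string values such as `addr` and `omem`, and assuming your deployment allows CLIENT LIST; `top_output_buffers` is a hypothetical helper. It is a point-in-time snapshot, so a short-lived spike from a single KEYS reply may be missed.

```python
from typing import Any, List, Tuple


def top_output_buffers(client: Any, limit: int = 5) -> Tuple[int, List[Tuple[str, int]]]:
    """Summarize client output buffer usage from CLIENT LIST.

    Returns the total `omem` (output buffer memory, in bytes) across all
    connections and the `limit` connections using the most.
    """
    usage: List[Tuple[str, int]] = []
    for entry in client.client_list():
        # redis-py reports CLIENT LIST fields as strings
        usage.append((entry.get("addr", "?"), int(entry.get("omem", 0))))
    usage.sort(key=lambda item: item[1], reverse=True)
    total = sum(omem for _addr, omem in usage)
    return total, usage[:limit]
```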

Because KEYS accepts patterns such as * (which matches every key), it can easily exhaust FreeableMemory or drive EngineCPUUtilization high. A typical KEYS call:
redis 127.0.0.1:6379> KEYS runoob*
1) "runoob3"
2) "runoob1"
3) "runoob2"

Using SCAN to delete specific keys

SCAN (redis.io): Available since: 2.8.0. Time complexity: O(1) for every call. O(N) for a complete iteration, including enough command…

Example: use SCAN to find keys by pattern and delete them.

$ redis-cli -h 127.0.0.1 --scan --pattern '*user*'
333user333
111user111
222user222
---
$ redis-cli -h 127.0.0.1 monitor
1628235306.665974 [0 127.0.0.1:41710] "SCAN" "0" "MATCH" "*user*"
---
### Using KEYS to find the matching keys ###
$ redis-cli --raw keys "*user*" | xargs redis-cli del
### Using SCAN to find the matching keys instead (recommended) ###
$ redis-cli --scan --pattern '*user*' | xargs redis-cli -h 127.0.0.1 del
(integer) 3
---
$ redis-cli -h 127.0.0.1 monitor
1628235306.665974 [0 127.0.0.1:41710] "SCAN" "0" "MATCH" "*user*"
1628235306.671584 [0 127.0.0.1:41712] "del" "333user333" "111user111" "222user222"
---
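The xargs pipeline above splits its input on whitespace and treats quotes specially, so key names containing spaces or quotes can be mangled. A Python alternative, sketched under the assumption of a redis-py style client with `scan_iter()` and `unlink()`, deletes keys in batches as SCAN returns them. UNLINK requires Redis 4.0 or later and reclaims memory in a background thread; use `delete()` on older servers. `delete_by_pattern` and the batch sizes are illustrative, not tuned values.

```python
from typing import Any, List


def delete_by_pattern(client: Any, pattern: str, batch_size: int = 500,
                      scan_count: int = 100) -> int:
    """Delete keys matching `pattern` in batches; return how many were removed.

    Assumes a redis-py style client with scan_iter(match=..., count=...)
    and unlink(*names), which returns the number of keys actually removed.
    """
    removed = 0
    batch: List[Any] = []
    for key in client.scan_iter(match=pattern, count=scan_count):
        batch.append(key)
        if len(batch) >= batch_size:
            # Keys reported twice by SCAN are not double-counted:
            # UNLINK only counts keys that still existed.
            removed += client.unlink(*batch)
            batch.clear()
    if batch:  # flush the final partial batch
        removed += client.unlink(*batch)
    return removed
```

Deleting keys that SCAN has already returned does not break the iteration, so the loop can delete while it scans.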

Moving from KEYS to SCAN in redis-py


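### Using KEYS ###
import redis

# Placeholders: replace with your own endpoint and database number
YOUR_HOST, YOUR_PORT, YOUR_DB = "127.0.0.1", 6379, 0
r = redis.Redis(host=YOUR_HOST, port=YOUR_PORT, db=YOUR_DB)

# KEYS fetches every matching key in a single blocking call
for key in r.keys("user:*"):
    value = r.get(key)
    print(key, value)


### Using SCAN ###
import redis

YOUR_HOST, YOUR_PORT, YOUR_DB = "127.0.0.1", 6379, 0
r = redis.Redis(host=YOUR_HOST, port=YOUR_PORT, db=YOUR_DB)

# scan_iter() runs the SCAN cursor loop for you, one batch of keys per call
for key in r.scan_iter("user:*"):
    value = r.get(key)
    print(key, value)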
---
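These loops issue one GET per key, which costs one network round trip each. A possible refinement, sketched under the assumption that every matching key holds a string: group the keys returned by SCAN and fetch each group with a single MGET. It assumes a redis-py style client with `scan_iter()` and `mget(keys)`; `scan_with_values` and `batch_size=100` are illustrative. MGET returns None for keys that are missing or hold a non-string type.

```python
from typing import Any, Iterator, List, Tuple


def scan_with_values(client: Any, pattern: str, batch_size: int = 100) -> Iterator[Tuple[Any, Any]]:
    """Yield (key, value) pairs for keys matching `pattern`, one MGET per batch."""
    batch: List[Any] = []
    for key in client.scan_iter(match=pattern, count=batch_size):
        batch.append(key)
        if len(batch) >= batch_size:
            # One round trip fetches the values for the whole batch
            yield from zip(batch, client.mget(batch))
            batch = []
    if batch:  # fetch the final partial batch
        yield from zip(batch, client.mget(batch))
```

`for key, value in scan_with_values(r, "user:*"): print(key, value)` then produces the same output as the SCAN loop above with fewer round trips.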


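### Using SCAN with an explicit cursor ###
# redis-py signature: redis_client.scan(cursor=0, match=None, count=None)
# cursor: where to continue (0 starts a new iteration); match: the pattern;
# count: hint for how many keys to examine per call (server default 10)
import redis

YOUR_HOST, YOUR_PORT, YOUR_DB = "127.0.0.1", 6379, 0
r = redis.Redis(host=YOUR_HOST, port=YOUR_PORT, db=YOUR_DB)

cursor = 0
while True:
    # Each call returns the next cursor and one batch of matching keys
    cursor, keys = r.scan(cursor=cursor, match="user:*", count=100)
    for key in keys:
        print(key)
    if cursor == 0:  # the server returns cursor 0 when the iteration is complete
        break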
---

Further reading (References)