How do you migrate or synchronize an external Redis into AWS ElastiCache for Redis?

There are two main approaches to migrating Redis: moving the data offline, or synchronizing it online. The key considerations are how long an interruption your business can tolerate, how much data sits on the source Redis, and how high the rate of live changes is. This post includes detailed notes on synchronizing data with the third-party tool RedisShake, which you may find a useful reference.

Jerry’s Notes · Jan 2, 2023

Q: Why are psync/sync restricted commands on ElastiCache Redis?

The main reason the psync command is restricted is this: when it is issued against an ElastiCache Redis acting as the source, the replica (here, the third-party tool RedisShake) uses psync to ask the cluster's primary node for a full sync. The primary's Redis process then has to push everything it holds in memory out to the replica (the RedisShake process), and that has many side effects: EngineCPU rises, memory usage grows (slave output buffer / SWAPUsage), and inbound/outbound network traffic increases. On top of that, the primary can easily become too busy to respond, leading to node failures, client-side operation latency, connection timeouts, and so on; in the worst case it can end in data loss.

!!! For this reason, online migration is not recommended for moving Redis data, unless you know Redis, and your own Redis cluster, very well.

[+] Restricted Redis commands:
https://docs.aws.amazon.com/zh_cn/AmazonElastiCache/latest/red-ug/RestrictedCommands.html

Offline migration

With this approach you first back up the source Redis and export an RDB file, upload it to Amazon S3, and finally use that RDB backup to create a brand-new ElastiCache Redis. !!! This is the recommended best practice for moving data into ElastiCache Redis.
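As a rough sketch, the steps look like this with redis-cli and the AWS CLI (the bucket name, cluster ID, and node type below are placeholders; the S3 bucket must grant ElastiCache read permission on the object):

```shell
# 1) Take a point-in-time snapshot on the source Redis.
redis-cli -h <source-host> BGSAVE

# 2) Upload the resulting RDB file to Amazon S3.
aws s3 cp /var/lib/redis/dump.rdb s3://my-backup-bucket/dump.rdb

# 3) Seed a brand-new ElastiCache Redis cluster from the backup.
aws elasticache create-cache-cluster \
  --cache-cluster-id my-new-redis \
  --engine redis \
  --cache-node-type cache.t3.small \
  --num-cache-nodes 1 \
  --snapshot-arns arn:aws:s3:::my-backup-bucket/dump.rdb
```

The cutover then consists of repointing clients at the new cluster's endpoint once it is available.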

Pros:
1) Data consistency is guaranteed.
2) Minimal performance impact on the source Redis.

Cons:
1) The longest downtime as perceived by the Redis clients.
2) The clients may need to backfill the data that changed during the cutover.

[+] Seeding a new cluster with an externally created backup

[+] How can I optimize performance when I upload large files to Amazon S3?

Online migration supported by ElastiCache

With online migration you can migrate your data from self-hosted Redis on Amazon EC2 to Amazon ElastiCache. The restrictions are numerous, however, so the applicable scenarios are limited.

Restrictions:
1) Only Redis running on EC2 is supported.
2) Only cluster mode disabled is supported.
3) Engine version 2.8.21 or later.
4) AUTH must be disabled.
5) The number of logical databases must match.
6) The CPU load on the source Redis must not be too high, because it uses psync/sync for synchronization.
7) The source Redis and the target ElastiCache Redis must be in the same VPC, with working network connectivity between them.
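Before attempting online migration, the prerequisites above can be spot-checked on the source with redis-cli (the source host below is a placeholder):

```shell
SRC=<source-host>
redis-cli -h "$SRC" info server  | grep redis_version    # needs 2.8.21 or later
redis-cli -h "$SRC" info cluster | grep cluster_enabled  # must be 0 (cluster mode disabled)
redis-cli -h "$SRC" config get requirepass               # must be empty (AUTH disabled)
redis-cli -h "$SRC" config get databases                 # logical database count must match the target
```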

[+] Online migration to ElastiCache

Source EC2 Redis server (self-hosted Redis on Amazon EC2)

$ redis-cli -h 127.0.0.1 monitor
1628064252.732588 [0 10.0.102.206:24799] "PING"
>> Nothing unusual happening here yet.

$ redis-cli -h 127.0.0.1
127.0.0.1:6379> config set protected-mode "no"
OK
127.0.0.1:6379> config get protected-mode
1) "protected-mode"
2) "no"

127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:1
slave0:ip=10.0.102.206,port=6379,state=online,offset=423,lag=0

Target ElastiCache Redis Cluster

$ redis-cli -h xxxx.amazonaws.com monitor
1628063833.123818 [0 10.37.1.179:41794] "ping"
1628063845.183302 [0 10.37.1.179:41800] "ping"
1628064523.156131 [0 10.0.200.59:6379] "ping"
1628064533.175782 [0 10.0.200.59:6379] "ping"
>> The continuous pings are probes coming from the master.

$ redis-cli -h xxxx.amazonaws.com
xxxx.amazonaws.com:6379> get ddd
"2222"
xxx.amazonaws.com:6379> info replication
# Replication
role:slave
master_host:10.0.200.59
master_port:6379
master_link_status:up
...
slave_read_only:1
connected_slaves:1
slave0:ip=10.37.1.179,port=6379,state=online,offset=437,lag=1
master_replid:e9f6cb80c2835886836e26743abcc69c57f7858c
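When scripting checks like the ones above, the INFO output can be parsed into a dictionary; a minimal Python sketch (field names taken from the sample output shown, hosts omitted):

```python
def parse_info(text: str) -> dict:
    """Parse the key:value lines of a `redis-cli info replication` dump."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Replication".
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields

sample = """# Replication
role:slave
master_host:10.0.200.59
master_port:6379
master_link_status:up"""

state = parse_info(sample)
print(state["role"], state["master_link_status"])  # slave up
```

A script polling this can, for example, wait until master_link_status is "up" before proceeding with a cutover.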

Online migration with third-party tools

Below are three third-party tools for migrating Redis data. Besides importing data from an RDB backup file, they can also synchronize data in real time via psync/sync.

1) GitHub — alibaba/RedisShake:
redis-shake is a tool for Redis data migration and data filtering.

[+] https://github.com/alibaba/RedisShake
[+] https://github.com/alibaba/RedisShake/wiki

2) Redis Input/Output Tools (RIOT)
Redis Input/Output Tools (RIOT) is a set of import/export command line utilities for Redis:
• RIOT Redis: live replication from any Redis database (including AWS Elasticache) to another Redis database.
• RIOT DB: migrate from an RDBMS to Redis, RediSearch, RedisJSON, …

Why RIOT-Redis ?
Migrating data in and out of AWS ElastiCache requires backing up your ElastiCache data to an AWS S3 bucket and then downloading the RDB backup file from it. RIOT-Redis allows for live data migration between any Redis databases. RIOT-Redis does not make use of the REPLICAOF command, which is not always available (see ElastiCache restrictions). Instead it implements client-side replication using DUMP & RESTORE.

* supports offline snapshot mode (--mode snapshot)
Initial replication using key scan

* supports online mode (--mode live)
Initial + continuous replication using key scan and keyspace notifications in parallel

[+] RIOT | The Home of Redis Developers

3) redis-migrate-tool
• Multi-Threads.
• Based on redis replication.
• Live migration.
• Twemproxy and redis cluster support.

[+] redis-migrate-tool: a Redis cluster migration tool based on Redis replication; fast and stable

The sections below use the third-party tool Redis-Shake to move the data.

redis-shake is an open-source tool from Alibaba Cloud's Redis & MongoDB team for synchronizing Redis data.
Its basic principle is to impersonate a replica joining the source Redis cluster: it first performs a full pull and replays it, then pulls increments (via the psync command). It supports four functions: decode, restore, dump, and sync.

* restore: restore an RDB file into the target Redis database.
* dump: back up the full data set of the source Redis into an RDB file.
* decode: read an RDB file and store it parsed into JSON format.
* sync: synchronize data from the source Redis to the target Redis. It supports full and incremental migration; migration from off-cloud into Alibaba Cloud, or between different off-cloud environments; and synchronization between standalone, master-replica, and cluster deployments. Note that if the source is a cluster, you can start one RedisShake pulling from each db node, and the source must not be moving slots during the migration; if the target is a cluster, writes can go to one or more db nodes.
* rump: synchronize data from the source Redis to the target Redis, full migration only. It uses the scan and restore commands, and supports migration across cloud vendors and Redis versions.
* restoring selected keys: filter.key.whitelist can filter keys by prefix, but it only applies to the restore, sync, and rump modes.

!!! The text above is adapted from [+] redis-shake数据同步&迁移工具-阿里云开发者社区 and [+] 利用Redis-Shake将Redis数据迁移到亚马逊 ElastiCache Redis

First, install and run Redis-Shake on an EC2 instance to migrate the Redis data

This part is simple: launch a new EC2 instance in the same AWS VPC as your ElastiCache Redis, then install the RedisShake tool on it.

!!! Make sure this instance can reach both the source and the target Redis clusters.

!!! Also size this instance's capacity carefully: its network bandwidth usage is driven heavily by the total data volume and by the rate of live changes coming from clients.

1) setup RedisShake
$ sudo yum install go git
$ git clone https://github.com/alibaba/RedisShake.git
$ cd RedisShake/
$ sh build.sh

2) Edit sync.toml or restore.toml.

Offline migration with Redis-Shake

This imports an RDB file into an ElastiCache Redis cluster. It is similar to uploading the RDB to Amazon S3 and importing it from there, except that it can import into an existing ElastiCache Redis cluster, with no need to rebuild it.

!!! This is also a pitfall: data already present on the target Redis is not deleted, so the source and target can end up with inconsistent amounts of data.

So before running Redis-Shake, use the flushdb command to delete the data on the target Redis first.

A single RDB file ==> target ElastiCache, cluster mode disabled.

Target: a Redis cluster with cluster mode disabled (master-replica replication with a single primary node). Redis-Shake first loads the RDB file, then imports it into ElastiCache Redis with the restore command.

Findings:
1) Data is imported with the restore command.
2) Existing keys with the same name are overwritten.
3) Keys that do not yet exist are created.

$ vi restore.toml 
---
type = "restore"

[source]
version = 6.2 # redis version, such as 2.8, 4.0, 5.0, 6.0, 6.2, 7.0, ...
# Path to the dump.rdb file. Absolute path or relative path. Note
# that relative paths are relative to the dir directory.
rdb_file_path = "/home/ec2-user/efs/redis-6.2.6.rdb" # <----!!!!

[target]
type = "standalone" # standalone or cluster # <----!!!!
# When the target is a cluster, write the address of one of the nodes.
# redis-shake will obtain other nodes through the `cluster nodes` command.
version = 6.2 # redis version, such as 2.8, 4.0, 5.0, 6.0, 6.2, 7.0, ... <----!!!!
address = "single-001.5t9cps.0001.usw2.cache.amazonaws.com:6379" # <----!!!!
username = "" # keep empty if not using ACL
password = "" # keep empty if no authentication is required
tls = false
---

$ pwd
/home/ec2-user/RedisShake
$ ./bin/redis-shake restore.toml
---
2023-01-02 02:48:42 INF GOOS: linux, GOARCH: arm64
2023-01-02 02:48:42 INF Ncpu: 3, GOMAXPROCS: 3
2023-01-02 02:48:42 INF pid: 8484
2023-01-02 02:48:42 INF pprof_port: 0
2023-01-02 02:48:42 INF No lua file specified, will not filter any cmd.
2023-01-02 02:48:42 INF no password. address=[single-001.5t9cps.0001.usw2.cache.amazonaws.com:6379]
2023-01-02 02:48:42 INF redisWriter connected to redis successful. address=[single-001.5t9cps.0001.usw2.cache.amazonaws.com:6379]
2023-01-02 02:48:42 INF NewRDBReader: path=[/home/ec2-user/efs/redis-6.2.6.rdb]
2023-01-02 02:48:42 INF NewRDBReader: absolute path=[/home/ec2-user/efs/redis-6.2.6.rdb]
2023-01-02 02:48:42 INF start send RDB. path=[/home/ec2-user/efs/redis-6.2.6.rdb]
2023-01-02 02:48:42 INF RDB version: 9
2023-01-02 02:48:42 INF RDB AUX fields. key=[redis-ver], value=[6.2.6]
2023-01-02 02:48:42 INF RDB AUX fields. key=[redis-bits], value=[64]
2023-01-02 02:48:42 INF RDB AUX fields. key=[ctime], value=[1672304836]
2023-01-02 02:48:42 INF RDB AUX fields. key=[used-mem], value=[907008]
2023-01-02 02:48:42 INF RDB AUX fields. key=[aof-preamble], value=[0]
2023-01-02 02:48:42 INF RDB resize db. db_size=[3], expire_size=[0]
2023-01-02 02:48:42 INF send RDB finished. path=[/home/ec2-user/efs/redis-6.2.6.rdb]
2023-01-02 02:48:42 INF finished.
---

$ redis-cli -h single-001.5t9cps.0001.usw2.cache.amazonaws.com monitor
OK
1672627722.294308 [0 10.0.200.237:37198] "ping"
1672627722.299435 [0 10.0.200.237:37198] "restore" "data" "0" "\x00\n2022-12-29\x06\x00\xbcy\x10\xe2\x9e\xa7n6" "replace"
1672627722.299451 [0 10.0.200.237:37198] "restore" "key" "0" "\x00\xc0o\x06\x00Y#MBJ\xf4\xc3g" "replace"
1672627722.299473 [0 10.0.200.237:37198] "restore" "key2" "0" "\x00\xc1\xde\x00\x06\x00\x81\xc2\x87p\x9a\xce\x9a|" "replace"

>> It is recommended to flush all data on the target primary node first.
$ redis-cli -h xxx flushdb

[+] RESTORE | Redis:
https://redis.io/commands/restore/
Create a key associated with a value that is obtained by deserializing the provided serialized value (obtained via DUMP).

[+] DUMP | Redis:
https://redis.io/commands/dump/
Serialize the value stored at key in a Redis-specific format and return it to the user. The returned value can be synthesized back into a Redis key using the RESTORE command.

A single RDB file ==> target ElastiCache, cluster mode enabled

Target: a Redis cluster with cluster mode enabled (multiple primary nodes). Redis-Shake first loads the RDB file, connects to each of the target's primary nodes, and then imports the data into ElastiCache Redis with the restore command.

Findings:
1) It imports the data node by node, using the restore command against each one.
2) Existing keys with the same name are overwritten.
3) Keys that do not yet exist are created.

$ vi restore.toml 
---
type = "restore"

[source]
version = 6.2 # redis version, such as 2.8, 4.0, 5.0, 6.0, 6.2, 7.0, ...
# Path to the dump.rdb file. Absolute path or relative path. Note
# that relative paths are relative to the dir directory.
rdb_file_path = "/home/ec2-user/efs/redis-6.2.6.rdb" # <----!!!!

[target]
type = "cluster" # standalone or cluster # <----!!!!
# When the target is a cluster, write the address of one of the nodes.
# redis-shake will obtain other nodes through the `cluster nodes` command.
version = 6.2 # redis version, such as 2.8, 4.0, 5.0, 6.0, 6.2, 7.0, ... # <----!!!!
address = "cluster.5t9cps.clustercfg.usw2.cache.amazonaws.com:6379" # any node in the cluster works here <----!!!!
username = "" # keep empty if not using ACL
password = "" # keep empty if no authentication is required
tls = false
---

$ pwd
/home/ec2-user/RedisShake
$ ./bin/redis-shake restore.toml
---
2023-01-02 02:59:00 INF GOOS: linux, GOARCH: arm64
2023-01-02 02:59:00 INF Ncpu: 3, GOMAXPROCS: 3
2023-01-02 02:59:00 INF pid: 8519
2023-01-02 02:59:00 INF pprof_port: 0
2023-01-02 02:59:00 INF No lua file specified, will not filter any cmd.
2023-01-02 02:59:00 INF no password. address=[cluster.5t9cps.clustercfg.usw2.cache.amazonaws.com:6379]
2023-01-02 02:59:00 INF redisClusterWriter load cluster nodes. line=321b088d163cd069de1c55c5f72ba40c6052b81f 10.0.102.115:6379@1122 myself,master - 0 0 1 connected 0-8191
2023-01-02 02:59:00 INF no password. address=[10.0.102.115:6379]
2023-01-02 02:59:00 INF redisWriter connected to redis successful. address=[10.0.102.115:6379]
2023-01-02 02:59:00 INF redisClusterWriter load cluster nodes. line=bf871472a0be4f1065e26c7bfa5a6db8af7c2cce 10.0.101.133:6379@1122 master - 0 1672628339647 0 connected 8192-16383
2023-01-02 02:59:00 INF no password. address=[10.0.101.133:6379]
2023-01-02 02:59:00 INF redisWriter connected to redis successful. address=[10.0.101.133:6379]
2023-01-02 02:59:00 INF redisClusterWriter connected to redis cluster successful. addresses=[10.0.102.115:6379 10.0.101.133:6379]
2023-01-02 02:59:00 INF NewRDBReader: path=[/home/ec2-user/efs/redis-6.2.6.rdb]
2023-01-02 02:59:00 INF NewRDBReader: absolute path=[/home/ec2-user/efs/redis-6.2.6.rdb]
2023-01-02 02:59:00 INF start send RDB. path=[/home/ec2-user/efs/redis-6.2.6.rdb]
2023-01-02 02:59:00 INF RDB version: 9
2023-01-02 02:59:00 INF RDB AUX fields. key=[redis-ver], value=[6.2.6]
2023-01-02 02:59:00 INF RDB AUX fields. key=[redis-bits], value=[64]
2023-01-02 02:59:00 INF RDB AUX fields. key=[ctime], value=[1672304836]
2023-01-02 02:59:00 INF RDB AUX fields. key=[used-mem], value=[907008]
2023-01-02 02:59:00 INF RDB AUX fields. key=[aof-preamble], value=[0]
2023-01-02 02:59:00 INF RDB resize db. db_size=[3], expire_size=[0]
2023-01-02 02:59:00 INF send RDB finished. path=[/home/ec2-user/efs/redis-6.2.6.rdb]
2023-01-02 02:59:00 INF finished.
---


[ec2-user@ip-10-0-200-237 ~]$ redis-cli -h cluster-0001-001.5t9cps.0001.usw2.cache.amazonaws.com monitor
OK
1672628340.269910 [0 10.0.200.237:40376] "ping"
1672628340.273033 [0 10.0.200.237:40378] "ping"
1672628340.279173 [0 10.0.200.237:40378] "restore" "data" "0" "\x00\n2022-12-29\x06\x00\xbcy\x10\xe2\x9e\xa7n6" "replace"
1672628340.279195 [0 10.0.200.237:40378] "restore" "key2" "0" "\x00\xc1\xde\x00\x06\x00\x81\xc2\x87p\x9a\xce\x9a|" "replace"

[ec2-user@ip-10-0-200-237 ~]$ redis-cli -h cluster-0002-001.5t9cps.0001.usw2.cache.amazonaws.com monitor
OK
1672628340.274942 [0 10.0.200.237:36506] "ping"
1672628340.279042 [0 10.0.200.237:36506] "restore" "key" "0" "\x00\xc0o\x06\x00Y#MBJ\xf4\xc3g" "replace"

Online migration with Redis-Shake

Redis-Shake can use psync/sync to continuously ship data online to the target ElastiCache. !!! But note that psync/sync affects the performance of the source Redis; if the EC2 instance running Redis-Shake is underpowered, it may also repeatedly issue full-sync replication requests against the source Redis and destabilize it.

!!! This is exactly why psync/sync are Restricted Redis Commands on ElastiCache Redis and cannot be used there.

A single primary node ==> target ElastiCache, cluster mode disabled.

Here the EC2 instance running Redis-Shake acts as a replica node of the source Redis: it sends the source a psync request, and because this is the first synchronization, the source primary cannot serve a partial resync and performs a full sync instead. Once that completes, the Redis-Shake instance plays the role of a replica node, and Redis-Shake asynchronously applies the data to the target ElastiCache Redis cluster with the restore command.

$ vi sync.toml
---
type = "sync" # <----!!!!

[source]
version = 6.2 # redis version, such as 2.8, 4.0, 5.0, 6.0, 6.2, 7.0, ...
address = "127.0.0.1:6379" # <----!!!!
username = "" # keep empty if not using ACL
password = "" # keep empty if no authentication is required
tls = false
elasticache_psync = "" # using when source is ElastiCache. ref: https://github.com/alibaba/RedisShake/issues/373

[target]
type = "standalone" # "standalone" or "cluster"
version = 6.2 # redis version, such as 2.8, 4.0, 5.0, 6.0, 6.2, 7.0, ...
# When the target is a cluster, write the address of one of the nodes.
# redis-shake will obtain other nodes through the `cluster nodes` command.
# address = "127.0.0.1:6380"
address = "single-001.5t9cps.0001.usw2.cache.amazonaws.com:6379" # <----!!!!
username = "" # keep empty if not using ACL
password = "" # keep empty if no authentication is required
tls = false
---

$ pwd
/home/ec2-user/RedisShake
$ ./bin/redis-shake sync.toml
---
2023-01-02 03:09:04 INF GOOS: linux, GOARCH: arm64
2023-01-02 03:09:04 INF Ncpu: 4, GOMAXPROCS: 4
2023-01-02 03:09:04 INF pid: 12912
2023-01-02 03:09:04 INF pprof_port: 0
2023-01-02 03:09:04 INF No lua file specified, will not filter any cmd.
2023-01-02 03:09:04 INF no password. address=[single-001.5t9cps.0001.usw2.cache.amazonaws.com:6379]
2023-01-02 03:09:04 INF redisWriter connected to redis successful. address=[single-001.5t9cps.0001.usw2.cache.amazonaws.com:6379]
2023-01-02 03:09:04 INF no password. address=[127.0.0.1:6379]
2023-01-02 03:09:04 INF psyncReader connected to redis successful. address=[127.0.0.1:6379]
2023-01-02 03:09:04 WRN remove file. filename=[486.aof]
2023-01-02 03:09:04 WRN remove file. filename=[dump.rdb]
2023-01-02 03:09:04 INF start save RDB. address=[127.0.0.1:6379]
2023-01-02 03:09:04 INF send [replconf listening-port 10007]
2023-01-02 03:09:04 INF send [PSYNC ? -1]
2023-01-02 03:09:04 INF receive [FULLRESYNC 1dff6ec8b865bd15926c307ed18430bf22676906 0]
2023-01-02 03:09:04 INF source db is doing bgsave. address=[127.0.0.1:6379]
2023-01-02 03:09:04 INF source db bgsave finished. timeUsed=[0.06]s, address=[127.0.0.1:6379]
2023-01-02 03:09:04 INF received rdb length. length=[215]
2023-01-02 03:09:04 INF create dump.rdb file. filename_path=[dump.rdb]
2023-01-02 03:09:04 INF save RDB finished. address=[127.0.0.1:6379], total_bytes=[215]
2023-01-02 03:09:04 INF start send RDB. address=[127.0.0.1:6379]
2023-01-02 03:09:04 INF RDB version: 9
2023-01-02 03:09:04 INF RDB AUX fields. key=[redis-ver], value=[6.2.6]
2023-01-02 03:09:04 INF start save AOF. address=[127.0.0.1:6379]
2023-01-02 03:09:04 INF AOFWriter open file. filename=[0.aof]
2023-01-02 03:09:04 INF RDB AUX fields. key=[redis-bits], value=[64]
2023-01-02 03:09:04 INF RDB AUX fields. key=[ctime], value=[1672628944]
2023-01-02 03:09:04 INF RDB AUX fields. key=[used-mem], value=[1976648]
2023-01-02 03:09:04 INF RDB repl-stream-db: 0
2023-01-02 03:09:04 INF RDB AUX fields. key=[repl-id], value=[1dff6ec8b865bd15926c307ed18430bf22676906]
2023-01-02 03:09:04 INF RDB AUX fields. key=[repl-offset], value=[0]
2023-01-02 03:09:04 INF RDB AUX fields. key=[aof-preamble], value=[0]
2023-01-02 03:09:04 INF RDB resize db. db_size=[4], expire_size=[0]
2023-01-02 03:09:04 INF send RDB finished. address=[127.0.0.1:6379], repl-stream-db=[0]
2023-01-02 03:09:05 INF AOFReader open file. aof_filename=[0.aof]
2023-01-02 03:09:09 INF syncing aof. allowOps=[0.80], disallowOps=[0.00], entryId=[3], InQueueEntriesCount=[0], unansweredBytesCount=[0]bytes, diff=[0], aofReceivedOffset=[0], aofAppliedOffset=[0]
2023-01-02 03:09:14 INF syncing aof. allowOps=[0.20], disallowOps=[0.00], entryId=[4], InQueueEntriesCount=[0], unansweredBytesCount=[0]bytes, diff=[0], aofReceivedOffset=[14], aofAppliedOffset=[14]
2023-01-02 03:09:19 INF syncing aof. allowOps=[0.00], disallowOps=[0.00], entryId=[4], InQueueEntriesCount=[0], unansweredBytesCount=[0]bytes, diff=[0], aofReceivedOffset=[14], aofAppliedOffset=[14]
2023-01-02 03:09:24 INF syncing aof. allowOps=[0.20], disallowOps=[0.00], entryId=[5], InQueueEntriesCount=[0], unansweredBytesCount=[0]bytes, diff=[0], aofReceivedOffset=[28], aofAppliedOffset=[28]
---
>> Still running continuously.
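The sync is caught up when diff is 0 and aofReceivedOffset equals aofAppliedOffset in these "syncing aof" lines. A small Python sketch (a hypothetical helper, parsing the log format shown above) that checks this:

```python
import re

# Matches the bracketed metrics in redis-shake "syncing aof" log lines.
FIELD_RE = re.compile(r"(\w+)=\[([^\]]*)\]")

def sync_caught_up(log_line: str) -> bool:
    """Return True if a 'syncing aof' line shows no outstanding diff."""
    fields = dict(FIELD_RE.findall(log_line))
    return (fields.get("diff") == "0"
            and fields.get("aofReceivedOffset") == fields.get("aofAppliedOffset"))

line = ("2023-01-02 03:09:24 INF syncing aof. allowOps=[0.20], disallowOps=[0.00], "
        "entryId=[5], InQueueEntriesCount=[0], unansweredBytesCount=[0]bytes, "
        "diff=[0], aofReceivedOffset=[28], aofAppliedOffset=[28]")
print(sync_caught_up(line))  # True
```

Watching for this condition is a reasonable signal for when the cutover window can start.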

127.0.0.1:6379> monitor
OK
1672628944.899292 [0 127.0.0.1:57232] "ping"
127.0.0.1:6379> set key4 444
OK

$ redis-cli -h single-001.5t9cps.0001.usw2.cache.amazonaws.com monitor
OK
1672629315.946793 [0 10.0.200.237:37452] "ping"
1672629316.043963 [0 10.0.200.237:37452] "restore" "key1" "0" "\x00\xc0o\x06\x00Y#MBJ\xf4\xc3g" "replace"
1672629316.043976 [0 10.0.200.237:37452] "restore" "key2" "0" "\x00\xc1\xde\x00\x06\x00\x81\xc2\x87p\x9a\xce\x9a|" "replace"
1672629316.043985 [0 10.0.200.237:37452] "restore" "key3" "0" "\x00\xc1M\x01\x06\x00R\xe8F\xce!\xbb}\xf3" "replace"
1672629328.848398 [0 10.0.200.237:37452] "ping"
1672629334.311775 [0 10.0.200.237:37452] "set" "key4" "444"


127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:1
slave0:ip=127.0.0.1,port=10007,state=online,offset=181,lag=0
master_failover_state:no-failover
>> RedisShake has become a replica node and synchronizes from the primary.

### redis.log ###
---
12959:M 02 Jan 2023 03:14:48.672 * Ready to accept connections
12959:M 02 Jan 2023 03:15:15.948 * Replica 127.0.0.1:10007 asks for synchronization
12959:M 02 Jan 2023 03:15:15.948 * Full resync requested by replica 127.0.0.1:10007
12959:M 02 Jan 2023 03:15:15.948 * Replication backlog created, my new replication IDs are '6c161ba71f42f0031780f868f8688455d583677b' and '0000000000000000000000000000000000000000'
12959:M 02 Jan 2023 03:15:15.948 * Starting BGSAVE for SYNC with target: disk
12959:M 02 Jan 2023 03:15:15.948 * Background saving started by pid 12974
12974:C 02 Jan 2023 03:15:15.950 * DB saved on disk
12974:C 02 Jan 2023 03:15:15.951 * RDB: 0 MB of memory used by copy-on-write
12959:M 02 Jan 2023 03:15:16.040 * Background saving terminated with success
12959:M 02 Jan 2023 03:15:16.041 * Synchronization with replica 127.0.0.1:10007 succeeded
....
12959:M 02 Jan 2023 03:17:46.611 # Connection with replica 127.0.0.1:10007 lost.
---
>> RedisShake issued a psync/sync to the primary ("Starting BGSAVE for SYNC with target: disk"); at that point RedisShake was acting as a replica node.

A single primary node ==> target ElastiCache, cluster mode enabled.

Here, too, the EC2 instance running Redis-Shake acts as a replica node of the source Redis: it sends the source a psync request, and because this is the first synchronization, the source primary cannot serve a partial resync and performs a full sync instead. Once that completes, the Redis-Shake instance plays the role of a replica node. Redis-Shake then opens connections to each of the target's primary nodes and asynchronously applies the data to the target ElastiCache Redis cluster with the restore command.

$ vi sync.toml
---
type = "sync" # <----!!!!

[source]
version = 6.2 # redis version, such as 2.8, 4.0, 5.0, 6.0, 6.2, 7.0, ...
address = "127.0.0.1:6379" # <----!!!!
username = "" # keep empty if not using ACL
password = "" # keep empty if no authentication is required
tls = false
elasticache_psync = "" # using when source is ElastiCache. ref: https://github.com/alibaba/RedisShake/issues/373

[target]
type = "cluster" # "standalone" or "cluster"
version = 6.2 # redis version, such as 2.8, 4.0, 5.0, 6.0, 6.2, 7.0, ...
# When the target is a cluster, write the address of one of the nodes.
# redis-shake will obtain other nodes through the `cluster nodes` command.
# address = "127.0.0.1:6380"
address = "cluster.5t9cps.clustercfg.usw2.cache.amazonaws.com:6379" # <----!!!!
username = "" # keep empty if not using ACL
password = "" # keep empty if no authentication is required
tls = false
---

Multiple primary nodes (cluster mode) ==> target ElastiCache, cluster mode enabled.

Method 1: start multiple redis-shake processes by hand.
Treat the 4 nodes as 4 standalone instances and, following the standalone-to-cluster setup, deploy 4 redis-shake processes to synchronize the data.
Method 2: start them with cluster_helper.py.
The cluster_helper.py script is a convenient way to start multiple redis-shake processes for migrating data out of a cluster; the effect is the same as method 1.

### Below is method 2: starting with cluster_helper.py ###

$ pwd
/home/ec2-user/RedisShake/bin/cluster_helper

$ vi ../sync.toml
---
type = "sync"

[source]
version = 6.2 # redis version, such as 2.8, 4.0, 5.0, 6.0, 6.2, 7.0, ...
address = "127.0.0.1:6381" # <----!!!! any one of the source nodes is enough.
username = "" # keep empty if not using ACL
password = "" # keep empty if no authentication is required
tls = false
elasticache_psync = "" # using when source is ElastiCache. ref: https://github.com/alibaba/RedisShake/issues/373

[target]
type = "cluster" # "standalone" or "cluster"
version = 6.2 # redis version, such as 2.8, 4.0, 5.0, 6.0, 6.2, 7.0, ...
# When the target is a cluster, write the address of one of the nodes.
# redis-shake will obtain other nodes through the `cluster nodes` command.
address = "cluster.5t9cps.clustercfg.usw2.cache.amazonaws.com:6379" # <----!!!! just use the configuration endpoint here.
username = "" # keep empty if not using ACL
password = "" # keep empty if no authentication is required
tls = false
---

$ python3 cluster_helper.py ../redis-shake ../sync.toml
Traceback (most recent call last):
File "cluster_helper.py", line 11, in <module>
import redis
ModuleNotFoundError: No module named 'redis'
$ pip3 install -i https://pypi.douban.com/simple/ redis


$ python3 cluster_helper.py ../redis-shake ../sync.toml
Traceback (most recent call last):
File "cluster_helper.py", line 12, in <module>
import requests
ModuleNotFoundError: No module named 'requests'
$ pip3 install requests


$ python3 cluster_helper.py ../redis-shake ../sync.toml
Traceback (most recent call last):
File "cluster_helper.py", line 13, in <module>
import toml
ModuleNotFoundError: No module named 'toml'
$ pip3 install toml

$ python3 cluster_helper.py ../redis-shake ../sync.toml
...
sleep 3 seconds to wait redis-shake start
================ 2022-12-29 10:36:36 ================
127.0.0.1:6381
127.0.0.1:6382
127.0.0.1:6383
================ 2022-12-29 10:36:41 ================
127.0.0.1:6381 syncing aof. allowOps=[0.40], disallowOps=[0.00], entryId=[1], InQueueEntriesCount=[0], unansweredBytesCount=[0]bytes, diff=[0], aofReceivedOffset=[0], aofAppliedOffset=[0]
127.0.0.1:6382 syncing aof. allowOps=[0.40], disallowOps=[0.00], entryId=[1], InQueueEntriesCount=[0], unansweredBytesCount=[0]bytes, diff=[0], aofReceivedOffset=[0], aofAppliedOffset=[0]
127.0.0.1:6383 syncing aof. allowOps=[0.40], disallowOps=[0.00], entryId=[1], InQueueEntriesCount=[0], unansweredBytesCount=[0]bytes, diff=[0], aofReceivedOffset=[0], aofAppliedOffset=[0]
================ 2022-12-29 10:36:46 ================
127.0.0.1:6381 syncing aof. allowOps=[0.20], disallowOps=[0.00], entryId=[2], InQueueEntriesCount=[0], unansweredBytesCount=[0]bytes, diff=[0], aofReceivedOffset=[2131], aofAppliedOffset=[2131]
127.0.0.1:6382 syncing aof. allowOps=[0.20], disallowOps=[0.00], entryId=[2], InQueueEntriesCount=[0], unansweredBytesCount=[0]bytes, diff=[0], aofReceivedOffset=[2099], aofAppliedOffset=[2099]
127.0.0.1:6383 syncing aof. allowOps=[0.20], disallowOps=[0.00], entryId=[2], InQueueEntriesCount=[0], unansweredBytesCount=[0]bytes, diff=[0], aofReceivedOffset=[2160], aofAppliedOffset=[2160]
================ 2022-12-29 10:36:51 ================
127.0.0.1:6381 syncing aof. allowOps=[0.00], disallowOps=[0.00], entryId=[2], InQueueEntriesCount=[0], unansweredBytesCount=[0]bytes, diff=[0], aofReceivedOffset=[2131], aofAppliedOffset=[2131]
127.0.0.1:6382 syncing aof. allowOps=[0.00], disallowOps=[0.00], entryId=[2], InQueueEntriesCount=[0], unansweredBytesCount=[0]bytes, diff=[0], aofReceivedOffset=[2099], aofAppliedOffset=[2099]
127.0.0.1:6383 syncing aof. allowOps=[0.00], disallowOps=[0.00], entryId=[2], InQueueEntriesCount=[0], unansweredBytesCount=[0]bytes, diff=[0], aofReceivedOffset=[2160], aofAppliedOffset=[2160]
...
You pressed Ctrl+C!
Waiting for process 29774 to exit...
process 29774 exited.
Waiting for process 29780 to exit...
process 29780 exited.
Waiting for process 29781 to exit...
process 29781 exited.
>> Keeps running continuously; the mechanism is the same as above.

Running > info Replication shows RedisShake connected to the primary in a replica-node role.

$ redis-cli -h 127.0.0.1 -p 6381 -c
127.0.0.1:6381> info Replication
# Replication
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=6384,state=online,offset=1318,lag=1
slave1:ip=127.0.0.1,port=10007,state=online,offset=1318,lag=0 <---- !!!!
master_failover_state:no-failover

127.0.0.1:6381> cluster nodes
425ea349d087ea6ea8f74c491ecbd6bece7b3c9c 127.0.0.1:6382@16382 master - 0 1673157589414 2 connected 5461-10922
31130856e363e07d5934b6a7a916bebe461fe707 127.0.0.1:6383@16383 master - 0 1673157590421 3 connected 10923-16383
8e53daf97be53c671da15a5a4feffb07cf5ef530 127.0.0.1:6381@16381 myself,master - 0 1673157588000 1 connected 0-5460
f8280ebf70fdccdaba8071eaeec2e7de80d799ec 127.0.0.1:6384@16384 slave 8e53daf97be53c671da15a5a4feffb07cf5ef530 0 1673157587397 1 connected
ffcdeb5e8810a81814f49667479e76e6b2555609 127.0.0.1:6386@16386 slave 31130856e363e07d5934b6a7a916bebe461fe707 0 1673157589000 3 connected
4faa90c3497fa3855e22a5d193f67ab754109559 127.0.0.1:6385@16385 slave 425ea349d087ea6ea8f74c491ecbd6bece7b3c9c 0 1673157589000 2 connected
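The `cluster nodes` output above can also be parsed programmatically, for example to list which node owns which slot range; a rough Python sketch based on the format shown:

```python
def parse_cluster_nodes(text: str) -> list[dict]:
    """Parse `cluster nodes` lines into id/address/role/slots records."""
    nodes = []
    for line in text.strip().splitlines():
        parts = line.split()
        nodes.append({
            "id": parts[0],
            "address": parts[1].split("@")[0],  # strip the cluster-bus port
            "role": "master" if "master" in parts[2].split(",") else "slave",
            "slots": parts[8:] if len(parts) > 8 else [],
        })
    return nodes

sample = """\
425ea349d087ea6ea8f74c491ecbd6bece7b3c9c 127.0.0.1:6382@16382 master - 0 1673157589414 2 connected 5461-10922
f8280ebf70fdccdaba8071eaeec2e7de80d799ec 127.0.0.1:6384@16384 slave 8e53daf97be53c671da15a5a4feffb07cf5ef530 0 1673157587397 1 connected"""

for n in parse_cluster_nodes(sample):
    print(n["address"], n["role"], n["slots"])
```

This makes it easy to confirm that every slot range has a primary before and after the migration.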

How do you synchronize data across private and public networks?

ElastiCache does not allow connections from the public internet, so what do you do when your external Redis cluster lives in another cloud vendor's network, or in an on-premises private network?

  1. If the source Redis is on the same private network (same VPC/subnet), connect directly.
  2. If the source Redis is in a different VPC, use VPC peering.
  3. If the source Redis is on the public internet, run RedisShake on an instance in the same VPC as ElastiCache but in a public subnet, reaching out through a NAT gateway or IGW.
  4. If the source Redis sits inside another cloud vendor's network, or an on-premises private network, there are a few options.

* Use AWS VPN to build a stable connection channel. !!! Preferred option !!!

[+] AWS VPN — Cloud VPN — Amazon Web Services: https://aws.amazon.com/tw/vpn/

* Use an ssh tunnel as a temporary channel. !!! For temporary testing only !!!

First, on the source side, set up a bastion host: it must be reachable from the internet and able to connect to the source Redis.
On the target side, launch an instance running RedisShake in the same VPC as ElastiCache, in a public subnet behind a NAT gateway or IGW.
Then use an ssh tunnel to open a path between the source-side bastion and the RedisShake instance; RedisShake transfers its data through this channel.
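A minimal sketch of that tunnel, run on the RedisShake instance (the key file, bastion host, and source Redis host below are placeholders for your environment):

```shell
# Forward local port 6380 through the source-side bastion to the source Redis.
ssh -i bastion_key.pem -N -L 6380:source_redis_host:6379 user@bastion_host &

# RedisShake can then use 127.0.0.1:6380 as the [source] address in sync.toml.
redis-cli -p 6380 ping   # sanity-check the tunnel before starting RedisShake
```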

Q: How do you connect to a private ElastiCache from the internet through an ssh tunnel?

You can use an ssh tunnel to connect first to an EC2 instance in the same VPC as the ElastiCache, and from there to the ElastiCache itself.

1. First create the ssh tunnel from your local machine:
$ ssh -i EC2_Token.pem -N -L 26379:ElastiCache_endpoint:6379 ec2_user@ec2_host
[+] How to Setup Bastion Server with AWS EC2 https://medium.com/codex/how-to-setup-bastion-server-with-aws-ec2-b1590d2ff815
!!! On Windows you can set up the same tunnel with PuTTY: [+] Configure SSH Tunneling with Windows Using PuTTY: https://docs.marklogic.com/cloudservices/aws/admin/configure-putty-tunneling.html
2. Then connect locally: pointing a client at localhost port 26379 reaches the ElastiCache on port 6379.

!!! Note !!!

The ssh-tunnel approach is not suitable for exposing a `production` environment to the internet: with cluster mode enabled (multiple primary nodes) it does not work at all. Moreover, when a Redis node fails, ElastiCache Redis performs an automatic failover, and the IP that the endpoint resolves to changes; a tunnel pinned to the old node's IP then breaks (a single point of failure). So this option is only for `test/temporary` use; the best practice is still to connect `from within the VPC, or over a VPN/DX link`.

!!! The most fundamental reason, though, is that Redis exists to provide `fast, low-latency access`. Connecting `over the public internet` adds the network's own latency on top, which defeats the main point of running Redis; at that point you might as well use a conventional database such as RDS/MySQL.


A cloud support engineer focused on troubleshooting customer-reported issues and on cloud solution architecture.