How does Redis replication synchronization work?

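Redis replicates data from the primary node to replica nodes asynchronously, and synchronization is initiated by the replica. When a new replica node joins a Redis cluster, or when a replica node reconnects to its primary node, the replica sends its replication ID and offset to the primary node to request a psync (incremental/partial synchronization). The primary first compares the replication ID. If the IDs do not match, it rejects the psync and performs a full sync. If the IDs match, it checks whether the data at the requested offset is still in the backlog buffer: if it is, the primary performs the psync; otherwise it rejects the psync and performs a full sync.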

Jerry’s Notes
What’s next?
6 min read · Mar 18, 2022

Each replica node maintains a copy of the data from the primary node. Replica nodes use asynchronous replication mechanisms to keep synchronized with the primary node. Applications can read from any node in the cluster but can write only to primary nodes. Read replicas enhance scalability by spreading reads across multiple endpoints. Read replicas also improve fault tolerance by maintaining multiple copies of the data. Locating read replicas in multiple Availability Zones further improves fault tolerance.

Q: How does replication work?

When the primary and replica nodes are well connected, the primary node continuously sends the stream of commands to the replica node. However, if the primary or replica node runs into a connection or performance issue, the replica may fall behind (replication lag). In that case the replica node reconnects to the primary node and asks for a partial resynchronization.

If partial resynchronization is not possible, the replica node asks for a full resynchronization. This is a more complex process in which the primary needs to create a snapshot of all its data, send it to the replica node, and then continue sending the stream of commands as the dataset changes. A full resynchronization consumes extra CPU and memory.

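ElastiCache Redis official documentation

With cluster mode disabled, a multi-node Redis cluster has one primary node, which handles write commands, and up to 5 read-only replica nodes. Each read replica maintains a copy of the data from the cluster's primary node. The primary and its replicas use an asynchronous replication mechanism to copy data from the primary to the replicas and keep them in sync. Your front-end application can read from any node in the cluster, but it can write only to the primary node.

Besides serving read requests from your front-end application to offload the primary node, read replica nodes also let the backend system automatically promote another replica to the primary role when the primary node has a problem, which reduces the time your business is affected.

[+] Understanding Redis replication: Redis (cluster mode disabled):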
https://docs.aws.amazon.com/zh_tw/AmazonElastiCache/latest/red-ug/Replication.Redis.Groups.html#Replication.Redis.Groups.Classic

Q: What is the run ID (runid)?

Replication ID: Every Redis master has a replication ID: it is a large pseudo random string that marks a given story of the dataset.

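Each Redis server has its own ID, made up of 40 random hexadecimal characters. When a slave first replicates from a master, the master sends its ID to the slave, which stores it. When the slave reconnects, it sends that ID back to the master it reconnected to, and the master compares it with its own ID to determine whether it is the same master. Note that in Redis 2.8 to 3.x this ID was the master's run ID (run_id); since Redis 4.0, PSYNC uses a separate replication ID (master_replid) instead.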

Q: What are the replication ID and offset?

A replication ID basically marks a given history of the data set. The replicas connected to a master inherit its replication ID after the handshake.

The primary and each read replica maintain a record (offset) representing the amount of data written to the node.

The offset works as a logical time to understand, for a given history (replication ID), who holds the most up-to-date data set. In other words, the offset is used to determine how far apart the data on the primary and a replica are.

Q: What is Replication backlog?

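The backlog is a 1MB ring buffer that accumulates recent data in the primary node when replicas are disconnected for some time, so that when a replica wants to reconnect again, often a full resync is not needed, but a partial resync is enough. In ElastiCache, the backlog size is 1MB by default and can be controlled by repl-backlog-size.

The backlog buffer is used only for replication.

The backlog buffer is a ring buffer. Only one exists in the whole primary process, and it is shared by all replicas. Its size is set with the repl-backlog-size parameter, and the default is 1MB. On the primary node it is a fixed-size, first-in-first-out (FIFO) queue. To estimate a suitable size, multiply the volume of write commands generated per second by the sum of (the time the master takes to run the RDB bgsave) + (the time the master takes to send the RDB to the slave) + (the time the slave takes to load the RDB file). The repl-backlog-size value should be no smaller than that product.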
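
The sizing rule above can be sketched as a small calculation. This is only an illustration: the function name, the safety factor, and the sample numbers are assumptions, and the write rate is taken to be measured in bytes of replication stream per second.

```python
import math


def estimate_backlog_bytes(write_bytes_per_sec, bgsave_secs, transfer_secs,
                           load_secs, safety_factor=1.0):
    """Estimate a lower bound for repl-backlog-size in bytes.

    write_bytes_per_sec: assumed average write volume on the primary.
    bgsave_secs / transfer_secs / load_secs: assumed durations of the
    three phases of a full sync (bgsave, send RDB, load RDB).
    safety_factor: hypothetical headroom multiplier (1.0 = none).
    """
    window_secs = bgsave_secs + transfer_secs + load_secs
    return math.ceil(write_bytes_per_sec * window_secs * safety_factor)


# Hypothetical workload: 200 KiB/s of writes, 10 s bgsave,
# 15 s transfer, 5 s load -> a 30 s window.
estimate = estimate_backlog_bytes(200 * 1024, 10, 15, 5)
print(estimate)                 # 6144000
print(estimate > 1024 * 1024)   # True: the 1MB default would be too small here
```

With these made-up numbers the estimate is roughly 6MB, so the 1MB default would not cover the writes that arrive during a full sync.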

The size, in bytes, of the primary node backlog buffer. The backlog is used for recording updates to data at the primary node. When a read replica connects to the primary, it attempts to perform a partial sync (psync), where it applies data from the backlog to catch up with the primary node. If the psync fails, then a full sync is required.

In parameter group:

repl-backlog-size: default 1MB

repl-backlog-ttl: default 3600s

Q: What is repl-timeout?

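Replication is considered timed out in the following three cases:

1) From the replica's perspective: no RDB snapshot data from the master's SYNC transfer is received within repl-timeout.

2) From the replica's perspective: no data packets or pings from the master are received within repl-timeout.

3) From the primary's perspective: no REPLCONF ACK acknowledgement is received within repl-timeout.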
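
The three cases can be summarized in a short sketch. It is a simplified model, not Redis source code: the `LinkState` type and its field names are hypothetical, and the 60-second default is assumed to match the stock repl-timeout setting.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkState:
    role: str               # "replica" or "primary" (side doing the check)
    transferring_rdb: bool  # replica only: bulk RDB transfer in progress
    idle_seconds: float     # seconds since the peer last sent anything


def timeout_reason(link: LinkState, repl_timeout: float = 60) -> Optional[str]:
    """Return which of the three cases applies, or None if not timed out."""
    if link.idle_seconds <= repl_timeout:
        return None
    if link.role == "replica":
        if link.transferring_rdb:
            return "case 1: no RDB data from master during bulk transfer"
        return "case 2: no data or ping from master"
    return "case 3: no REPLCONF ACK from replica"


print(timeout_reason(LinkState("replica", True, 75)))
print(timeout_reason(LinkState("primary", False, 10)))  # None
```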

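Q: What is the difference between sync and psync?

SYNC (>= 1.0): an old protocol that newer Redis instances no longer use. It is still supported for backward compatibility, but it does not allow partial resynchronization.

PSYNC (>= 2.8.0): supports both full resync and partial resync. PSYNC is now used in place of SYNC.

runid (replication ID): the node's run ID, a string of 40 random hexadecimal characters.

offset: the replication offset.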

Replica to Primary

First synchronization: > psync ? -1

Subsequent synchronization requests: > psync <runid> <offset>

Primary to Replica

Full sync: +FULLRESYNC <runid> <offset>

Partial sync: +CONTINUE
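
As a rough illustration of this exchange, the sketch below builds the request line and classifies the primary's reply. The helper names are hypothetical, it works on plain strings rather than a real socket, and the optional replication ID after +CONTINUE reflects my understanding of Redis 4.0+ behaviour rather than anything stated in this article.

```python
def build_psync(replid=None, offset=-1):
    """Build the PSYNC request a replica would send."""
    if replid is None:
        return "PSYNC ? -1"          # first sync: no known history
    return f"PSYNC {replid} {offset}"


def parse_psync_reply(line):
    """Classify the primary's reply as a full or partial sync."""
    parts = line.strip().split()
    if len(parts) == 3 and parts[0] == "+FULLRESYNC":
        return {"mode": "full", "replid": parts[1], "offset": int(parts[2])}
    if parts and parts[0] == "+CONTINUE":
        # Assumption: newer servers may append a replication ID here.
        return {"mode": "partial", "replid": parts[1] if len(parts) > 1 else None}
    raise ValueError(f"unexpected PSYNC reply: {line!r}")


print(build_psync())  # PSYNC ? -1
reply = parse_psync_reply(
    "+FULLRESYNC b967e55fb2fbcd2de0452b0900e4d482f36bea53 280")
print(reply["mode"], reply["offset"])  # full 280
```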

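PSYNC 2.0 (Redis 4+): improves on the previous version, in which a full sync was always performed after a primary/replica switchover.

Before Redis 2.8, replication was always a full copy using the SYNC command. SYNC has a major drawback: whether the slave is starting for the first time or reconnecting after a dropped connection, synchronization always copies the full dataset, which heavily consumes the master's resources and a large amount of network resources. Since version 2.8, Redis uses the PSYNC command for master-slave synchronization. PSYNC supports both full and partial replication, which addresses the drawback of SYNC.

Redis master-slave replication copies data in two main scenarios:

1. The slave connects to the current master for the first time (initial replication). With either SYNC or PSYNC, the first synchronization is always a full copy.

2. The slave reconnects to the master it was previously connected to after a disconnection (replication after disconnection). SYNC performs a full copy, while PSYNC performs an incremental copy when a partial resync is possible.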

Full Synchronization

A fork or forkless snapshot (RDB file) is taken on the master node. The snapshot is then sent to the read replica node.

Alternatively, a fork or forkless snapshot is taken, but the data is not written to an RDB file on the file system; it is sent to the socket file descriptor that connects to the read replica node.

The full synchronization process is essentially the same as the snapshot process.

Replication ID: it is a large pseudo random string that marks a given story of the dataset.

Offset: it increments for every byte of replication stream that it is produced to be sent to replicas, in order to update the state of the replicas with the new changes modifying the dataset. master_repl_offset: The server’s current replication offset. second_repl_offset: The offset up to which replication IDs are accepted.

Partial Synchronization

The master and read replica each maintains a record (offset) representing the amount of data written to the node.

The master maintains a circular buffer (backlog) which stores recent changes to the master (repl-backlog-size, 1MB by default).

When a read replica disconnects and reconnects, the master compares the offset values on the master and the replica to determine whether to send the data in the circular buffer to the read replica (partial sync) or to do a full sync.

The backlog is used for recording updates to data at the primary node. When a read replica connects to the primary, it attempts to perform a partial sync (psync), where it applies data from the backlog to catch up with the primary node. If the psync fails, then a full sync is required.
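
The decision described in this section can be sketched as follows. This is a simplified model under stated assumptions, not the Redis implementation: `choose_sync` is a hypothetical helper, and the backlog is described only by its first byte offset and history length, named after the repl_backlog_first_byte_offset and repl_backlog_histlen fields of `info replication`.

```python
def choose_sync(master_replid, backlog_first_offset, backlog_histlen,
                requested_replid, requested_offset):
    """Return "partial" if the request can be served from the backlog,
    otherwise "full"."""
    # A different (or unknown, "?") replication ID means a different history.
    if requested_replid != master_replid:
        return "full"
    # The backlog holds the bytes in [first_offset, first_offset + histlen).
    backlog_end = backlog_first_offset + backlog_histlen
    if requested_offset < backlog_first_offset or requested_offset > backlog_end:
        return "full"
    return "partial"


REPLID = "b967e55fb2fbcd2de0452b0900e4d482f36bea53"
print(choose_sync(REPLID, 1, 1000, REPLID, 500))   # partial
print(choose_sync(REPLID, 1, 1000, REPLID, 5000))  # full: offset not in backlog
print(choose_sync(REPLID, 1, 1000, "?", -1))       # full: first sync
```

A larger backlog widens the offset range that still qualifies for a partial sync, which is why repl-backlog-size matters for replicas that disconnect for longer periods.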

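Q: What performance impact does replication have?

A: When the data gap (offset) between the primary and a replica is too large, or when a new replica node joins the Redis cluster, a full synchronization is required, so all in-memory data must be copied to the read replica node. This has several effects, such as higher EngineCPU, higher memory usage (slave output buffer / SwapUsage), and more inbound and outbound network traffic.

Q: What is a replication storm?

A: When data is out of sync, a replica node requests synchronization, but because the primary node is under heavy load at that moment, the full sync fails. The replica node then requests synchronization again, which fails again, and this vicious cycle keeps repeating.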

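Command operations

Step1: Manually promote the replica node to a primary node.
127.0.0.1:6391> slaveof no one
OK
127.0.0.1:6391> config set slave-read-only no
OK
Step2: Manually make the node a replica and sync it with the new primary node.
> slaveof <primary IP> <primary port>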
127.0.0.1:6381> slaveof 127.0.0.1 6391
OK
127.0.0.1:6381> config set slave-read-only yes
OK
127.0.0.1:6381> info replication
# Replication
role:slave <---- The role of this node can be observed here.
master_host:127.0.0.1 <---- The IP address of the primary node.
master_port:6391
master_link_status:up <---- The status of the link to the primary node.

connected_slaves:1
slave0:ip=127.0.0.1,port=6392,state=online,offset=448,lag=0
master_failover_state:no-failover

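> The Redis SLAVEOF command turns the current server into a slave server of the specified server.
!!! If the current server is already a slave of some master server, running SLAVEOF host port makes it stop synchronizing with the old master, discard the old dataset, and start synchronizing with the new master.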

redis.log example

!!! ElastiCache Redis does not allow other self-managed Redis nodes to become its replica nodes.

$ redis-cli -h 127.0.0.1
127.0.0.1:6379> slaveof xxx.usw2.cache.amazonaws.com 6379
OK
127.0.0.1:6379> info replication
# Replication
role:slave <---- The role of this node can be observed here.
master_host:xxx.usw2.cache.amazonaws.com
master_port:6379
master_link_status:down <--- The connection has failed.
master_last_io_seconds_ago:-1
master_sync_in_progress:0
slave_repl_offset:1
$ cat redis.log
...
32284:S 04 Aug 2021 06:40:32.103 * MASTER <-> REPLICA sync started
32284:S 04 Aug 2021 06:40:32.103 * Non blocking connect for SYNC fired the event.
32284:S 04 Aug 2021 06:40:32.104 * Master replied to PING, replication can continue...
32284:S 04 Aug 2021 06:40:32.104 * Partial resynchronization not possible (no cached master)
32284:S 04 Aug 2021 06:40:32.105 * Master does not support PSYNC or is in error state (reply: -ERR unknown command `PSYNC`, with args beginning with: `?`, `-1`, )
32284:S 04 Aug 2021 06:40:32.105 * Retrying with SYNC...
32284:S 04 Aug 2021 06:40:32.105 # MASTER aborted replication with an error: ERR unknown command `SYNC`, with args beginning
...
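!!! In this example, a self-managed Redis node tries to request synchronization (psync/sync) from ElastiCache Redis and is rejected. !!! Because Amazon ElastiCache is a fully managed AWS service, ElastiCache restricts certain cache-engine-specific commands that require advanced privileges, so sync/psync fails.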

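Testing the sync/psync commands on self-managed Redis.

Step1: Adjust the configuration files
$ cat redis-master.conf
---
# Change daemonize from no to yes to run in the background (the default is foreground)
daemonize yes

# Master Port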
#port 6379
port 6380
---

$ cat redis-slave.conf
---
# Change daemonize from no to yes to run in the background (the default is foreground)
daemonize yes

# Slave Port
#port 6379
port 6381
---

Step2: Start each service
$ ./src/redis-server redis-master.conf
$ ./src/redis-server redis-slave.conf

Step3: On the `replica node`, specify the location of the primary node.
$ redis-cli -p 6381
127.0.0.1:6381> replicaof 127.0.0.1 6380
OK

Step4: Check the replication status.
127.0.0.1:6381> info replication
# Replication
role:slave <----- !!!
master_host:127.0.0.1 <----- !!!
master_port:6380 <----- !!!
master_link_status:up
master_last_io_seconds_ago:2
master_sync_in_progress:0
slave_read_repl_offset:0
slave_repl_offset:0
slave_priority:100
slave_read_only:1
replica_announced:1
connected_slaves:0
master_failover_state:no-failover
master_replid:b967e55fb2fbcd2de0452b0900e4d482f36bea53 <----- !!!
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0 <----- !!!
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:0
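
When checking this output from a script rather than by eye, a small parser can help. The following is a minimal sketch that assumes the `info replication` text has already been captured as a string; `parse_info` is a hypothetical helper, and the sample is a shortened copy of the output above.

```python
def parse_info(text):
    """Parse `info replication` text into a dict of string fields."""
    fields = {}
    for raw in text.splitlines():
        # Drop arrow annotations such as "<----- !!!" used in this article.
        line = raw.split("<--")[0].strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields[key] = value.strip()
    return fields


sample = """# Replication
role:slave
master_host:127.0.0.1
master_port:6380
master_link_status:up
slave_repl_offset:0
master_repl_offset:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:0
"""

info = parse_info(sample)
print(info["role"], info["master_link_status"])       # slave up
print(int(info["repl_backlog_size"]) // 1024, "KiB")  # 1024 KiB
```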

Step5: Run the sync/psync commands
127.0.0.1:6381> sync
Entering replica output mode... (press Ctrl-C to quit)
Full resync with master, discarding 171 bytes of bulk transfer...
Full resync done. Logging commands from master.
"ping"

127.0.0.1:6381> psync b967e55fb2fbcd2de0452b0900e4d482f36bea53 0
Entering replica output mode... (press Ctrl-C to quit)
PSYNC replied +FULLRESYNC b967e55fb2fbcd2de0452b0900e4d482f36bea53 280
Full resync with master, discarding 172 bytes of bulk transfer...
Full resync done. Logging commands from master.
"ping"
"ping"
"ping"
"ping"
"ping"
"ping"
Error: Server closed the connection
(60.70s)

A cloud support engineer focused on troubleshooting customer-reported issues and on cloud solution architecture.