Blockchain 的 BFT-style POS

fcamel

Published in

fcamel的程式開發心得

17 min readJan 13, 2019

前篇介紹了前置知識，這篇接著說明主題。我只有大致閱讀來源資料，沒讀完整也沒讀深。往後有更深理解後，再來補充。

新手上路，有錯還請指正。

BFT-style POS

BFT-style POS 的步驟和 PBFT 一樣。以《DPOS BFT — Pipelined Byzantine Fault Tolerance》為主輔以其它資料，說明如下。

Blockchain 的參與者要提供擔保金才能參加 proposers 和 validators 的選舉。選上後會扣押擔保金。選好 proposers 和 validators 後，依序作:

Pre-prepare: proposer 打包選定的 transactions 到新的區塊裡，簽名後送給 validators。
Prepare: validators 收到 proposal，驗證通過後，簽名同意票 (prepare message)，然後廣播。驗證不過的話，不作任何事。
Commit: validators 收到 2/3 validators 的 prepare messages 後，廣播 commit messages。
收到 2/3 commit message 後，定案此提案 (finalized)。

不同 consensus algorithm 的差異在於:

決定何時由誰提案。注意「何時」需要考慮大家的時間可能無法同步。像 PaLa 的參與者使用 local epoch，使用虛擬的時間。藉由廣播的方式對時，比方說收到 2/3 的人說調整時間 e → e+1，才會更新 local epoch。
如何傳遞和記錄 commit。
如何偵測作壞事 (不當提案、不當投票)。
偵測到作壞事後，處罰的規則。例如沒收參與 validator 時提供的保證金。

另外值得一提的是，和 chain-based POS 相比: chain-based POS 較重視 liveness，而 BFT-style POS 較重視 consistency。原文如下:

Proof of work algorithms and chain-based proof of stake algorithms choose availability over consistency, but BFT-style consensus algorithms lean more toward consistency

先接觸 chain-based 的方法後，一開始覺得 BFT-style POS 好像不夠去中心化。不過想要提升效率 (throughput 和 latency)，必須有適度的中心化。所以，若 validators 的選舉方式夠去中心化，似乎是不錯的平衡。就像現在的民主制度也是代議制度，不是什麼提案都要全民公投。

Pipelined BFT-style POS

BFT-style POS 裡的三個步驟 proposal、prepare、committ 是在一個回合內作完，才會開始下回合。若三者獨立持續進行，不用互等，可提升 throughput。不過天下沒有免費的午餐，將流程 pipeline 後，finality 的條件會變複雜。後面介紹 Ethereum 的 Casper FFG 時，可以體會這點。

後面介紹幾個屬於此類的 consensus algorithm。

Ethereum 的 Casper the Friendly Finality Gadget

Ethereum 目前用 POW。POW 或 chain-based POS 沒有 finality，遇到 51% attack 時，最長鏈會被翻盤。這是 Casper FFG 想先處理的問題。

Casper FFG (Friendly Finality Gadget) 是相對獨立、架在現在 Ethereum 之上的演算法。每 100 個 block (包含第 0 個 block) 作為 checkpoint (paper 用 100，有在其他討論看到 50)，形成 checkpoint tree。想成為 validator 的用戶先 deposit cryptocurrency 成為 validator，藉由 validator finalize checkpoint，從而保護 51% attack 。

為什麼 Casper FFG 有防範到 51% attack？因為 validator 有放擔保金 (stake) ，作壞事抓包會被罰很重。此外，validator 的利益和 blockchain 的安全性正相關 (錢夠多才能當 validator)，所以 validator 比較沒有動機作壞事。反之，51% attack 的攻擊者是礦工，他們作壞事不會被重罰，在 blockchain 上不見得是有錢人，攻擊動機較高。

Casper FFG 的流程和 PBFT 相似，只是沒有 proposer，由 validators 各自投票。似乎沒有限定投票時限，只有限制投票規則。

投票規則

validator 投票 a →b 表示它認證 checkpoint a → checkpoint b 是有效的。
起點是 justified point; 投票 a → b 時，a 必須是 justified point。
投票 a → b 時必須滿足 h(a) < h(b)，h(x) 表示 x 在 checkpoint tree 的高度。注意這裡沒有限制 h(a) = h(b)-1。之所以放寬這個限制，推測是想讓 validator 可以有效率地一次認證一大段 checkpoints。但由此衍生出一些特殊限制，避免合法分叉。後述。
收到 2/3 (用 stake 比例算) a → b 的投票後，b 視為 justified checkpoint

h(x) 和 justified chain 示意圖，出自 https://github.com/ethereum/research/blob/master/papers/casper-basics/casper_basics.pdf

Finality 規則

若 b 是 a 的 direct child (i.e., h(b) = h(a) + 1)，則 a 視為 finalized。

這裡需要深入思考一下才能理解。在投票 a → b 的時候，其實傳達兩個意思:

投票贊成 a → b，等價於 PBFT 的 prepare message。
由於 a 必須是 justified checkpoint，表示投票者知道有 2/3 的人 justified a，換句話說，投票者有收到 justified a 的 “commit message”。

所以，若 a → b 已 justified，表示 2/3 的人收到 commit a，然後 finality 強調 a 和 b 之間不能有其它 checkpoint，才可說 a 已 finalized。

這帶出另一個疑問，為何要區分 justified 和 finalized？說明這點以前，要補充說明 Casper FFG 投票的 slashing conditions (處罰條件)。假設同一個 validator 投了 s1 → t1 和 s2 → t2:

No double vote: 不允許 h(t1) = h(t2) (這表示有 fork)。
No surround vote: 不允許 h(s1) < h(s2) < h(t2) < h(t1)，意即不允許一個投票的區間包含在另一個區間裡。

抓到違反上兩者之一的投票，會沒收擔保金。

下圖是 surround vote 的示意圖:

來源: https://docs.google.com/presentation/d/1fqnjL-2TqXjhHx8k7HRX7eUYnDK83adnlCLLH8Bk054/edit#slide=id.g29703948a2_0_3896

回來討論為何要區分 justified 和 finalized。

考慮情境一，有兩個分叉的鏈:

a0 →a2 →a4 (justified)
a0 →b1 →b3 (justified)

數字表示 h(x)，a、b 是不同條鏈。情境一是允許發生的情境。

示意圖如下:

下面的情境二不會發生 (a4、b4 違反 no double vote):

a0 →a2 →a4 (justified)
a0 →b1 →b4 (justified)

回到情境一，若加上 justified 4 → 5:

a0 →a2 →a4 →a5 (finalized)
a0 →b1 →b3 (justified)

因為

不能再投 x → 5 (no double vote)。
不會有其它路線跳過 a4 (no surround vote)

所以可以肯定 a4 已 finalized (2/3 validators 同意 4 → 5，由此推論出 2/3 看過 commit 4，且不會有新路線繞過 a4 了)。

下圖示意 Ethereum FFG 如何解決 51% attack。

出處: https://docs.google.com/presentation/d/1fqnjL-2TqXjhHx8k7HRX7eUYnDK83adnlCLLH8Bk054/edit#slide=id.g29703948a2_0_1504

圖中的意思是:

三個 block 取一次 checkpoint。
「燈炮」是 checkpoint。
黃色燈炮是 justified checkpoint。
綠色區塊是 finalized blocks。
黃色區間是主鏈。
紅色區間是 finalized 綠色區塊後發生的 51% attack。

EOS 的 DPOS BFT

EOS 似乎沒有提供論文型式的嚴謹介紹，就我看到的資料，EOS 稱他們的作法為 DPOS (Delegated POS)。

我的理解如下:

選出 21 個 proposers。
按照排好的順序以 0.5s 為間隔出塊，每人出塊 3s。因為 block time 很短，所以排定 proposer 順序的時候，要考慮網路距離要很近，減少漏區塊的機會。
每個 proposer 照 blockchain 規則從手中的最長鏈出塊。

我們先用簡化的部署討論 DPOS 背後的含意。假設有 proposer A、B、C、D 共四人，依序各出一塊。輪到 B 出塊時，B 在 A 出的塊 a1 之上繼續出塊 b1，間接地表示投 a1 一票。輪到 C 出塊時，C 的出塊代表投 a1 和 b1 各一票。然後 C 知道 a1 的塊有 3/4 參與者的票 (A、B、C)，所以 a1 的塊已 commit (目前只有 C 知道)。

待輪回 A 出塊時，A 看到 a1 → b1 → c1 → d1 得知 a1 已獲得 A ~ D 的 prepare message 和 A、C、D 的 commit message，所以 a1 已 finalized。下面是示意圖 (箭頭方面是指向 parent block; 我內文用的方向表示出塊順序)，其中 LIB 是指 last invertible block。

來源: https://medium.com/eosio/dpos-bft-pipelined-byzantine-fault-tolerance-8a0634a270ba

所以，DPOS BFT — Pipelined Byzantine Fault Tolerance 聲稱其它 BFT-style POS 的效率會被 validators 數量影響，每加一個 validator，需要多傳 2N 個訊息; 然而，DPOS BFT 沒有傳訊息，throughput 不會受限於 validators，理論上可以有 1000 個 proposers 接力，也不會影響 throughput，只會增加 finality 的 latency。

另一方面，若希望提早 finalized 的話，可仿照 PBFT 廣播訊息請其它 proposer 作為 validator 的角色，傳遞 prepare 和 commit messages，提早 finalized。

單就演算法來說，其實有作壞事的空間。假設有 A ~ U 共21個 proposers 依序出塊，B 和 C 可以聯手忽略 A 出的塊 — 只要 B 不理 A 出的塊，然後 C 接在 B 後面出塊即可。不誠實節點只要兩名就可以成功地作壞事。

EOS 有其它處理機制，讓 proposers 不會有動機作壞事。我沒看配套措施，這裡引用其中一段說明:

Governance is the process by which people in a community:
1. Reach consensus on subjective matters of collective action that cannot be captured entirely by software algorithms;
2. Carry out the decisions they reach; and
3. Alter the governance rules themselves via Constitutional amendments.

用人為機制處理演算法的缺陷，這和平常談的 consensus algorithm 作法不同 (會在演算法假設的環境下證明沒有缺陷)，實務上不易判斷優劣，畢竟市場機制不見得會選演算法最完美的版本。

ThunderToken 的 The Thunder Protocol

來源: https://www.youtube.com/watch?v=DY2qhydRK_0

主流程和 BFT-style POS 類似，比較特別的是引入 slow chain (一個符合 consensus 定義且會持續成長的 blockchain，例如 Ethereum) 處理例外狀況。理由是 PBFT 在出狀況時的處理方式過於複雜，所以改透過 blockchain 處理。簡化後的流程分為 fast path 和 slow path。

fast path 相當簡單，大致作法如下:

Accelerator 是 proposer，只有一位。
Committee 是 validators，收到 proposal 後檢查無誤 (有 Accelerator 的簽名、資料合法等)，簽名後傳回給 Accelerator 表示支持 (投票 = prepare message)。
Accelerator 收到 3/4+ 同意後，送 notarization 給 committee members ( notarization = commit message )。選用 3/4+ 可以提升 consistency 的保證，壞人比例要 3/4+ 才能破壞 consistency。
沒有明確的 finality。

以下討論幾個情境，說明如何偵測問題，例如:

若 Accelerator 故意忽略某些 transaction。trasnaction 的擁有者可以發 yell message 到 slow chain。committee members 會讀 slow chain 內容，發現 yell message 內的 transaction 在一段時間後沒被收入 proposal 的話，會要 Accelerator 下台。
slow chain 成長後，Accelerator 將它的 log 包成 heartbeat message 給 committee 簽名，簽過後再送到 slow chain。slow chain 上有數個區塊沒看到 heartbeat 的話，表示 Accelerator 或 committee 出問題了。

在 fast path 走不下去的時候，改用 slow chain 溝通。由於 slow chain 的前提是滿足 consensus 且會持續出塊，不用擔心會掉資訊。收集資訊後再看怎麼處理。前面提到 fast path 沒有處理 finality，但 committee 可以在 slow chain 同步各個委員看到的 notarization，藉此確保大家最後得到一致區塊的 notarization。

Pili / Pala

The Thunder Protocol 是基於 Thunderella 的設計，而 Thunderella 的共同作者提了新的演算法 Pili 和 Pala。下面是 Pili 和 Pala 的導讀，說明它們的共同精神，也是 BFT-style POS。了解 PBFT 的話，容易理解這段說明。

以此為基礎， PiLi 的作法很簡單 (2019/05/22 更新: 此處指 2019/01/13 的草稿版本)，摘錄論文的 Introduction:

… Informally, every epoch, an eligible proposer proposes a block (tagged with the current epoch) extending the freshest notarized chain observed so far. Nodes vote on all valid proposals from eligible proposers as long as 1) the proposed block extends from a parent chain has been notarized in the node’s view; and 2) this parent is “not too stale”. When a block gains votes from the majority of nodes, it is considered notarized but not necessarily final. If a node observes a notarized chain ending with 6 blocks of consecutive epochs and no other notarized blocks of these 6 epochs have been seen, then this notarized chain except the trailing 5 blocks are considered final.

關鍵是引入投票的限制: 收到 proposal 是由 block A 生成下個 block B時，要確保手上有 A 的 notarization 且 A 夠新，才會投票支持生成 block B。這點和 Etherem FFG 的投票 justified chain 限制相似。投票 A → B 同時傳達投票者已看過 A 的 commit message (piggyback the commit on the next block’s voting)。

而 Pala 進一步去掉有同步時間的假設，proposers 和 validators 使用 local clock，然後透過交換訊息決定是否要更新 local clock，例如收到 2/3 比例說要更新到時間 e 後，才會更新 local clock 的時間為 e。

兩篇論文都滿有趣的，再找時間細看。

(2019/01/20 更新) PaLa: A Simple Partially Synchronous Blockchain 讀後筆記

參考資料

Ethereum

EOS

ThunderToken