A Brief Look at Byzantine Failure

Byzantine failures …

Byzantine failures are arbitrary failures in a distributed system: a faulty node does not simply stop, it may send wrong or conflicting messages. This is a type of failure we have not discussed before. As the figure above shows, there are two kinds of Byzantine failure.
1. Primary failure: the primary sends different messages to different backups.
2. Backup failure: a backup relays to another backup a message that differs from the one the primary sent.

How to fix …

To tolerate k Byzantine failures, we need at least 3k+1 machines in order to reach consensus.
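
The 3k+1 bound can be turned into a small calculation. The sketch below is only an illustration; the function names `min_nodes` and `max_faults` are made up for this post and do not come from any library.

```python
def min_nodes(k: int) -> int:
    """Smallest cluster size that can tolerate k Byzantine nodes (3k + 1)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return 3 * k + 1


def max_faults(n: int) -> int:
    """Largest number of Byzantine nodes that an n-node cluster can tolerate."""
    if n < 1:
        raise ValueError("n must be positive")
    # Invert n >= 3k + 1 and round down.
    return (n - 1) // 3


if __name__ == "__main__":
    for n in (3, 4, 7):
        print(n, "nodes tolerate", max_faults(n), "Byzantine fault(s)")
```

With three machines the tolerated number of Byzantine faults is zero, which is why the examples in the next section need a fourth participant.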

Proof

In both examples, the blue soldier is not faulty, so consider his point of view. He receives two conflicting commands and, with only three participants (3k, with k = 1), has no way to determine which command is correct.

In this example, the blue soldier receives two attack commands and one retreat command. With four participants (3k+1, with k = 1), he can follow the majority and conclude that he should attack.
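
The blue soldier's reasoning in both cases can be sketched as a strict-majority vote over the commands he received. This is a simplified illustration, not a full Byzantine agreement protocol; the helper `decide` is a hypothetical name chosen for this example.

```python
from collections import Counter
from typing import Optional, Sequence


def decide(commands: Sequence[str]) -> Optional[str]:
    """Return the command held by a strict majority, or None if there is none."""
    if not commands:
        return None
    command, count = Counter(commands).most_common(1)[0]
    # A tie (for example one "attack" and one "retreat") is not a majority.
    return command if count > len(commands) / 2 else None


if __name__ == "__main__":
    # Three participants: the blue soldier holds two conflicting commands.
    print(decide(["attack", "retreat"]))
    # Four participants: two attack commands and one retreat command.
    print(decide(["attack", "attack", "retreat"]))
```

In the three-participant case the vote is tied and no decision is possible; in the four-participant case "attack" wins the majority.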

What are the problems with this solution …

1. The solution assumes that every soldier receives the messages within a bounded time. This assumption is hard to guarantee in a distributed system.
2. It is hard to diagnose faults. When a message does not arrive, we cannot tell whether the cause is network latency or a machine failure.
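
The second problem can be illustrated with a toy model of a node waiting for a reply. This is a sketch under simplifying assumptions: `observe` is a hypothetical helper, a delay of `None` stands for a crashed sender, and no real network is involved.

```python
from typing import Optional


def observe(reply_delay: Optional[float], timeout: float) -> str:
    """Describe what the waiting node sees before its timeout expires.

    reply_delay is the time in seconds the reply takes to arrive,
    or None if the sender has crashed and will never reply.
    """
    if reply_delay is not None and reply_delay <= timeout:
        return "reply"
    return "no reply"


if __name__ == "__main__":
    slow_node = observe(reply_delay=5.0, timeout=1.0)
    crashed_node = observe(reply_delay=None, timeout=1.0)
    # Both cases look identical to the waiting node.
    print(slow_node, crashed_node)
```

From the waiting node's side, a slow sender and a crashed sender produce the same observation, so a timeout alone cannot tell them apart.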

In the next chapter, we will discuss the famous CAP theorem.

-MsHe
