MsHe Channel

我簡單豐富的生活提案

極簡和豐富

將無用的人、事、物、時間、金錢降至最低後，才能看見生活上面重要的事情。

Minimalist and Richness

Only minimize useless things, relationships, time, and money, you can see important things in your life.

居住自由

小一點的房子，房租比較便宜；少一點的東西，心靈比較平靜。當物品太多時，這些東西將反過來影響你。那該怎麼讓自己東西越少呢？

制定自己的制服。(衣服最少化)
買經典的商品，而非限量商品。(不隨波逐流)
善用租借，而非擁有。
了解自己的需求。
知道如何脫手。

Living Freedom

The smaller house, the cheaper rent. The less things, the peaceful mind. When there are too much things in your life, they will impact you eventually. How could you minimize your things?

Find your uniform. (Minimize your clothes)
Buy the classic products, not limited. (Don’t go with the flow)
Rent it, not buy it.
Find out your needs.
Know how to sell or get rid of your things.

Feel do and cannot do, in fact, only between a read.

時間自由

時間是我們最珍貴的寶物。善用它，已經成為我們的課題。想了解時間，沒有捷徑。記錄自己的時間，問問自己是否享受這樣的生活。如果不喜歡的話，就馬上去嘗試其他活動。為自己的生活創造一點新鮮感，並且一直嘗試。這樣才能找到重要並且想要做的事情。

Time Freedom

Time is the most precious treasure. How to use it becomes the issue that everyone should think about it. If you want to know how to use it, there is no way, but recording it and asking yourself, whether you enjoy your current life. If the answer is no, try some other activities right away. Make your life more fresher and keep trying until you find the way. Exploring and trying are the best way to find out what you want to do.

Doubt is the key of knowledge.

思考自由

極簡，不應該只侷限在物品上面，更重要的是無形的東西，例如：思考。

減少選擇的項目。
知足。
質疑常識，不要全盤的接收。
查證和檢視自己的知識。
只思考最重要的事情，其他的，能給別人思考，就讓別人思考。

Think Freedom

Minimalist is not limited in physical goods. It’s more important in something mental, like thinking.

Reduce your choices.
Happiness consists in contentment.
Question the common knowledge, don’t accept it without thinking.
Verify and check everything you learned.
Only think about important things, let others to think the trivia.

To read without reflecting is like eating without digesting.

-MsHe

Case Study – Chubby

What is Chubby …

1.
高度可用的協調服務。
High available coordination service.

2.
是一種鎖，目的是為了保持系統一致性。
It is a lock service, in order to achieve consistency among distributed system.

Desired properties of Chubby …

1.高度可用。 High availability.

2.可靠性。 Reliability.

3.易於理解。(這也是為什麼是鎖服務的原因) Easy to understand. (This is why Chubby is lock service)

Components of Chubby …

1.
通常五個服務組成一個 Chubby Cell.。
Usually, there are five servers of a Chubby Cell.

2.
利用 Paxos 達成內部一致。
Using Paxos to achieve consensus in Chubby Cell.

3.
利用 Paxos 選出內部的 ChunkMaster。
Using Paxos to choose the ChunkMaster.

4.
ChunkMaster 是對外的管道。
ChunkMaster is an entrance for outside system.

Process …

1.
客戶端發送請求給最近的節點。
Client sends request to nearest node.

2.
若節點不是 ChunkMaster ，則回傳 ChunkMaster 的位置。
If the node isn’t ChunkMaster, it returns ChunkMaster’s address.

3.
客戶端直接傳送請求給 ChunkMaster。
Client directly sends request to ChunkMaster.

4.
利用廣播將變化的值(寫動作)告訴其他的節點，達成共識。
Propagated changed values (write) to other nodes, in order to achieve consensus.

5.
如果是讀動作，則不需要達成共識。
If reads, it can handle without consensus.

In term of Client …

1.
客戶端利用 Library 像 ChunkMaster 發出新的 Session 請求。
Client sends request to ChunkMaster a new session request by Chubby library.

2.
每個 session 有一定的時間，若想要延長時間，則須重新更新 Session。
There is a lease of every session. If client needs more time, it must renew session before lease expires.

Questions …

1. Why does it have different version between nodes?
因為 Paxos 中，Accept 跟 Learn 之間需要時間。
There is a gap between Accept Phase and Learn Phase.

2.How to scale Chubby?
a.客戶端可以暫存讀取的資料。 Client can cache the data it read.
b. 利用代理。 Using Proxy.

3.How does client know the data is not valid anymore?
Master 會傳送無效訊息給客戶。
Master will send invalidation to client.

4.How to handle ChunkMaster failed?
利用 Paxos 選出新的 Master。
Using Paxos to choose new Master.

5.How to choose master?
其中一個節點提出成為新 master 的請求，其他節點需要同意。
One of nodes promote itself as master and other nodes need to agree.

-MsHe

淺談 CAP Theorem

CAP …

Consistency

每個讀取都可以讀到最新的值。
Every read received the latest written value.

Availability

只要有客戶發出請求，伺服器一定要回應。但回應值不一定是最新值。
If there are requests from client, servers must reply. But no guarantee that it returns the latest written value.

Partition Tolerance

分布式中，可能有訊息無法同步或者接收不到，導致發生部分區域不一致的現象。
In distributed systems, some message losing or servers unavailable leads inconsistency between nodes.

AP or CP …

1.Why not AC ?
如果系統可以保持同步且一定可以回應客戶，那就不會出現部分區域不一致的問題。既然會發生不一致的問題，表示 AC 無法同時達成。
If system can achieve availability and consistency, it won’t happen partition tolerance. We are positive that partition tolerance happen ed which means we can’t achieve both availability and consistency.

2. What happened when partition occurs?
a. Consistency over availability
保證節點間的一致性，我們必須犧牲節點的可用性，因為需要等待同步的時間。
We make sure consistency between nodes which means the availability will decrease since we wait for syncing.
b. Availability over consistency
當等待過久時，取消一些操作，降低等待時間，可以確保節點的可用。
When the execution time is too long, we cancel some operations to decrease the waiting time and ensure the availability of nodes.

Question …

1.What should we choose?
a. Consistency over availability
例如：銀行交易，不可以發生不一致的情況。
For example: the bank transaction. It cannot happen inconsistency.
b. Availability over consistency
例如：普通網頁頁面。可以容許暫時不一致性，僅要求最終一致性。
For example: web pages. We can tolerance temporary inconsistency and only require eventually consistency.

-MsHe

淺談 Byzantine Failure

Byzantine failures …

隨機的故障發生在分布式系統中，這不是我們之前提過的任何一種故障。上圖中，拜占庭錯誤可以分為兩種。
1.Primary 故障：傳送不同的結果給 backup。
2.Backup 故障：傳送跟 primary 不一樣的訊息給另一個 backup。

Arbitrary failures that occurs in distributed system. We have not told about this type of failure before. In above figure, there are two types of Byzantine failures.
1.Primary failure: sending different message to different backups.
2.Backup failure: sending the messages which is different from primary to other backups.

How to fix …

如果發生拜占庭錯誤，我們需要至少 3K + 1 個機器，才可以達成共識。
If Byzantine failure happened, we need at least 3k+1 machine to tolerance fault.

Prove

在這個兩個例子中，藍色士兵是沒有發生問題的。所以我們以他為例，他收到了兩種不同的指令，但並沒有辦法判斷哪個指令才是正確的在只有三個的形況下(3K)。
In two example, the blue soldier is normal. In the term of blue soldier, he received two commands and can not determine which one is correct in this situation(3K).

在這個例子中，藍色士兵接收到兩個 attack 跟一個 retreat，這種情況(3K+1)，它可以做出攻擊的結論。
In this case, the blue soldier received two attack and one retreat, he can make an attack conclusion in this situation (3K+1).

What the problem of its solution …

1.
在解決方法中，我們假設每個士兵都會在一定的時間內接收到訊息，這個假設在分布式中是有困難的。
In the solution, we assume that every soldier receive the command in finite time. This assumption is difficult to achieve in distributed system.

2.
難以偵錯。我們難以區分是因為延遲還是機器故障，導致沒有收到訊息。
It is difficult to detect the reason of faults that lead to be unavailable since we can not tell it is latency or failure.

下一章，我們會討論到非常著名的 CAP 理論。
Next, we will discuss the famous CAP Theorem.

-MsHe

淺談 Paxos

Why Paxos appeared …

Primary-Backup 的架構中，需要所有的 backup 都回傳 ACK 訊息，才算是同步成功。這樣的方式，會產生兩個問題。
1.並不是每一台機器都正常運作，如果有一台機器連不到，怎麼容忍部分正確。
2.怎麼讓程序正常執行，而不會因為暫時連不到機器而被卡住。

In primary-backup, primary must receive ACK message from all backup to ensure synchronization. This way causes two problems.
1. Not every machine is functional. How to tolerance partition.
2. How to work functionally without blocking users if there is one machine unavailable.

Key idea of Paxos …

像通過法律需要半數的公民同意，如果一半的機器承諾可以執行更新，則達成共識。

Like passing laws need to get majority of citizens agreement, the consensus can be achieved if there are majority of replicas commit to update.

Paxos …

1.
現今在業界被廣泛使用。
Nowadays, it is widely used is industry.

2.
可以達成最終一致。這意味著，只要有同意的值，最終所有機器都會發現這個值。
Can reach the consistency eventually. It means if there is a value which is committed to be updated, all machines will eventually discover the value.

3.
因為只需要一半機器的承諾，是一個高度容忍錯誤的演算法。
Since it only needs half of machines commitment, it is a algorithm that highly tolerances faults.

Roles of Paxos …

Proposers: 提出更新值。Proposes values.
Acceptors: 更新者，如果該值被一半以上的機器接受。 Accept values which are committed to be updated by majority of acceptors.
- 每個 acceptor 必須記住三個值。 Every acceptor maintains three values.
- np: 最高被承諾要更新的提案值。 The highest proposal number which is promised to accept.
- na: 已經被接受的最高提案值。 The highest proposal number which is accepted.
- va: 已經被接受的值。 The value accepted.
Leaners: 學習最後選出的更新值。Learn the chosen values.

每台機器可以擔任任何的角色，也可以擔任不只一種角色，例如：可以同時為 proposer 跟 learner。
Each machine can be play any role, even can play more than one role. For example, one machine can be both proposer and learner.

Process of Paxos

Prepared Phase

1.
提案者提出唯一的提案數 n。並且傳送 prepare 訊息給所有的接受者。(圖a)
Proposer proposes an unique proposal number n and sends prepare message to all acceptors. (Figure a)

2.
接收者接收到 prepare 訊息後，判斷是否收受提案。
After receiving prepare message, acceptor determine whether promise the propose.
If n > np
np = n
if there is no value accepted
sends promise message to proposer <promise,n,null> (Figure b)
if there is a value accepted
sends promise message to proposer <promise,n,(na,va)>
else
sends <prepare-failed> to proposer (Figure c)

Accept Phase

1.
一但提案者收到多數接受者承諾訊息後，傳送 <accept,(n,v)> 訊息給接受者。
Once proposer received promise message from majority of acceptors, it sends <accept,(n,v)> message to all acceptors.
‧ 如果 va 存在，則 v = va, 如果不存在，則 v = 自己的值。
‧ If va existed, then v = va, else v = its own value.

2.
接受者接受到訊息後 After acceptors receiving the message (Figure 2-a, 2-b)
n >= np (Figure 2-a)
na = np = n
va = v

Learner Phase

學習者發現被大多數接受者接受的值。
Learners discover the value which was accepted by majority of acceptors.

Questions …

1.Like Figure 2-b, how to fix race condition?
當第二階段失敗後，不要馬上提出新的提案，先回到其他狀態。
When failed on second phase, don’t propose another proposal immediately and back off for a random period.

2. How do learners learn the value accepted?
利用第一階段的 prepare 訊息學習接受的值。
Learners discover value accepted by response of prepare message.

-MsHe

淺談分布式中的 Consistency

Consistency in distributed system …

在分布式中，我們一直談論一致性的問題。也為了一致性想出了多種的架構及分法。那麼我們所追求的一致性，到底可以分幾種呢。

In distributed system, we usually mention about consistency. For it, we figured out many approaches and architectures. How many types of consistency we are desired?

Linearizability …

每個寫入順序完全一樣。 Total ordering of writes.
讀取時，返回最後完成結果。 When read data, it always returns last completed result.
一致性最高，但是花費的效能也最高。 The strongest consistency, but the cost of efficiency is highest.

Sequential consistency …

所以執行序看到的執行結果相同。 The result shows in any process is the same.
讀取時，不保證回傳最終的完成結果。 There is no guarantee that it returns last completed result when read data.

Example 1
這時候，所有的執行序看到一樣的結果，這是 sequential consistency 的情況。
In this time, all processes see the same result. It is sequential consistency.

Example 2
這次的結果，執行序 3 跟執行序 4 看到不一樣的結果，沒有達成 sequential consistency。
This time, process 3 and process 4 see different result. It is not sequential consistency.

Causal consistency …

結果顯示順序是按照因果排序。The results order by causally.
如果動作 a 比動作 b 執行更早，則看到 b 時，一定會看到 a。If the action a happened before action b, then users would see action a if they see action b.

Example 1
如果你看到一則評論，一定會看到評論的文章。
If you see a comment, you must see the post which the comment is commented.

Example 2
執行序 4 符合 causal consistency，但不符合 sequential consistency。
Process 4 is causal consistency, but it is not sequential consistency.

-MsHe

淺談 Primary-Backup Replication (2)

Previous …

上一次，我們主要在講 Primary-Backup 裡面的架構，這次我們要討論 client 端的觀點。

In the last time, we mainly discuss the architecture of primary-backup. This time we are going to talk about the perspective of client.

淺談 Primary-Backup Replication

Client …

1.
和 primary 溝通。
Communicate with primary.

2.
需要知道 primary 的位置。
Need to know the address of primary.

How does client know primary …

透過 view service，得知 primary 的地址。
Know the address of primary through view service.

View Service …

1.
知道目前 primary-backup 的關係，我們稱為 view。
Save the current membership of primary-backup and it is called view.

2.
當 primary 或者 backup 有變動或這失敗時，view 會改變。
View will change if there is changes or failures of primary or backup.

3.
會定期發送訊息探測 primary 跟 backup 的狀態。
Sends messages periodically to detect failures of primary and backup.

Process of changing view …

1.
View Service 會告訴所有節點 View 的變化。
View service announces view change to all nodes.

2.
上次有提到，Primary 要收到 Backup 回傳的 ACK 訊息，確認 Backup 已經同步。
In the last time, we mentioned that primary will receive ACK message from backup to ensure the syncing.

3.
確認 Backup 同步後，Primary 必須發送 ACK 訊息，代表承認新的 View。
After ensuring syncing with backup, primary must send ACK message to acknowledge the new view.

Questions …

1. What will happen if primary failed when view changes?
如果 View 變更時，primary 失敗，則整個程序會卡住。
The process of changing view will be stucked if primary failed at the same time.

2.How to scale view service?
客戶端可以暫存 View。
Client can cache view.

3. How does the client know the cached view is valid or not?
如果客戶端訪問 primary 沒有回應或者得到錯誤訊息時，客戶端會重新去訪問 View Service 並且將暫存的 View 視為無效.
If there is no response or error message from primary, client will send new request to view service and invalidate the cached view.

-MsHe

淺談 Primary-Backup Replication

Why need replication …

1.
可以提高效率，因為使用者可以存取最近的副本。
Improved performance since users can access nearest replicas.

2.
有利於擴大系統，因為單點效率提高。
Aided scaling since the performance improved at each node.

3.
系統可以處理節點的錯誤，因為有可替代的副本。
System can handle failures since there is replaceable replicas.

Why replication is hard …

1.
節點多，如果出錯的話，難以找到錯誤。
It is hard to find failure if there some things go down since it is too many nodes.

2.
最大的挑戰是：如何保證每個副本的一致性。
The biggest challenge is to make sure the consistency across replicas.

Roles …

Primary

1.
Primary 可以有多個 backup 溝通。
Primary can have several backups.

2.
直接跟客戶溝通的管道。
The entry that can communication with clients directly.

Backup

1.
每個 backup 只認定一個 primary。
Each backup has only one primary.

2.
必須向 primary 註冊自己。
Have to register itself to primary.

Process …

1.
客戶像 primary 發送訊息。
Client sends message to primar

2.
Primary 接收到後，跟 backup 同步。
After received the message, primary sync with backup.

3.
Primary 收到 backup ACK 的訊息，確認 backup 已經同步完成。
Primary received ACK from backup to ensure backup completion.

3.
同步完後，primary 回應 client。
After syncing, primary responds to client.

Questions …

1. If primary failed, how to handle it?
將其中一個 backup 變成 primary。
Promote one of backups as the primary.

2.If backup failed, how to handle it?
只要 primary 沒有失敗，就可以重啟一個 backup，不會影響到系統運作。
As soon as primary is successful, simply adds a new backup. The backup failure won’t affect system.

3. What’s the challenge of primary-backup replication?
如何找到適當的 primary-backup 比例。
How to find the right ratio between primary and backup.

-MsHe

淺談 Checkpoint / Snapshot

Fault Tolerance …

在分布式系統中，很重要的元素。當我們有多個機器時，如果任何一個環節出錯，我們該怎麼從錯誤中回復呢。這次介紹了兩種個方式：Checkpoint, snapshot。

In distributed systems, fault tolerance is an important element. When there are several machines, how we recover state if something goes down. We will introduce two ways: checkpoint and snapshot.

Types of failures …

1.
Crash failures
機器停止回應，這種情況可以從上一個儲存狀態中回復。
Machine stop responding, it can recover from last saved state.

2.
Fail stop
跟 Crash failure 的反應一樣，但他失去所有的狀態並且無法從上一個狀態中回復。
Although exhibiting like crash failure, it loses all state and cannot resume with last saved state.

Checkpoint …

1.
定期存取現在的狀態。
Checkpoint process state periodically.

2.
重新啟動機器後，依然可以從上一個存取狀態還原。(存取不會因為重新啟動而消失。)
After restarting, it still can recover from last saved checkpoint. (Which means checkpoint won’t fade because of restarting.)

Challenge of checkpoint …

每次傳遞 checkpoint 時，執行序之間會互相確認，這表示執行序之間是互相依賴的。如果其中一個執行序失敗，因為依賴關係，會導致 checkpoint 失敗，而回到初始狀態，這種情況稱為：骨牌效應。

Since process will check each other after sending message, it means the process is dependent with each other. If there is one process failed, the checkpoint will fail and the process will go back to the initial state because of dependency. This situation is called Domino effect.

Chandy-Lamport algorithm …

一種用於分布式系統的演算法。該演算法針對如何抓取全局一致的狀態。
An algorithm is used in distributed system. The algorithm is for capturing the consistent global state for synchronization.

Global snapshot …

抓取 “發生前" 的狀態。
Capture state that happens before.
For example:
If a -> b, we take snapshot in b state, then a must be in snapshot. (like the figure)

Process of global snapshot …

The sender …

1.
其中一個執行序(發送者)記錄自己的本地狀態。
One of processes (sender) records its local state.

2.
廣播 marker 訊息給其他執行序。 Marker 訊息是告訴其他執行序有 snapshot 的需求，並且記錄marker訊息前的所有訊息。
Sender broadcasts marker message to others. The goals of marker message are telling other process the need for snapshot and recording all messages before receiving marker message.

3.
開始記錄所有回傳的訊息。
Start to record received messages from incoming channel.

The receiver …

1.
First case: if the state has not been recorded by received process when received the marker message
a. 紀錄自己的本地狀態 Record local state
b. 將接收執行序跟發送執行序之間的通道狀態設定為 empty
(Record state of channel from sender to receiver as empty)
c. 發送 marker 訊息給其他執行序 Send marker message to other processes.

2.
Second case: the state of receiver has been recorded
a. 停止紀錄 stop recording
b. 將接收執行序跟發送執行序之間的通道狀態設定為 message received
( Record state of channel from sender to receiver as message received)

-MsHe

淺談 Vector Clock

Previous …

上一次，我們講到邏輯時間，可以幫助更新順序。這次，我們談到另一個時間。

Last time, we discuss about logical clock which updates in the same order at all replicas. This time, we will discuss about another clock.

淺談 Synchronozation

Causal ordering …

在分布式系統中，這是一個重要的觀念。我們通常以事件發生順序來做總排序。意思是，現在有多台機器，A -> B 代表 A 發生在 B 之前。以先前的寫法即 C(A) < C(B)。 B -> C，則 B 發生在 C 之前。如此可以推斷 A -> C。

This is an important rule concept in distributed system. We usually order events by when event happened. Now, there are several machines and A -> B represents A happened before B. In previous format, we can also write C(A) < C(B). B -> C which means B happened before C. In this case, we can infer that A -> C.

Vector clock …

1.
向量時間用來確認事件是否是因果排序。
Vector Clocks determine whether events are causally related.

2.
每一個執行序有自己的向量值。
It has its own value of each process.
Example: there are three processes. The vector clock represent [c0,c1,c2] = [0,0,0]

Executing rules …

1.
若事件發生在本地執行序，則更新自己的向量值。
For each local event, increment local value on each process.

2.
如果執行序 i 收到訊息 D 向量時間，則每個向量值 = max(本地向量值,收到D的向量值)。最後將自己的向量值加一。
If process i received message with vector clock D, set local value ci = max(ci,di). Then increment local value.
Example: If process1 has its own vector clock [3,0,0], then it received vector clock D = [2,4,1].
C0 = max(3,0) = 3; C1 = max(0,4) = 4; C2 = max(0,1) = 1. So it got [3,4,1]. Finally, it increment local value. So the result is [4,4,1].

-MsHe