Monday 13 November 2017

Two failed modifications of Paxos: towards a better understanding of the protocol

Two failed attempts at modifying Paxos.

The Paxos algorithm solves the following problem:
There are n processes.  Each process starts with a random value of 0 or 1.
Processed communicate via messages.  Messages might be lost or arbitrarily delayed without bound.
At some time in the future, all processes need to decide on one of the two value.  There must be a way to find out when the decision has been made and what value it is.  Furthermore, to prevent trivial solutions, the decision outcome must be a subset of the starting values.  (I.e., if all processes start with 0, they cannot ever decide on 1).  Once a decision is reached, it must never change.

The following algorithm solves it assuming that a majority of servers can communicate with each other.  If they cannot, it just means that a decision will not be reached (liveness), there won't be a case where a decision is reached but later changed (safety).

Each machine keeps a local counter counter and also is assigned a unique server_id offline before everything starts.
Each server tries to make proposals:
1. Send a "prepare-to-propose" message to all machines (or a majority) containing its ballot number which is the pair (counter, server_id).  Increase its counter.
2. When it receives reply from a majority, do the following:
Each reply should contain a (max_ballot_accepted, and accepted_value) but might be null.  If everyone in the majority is null, then send propose message containing the ballot (counter, server_id) with any value.  However, if not everyone in the majority replied null, then sort by max_ballot_accepted, and propose that particular accepted_value.

Each server should also listen for prepare-to-propose message and proposals.  It should also keep track of (max_ballot_accepted), max_ballot_promised, and accepted_value.
1. Upon receiving prepare-to-propose, if the ballot is smaller than max_ballot_promised, ignore, if it is greater then, update max_ballot_promised and reply with (max_ballot_accepted, and accepted_value).
2. Upon receiving a proposal, check to see if it is greater or equal to max_ballot_promised, if so, accept the proposal and update our state.

When is consensus reached.  Consensus is reached when a majority of acceptors accepted the same ballot.  We can find out when that happens if all the processes for example, send out a status update whenever a proposal is accepted.  Of course, the status update might be lost and there is no way of knowing whether the message is lost.  But we are guaranteed that the consensus value will never change once it is changed (safety).

Some intuition on why it works:
1. The prepare-to-propose serves as a barrier to ensure that any accepted ballots will be "known" by later ballots.
2. By only proposing the largest ballot from a majority, we ensure that newer proposed values will remain the consensus value if consensus have been reached.

Once some majority M has accepted a ballot, then any proposer proposing higher ballot numbers must not have sent the "prepare-to-propose" message to this majority until after the acceptance.  Otherwise the prepare-to-propose message servers as an obstacle to getting this particular ballot accepted in at least one common member.
Lower ballot numbers also suffer the same issue.
Now, after receiving reply to prepare-to-propose messsages from a majority M', M' has at least one member in common with M.  We can do a strong induction on the ballot number.  If the current ballot is n, we can assume all proposed ballots between the original accepted ballot and n-1 (inclusive) all propose the same consensus value.  The common member between M' and M would have replied a ballot number at least as great as the original accepted ballot number.  The proposer must propose the value corresponding to the largest ballot number, hence it must also propose the consensus value.

To help us understand better, let's ask whether we can modify the protocol in two ways.
1. Instead of sending out a "prepare-to-propose" message which extracts a promise not to accept lower numbers, can we simply query for proposed values instead of asking for a promise.  We will block out lower ballots at the acceptance step.
The issue is that ballots with the same number might override the consensus value.
We could reach a state each row represents one process, their accepted ballot and value)
1, v1
1, v1
1, v1
2, v2
null
Consider the following sequence of messages (p1 stands for process one).
p1.PrepareToPropose(ballot 1).SendTo(1, 2, 3)
p2.PrepareToPropose(ballot 2).SendTo(1, 2, 3)
p1.Received.PrepareResponse(null, null, null).Propose(1, v1).SendTo(1, 2, 3)
(p1, p2, p3).Accept(ballot 1, v1)
p2.Received.PrepareResponse(null, null, null).Propose(2, v2).SendTo(4) // messages to other members of the majority are lost
p4.Accept(ballot 2, v2)

This is happening because p2 does not know that ballot 1 is already in progress.

2. Is it possible to declare that consensus is reached as soon as a majority has picked the same value?  Right now, we require majority to pick the same ballot.
No.  After majority picked the same value, the consensus value might not remain the same.
We again describe accepted ballot, accepted number, put also show the internal "promise" counter.
To simplify we only show global state transitions.  The user can fill in the messages and lost messages.

1: v1 (1)
1: v1 (1)
2: v2 (2)
2: v2 (2)
null (2) (proposal is lost, but prepare was received)

1: v1 (3) // proposal three only reaches the fifth process
1: v1 (3)
2: v2 (2)
2: v2 (2)
3: v1 (3) // here, we are wrong to think that consensus has been reached

Proposer now prepares and gets response from p1, p2, p3.  It will now propose v2.

4: v2 (4)
4: v2 (4)
4: v2 (4)
2: v2 (2)
3: v1 (3) // consensus value changed

No comments: