HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Why is the commit phase in PBFT necessary?

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
commitwhythephasenecessarypbft

Problem

I've read many papers and slides on Practical Byzantine Fault Tolerance (PBFT) but I'm still confused about why a COMMIT phase is required. Most material states that

  • PREPARE phase ensures fault-tolerant consistent ordering of requests within views



  • COMMIT phase ensures fault-tolerant consistent ordering of requests across views



Some tutorial glosses over that


The PREPARE phase ensures that a majority of correct replicas has agreed on a sequence number for a client’s request. Yet that order could be modified by a new leader elected in a view change.

Can someone show me an example of how COMMIT interacts with view changes?

Solution

PBFT is a master piece, for its technical breakthrough and exquisitely precise language. Many descriptions on the protocol details worth reading multiple times to grasp all the nuances.

I will:

  • quote the original paper (some math notation expressed in Latex, I will use pseudo code instead)



  • add on my understanding/interpretation.



  • Q&A myself important questions



  • Answer your question directly by giving out an example



Pre-Prepare and Prepare phases

We define the predicate prepared(m,v,n,i) ....

The pre-prepare and prepare phases of the algorithm
guarantee that non-faulty replicas agree on a total order
for the requests within a view. More precisely, they
ensure the following invariant: if prepared(m,v,n,i) is
true then prepared(m',v,n,j) is false for any non-faulty
replica j (including i = j) and any m' such that D(m') !=D(m)
.

This means, once reached "prepared" stage (i.e. one replica has received more than 2f+1 PREPARE), this replica, if non-faulty, could be certain that, this message m in this view v is associated with this sequence number n.

Commit phase

The commit phase ensures the following invariant: if
committed-local(m,v,n,i)
is true for some non-faulty i
then committed(m,v,n)
is true. This invariant and
the view-change protocol described in Section 4.4 ensure
that non-faulty replicas agree on the sequence numbers
of requests that commit locally even if they commit in
different views at each replica.

^^^ This is the clue for me to go back and carefully read the definition of "commit-local" over and over:

and committed-local(m,v,n,i)
is true if and only if prepared(m,v,n,i)
is true and has
accepted 2f+1 commits (possibly including its own)
from different replicas that match the pre-prepare for m;
a commit matches a pre-prepare if they have the same
view, sequence number, and digest.

^^^ notice that? match the pre-prepare, not match the prepare, why? remember "PREPARE" gives certainty on ordering within a view, "PRE-PREPARE for m" emphasize on certainty of sequence number regardless of which view, as long as commit matches the pre-prepare.

Maybe you know why by not, but if it is still not clear, please read on.

Q:Why COMMIT phase necessary?

  • There are replicas (non-faulty or otherwise) that didn't receive enough (i.e. 2f+1) PREPARE messages, either due to lossy network or being offline for a while. For them, they can't reach PREPARED stage. But! But when they heard from 2f+1 replicas broadcasting COMMIT message, they could be certain to commit on (m,v,n,i)



  • Apparently, a obvious benefit for COMMIT stage is that it accelerate the agreement/consensus process. Intuitively, you could understand it as : I'm on my way to school, Bob told me school is closed today. I couldn't take one man's word as truth. But as I march on, I see many more classmates (who have already checked whether school is closed or not) returning back from school telling me school is closed. Up until majority told me so, I will take their word for it before reaching school myself.



Example of cross view COMMIT

There are 4 replicas in total. For a message m, I, the replica, received a COMMIT: (COMMIT, m, v, n, i=2), meaning in view v, node #2 told me he committed. But since I didn't receive any other commits, I couldn't reach "commit-local" stage.

Now, "NEW-VIEW" message has been passed around. During protocol redo: other replicas multicast PREPARE messages for each message between min-s and max-s, then later I receive another COMMIT (COMMIT, m, v+1, n, i=3)

Now, could I reach "commit-local"? since I have one commit in view v and another in view v+1.

Answer is yes. Because new primary issued a new PREPARE for m, and I had two matching commits -- 2 commits in a 4 replicas system. Good enough.

Hope it helps!
Any further comment is welcomed.
Cheers!

Context

StackExchange Computer Science Q#54152, answer score: 6

Revisions (0)

No revisions yet.