
Why is Two-Phase Commit (2PC) blocking?

Submitted by: @import:stackexchange-cs

Problem

Can anyone explain why 2PC is blocking when the coordinator fails? Is it because the cohorts don't use timeouts in 2PC?

Good reference: Analysis and Verification of Two-Phase Commit & Three-Phase Commit Protocols, by Muhammad Atif.

Elaboration of the question:

Let’s say a coordinator sends the first message (PREPARE-COMMIT). Assuming all cohorts are reachable at this point, there are two possible outcomes: either all cohorts reply YES, or at least one cohort replies NO. Either way, let’s assume the coordinator crashes before it can send the second message (COMMIT/ABORT).
Since 2PC uses only two messages, a cohort can never apply a commit without explicitly receiving a COMMIT message if the cohorts are to stay consistent. So, when the coordinator crashes, the cohorts can either (1) apply an ABORT after a timeout or (2) remain in the waiting state indefinitely. It is hard to believe that 2PC would employ option (2), i.e. waiting indefinitely. But if option (1) is chosen, why do we call 2PC blocking when the coordinator dies?

Solution

Is it because the cohorts don't employ timeout concept in 2PC?

Yes, in one case they cannot use a timeout. This is also described in the paper (II.B.1):


The Two-Phase Commit Protocol goes to a blocking state by the failure of the coordinator when the participants are in uncertain state. The participants keep locks on resources until they receive the next message from the coordinator after its recovery.

In practice, that 'uncertain state' means having voted 'yes'. After a cohort votes 'yes', there is no going back: it must wait for the global decision. It cannot abort unilaterally on a timeout, because the coordinator may already have decided COMMIT and delivered that decision to another cohort before crashing. (The other cases are fine: a cohort that votes 'no' immediately knows that the global decision is also 'no', and if the coordinator dies before even initiating the voting round, cohorts may decide to vote 'no' based on a timeout.)
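The cases above can be sketched as a small cohort state machine. This is only an illustrative sketch, not code from the paper: the `Cohort` class, the `State` names, and the method names are all hypothetical, and real message passing, logging, and locking are left out. It shows that a timeout can resolve the transaction before the vote, but not once the cohort has voted 'yes'.

```python
from enum import Enum


class State(Enum):
    INIT = "init"            # has not voted yet
    UNCERTAIN = "uncertain"  # voted YES, waiting for the global decision
    COMMITTED = "committed"
    ABORTED = "aborted"


class Cohort:
    """Hypothetical 2PC participant; all names are illustrative."""

    def __init__(self, name):
        self.name = name
        self.state = State.INIT

    def on_prepare(self, can_commit):
        # Voting NO is a unilateral abort: the global decision must also be NO.
        if self.state is not State.INIT:
            return None
        self.state = State.UNCERTAIN if can_commit else State.ABORTED
        return "YES" if can_commit else "NO"

    def on_decision(self, decision):
        # Only a cohort that voted YES still needs the coordinator's decision.
        if self.state is State.UNCERTAIN:
            self.state = (
                State.COMMITTED if decision == "COMMIT" else State.ABORTED
            )

    def on_timeout(self):
        """Return True if the transaction is resolved locally after a timeout."""
        if self.state is State.INIT:
            # No vote cast yet, so aborting cannot contradict anyone.
            self.state = State.ABORTED
            return True
        # UNCERTAIN: the coordinator may already have sent COMMIT elsewhere,
        # so the cohort keeps its locks and waits. This is the blocking case.
        return self.state is not State.UNCERTAIN


# Scenario: both cohorts vote YES, the coordinator decides COMMIT,
# reaches only cohort "a", and then crashes.
a, b = Cohort("a"), Cohort("b")
votes = [a.on_prepare(True), b.on_prepare(True)]
a.on_decision("COMMIT")
resolved = b.on_timeout()
print(a.state.value, b.state.value, resolved)
```

In this scenario, if cohort `b` were allowed to abort on the timeout, it would contradict cohort `a`, which has already committed. Under these assumptions, waiting is the only choice that keeps the two consistent.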

Context

StackExchange Computer Science Q#76192, answer score: 7
