patternMinor

Where does the FLP impossibility proof depend on allowing a single process failure?

Submitted by: @import:stackexchange-cs·Mar 10, 2026·

Viewed 0 times

distributed-systems consensus cs stackoverflow

allowingtheprocesswhereimpossibilitysingledoesflpdependfailure

Problem

Question

I'm reading the FLP impossibility paper.
I think I understand the idea of the proof, and I don't have questions about it.

However, it seems like the assumption of having at most, a single faulty process is not used in the proof.
Put another way, if we remove this assumption and forbid process failure, the proof still seems to hold.

This can't be correct, because then it wouldn't be possible to construct a protocol that succeeds with no faulty processes.
However, it is possible, and a protocol meeting this criterion is supplied in section 4.

My question is therefore: which part of the proof relies on the condition that one process can fail?
My best effort at an answer
Definitions

For reference, the assumption in question appears in the definition below:

A run is admissible provided that at most one process is
faulty and that all messages sent to nonfaulty processes are eventually received.

Also relevant is the definition of a deciding run:

A run is a deciding run provided that some process reaches a decision state in
that run.

I'll also use my own definition, below:

A run is 0-admissible provided that no processes are faulty and that all messages sent to nonfaulty processes are eventually received.

Approach

I want to show that, by changing the definition of admissible to that of 0-admissible, the proof is no longer correct. Therefore, it would rely on the at most one failure part of the admissible assumption.
Lemma 2

Lemma 2 ("P has a bivalent initial configuration") references this assumption. In particular:

$C_0$ and $C_1$ are $0$-valent and $1$-valent initial configurations, respectively. They differ only in the initial value $x_p$, of a single process $p$.

Now consider some admissible deciding run from $C_O$ in which process $p$ takes no steps, and let $\sigma$ be the associated schedule.

Then $\sigma$ can be applied to $C_1$, also, and corresponding configurations in the two runs are identical except for the internal state of process $p$

Solution

The part of the original paper's proof that requires node failure to prove the impossibility is case 2 of lemma 3. This case assumes that there is a "finite deciding run from C0 in which p takes no steps". p taking no step means it has died/failed.

From case 1 of lemma 3, we know that there exists a pair of messages, called e and e' which affect the decision process, and if they are received by different nodes (p and p') the nodes will decide on different values. This means that these "important messages" must be received by a single node p only.

Case 2 tests this further with the possibility that node p, the recipient of the "important messages", fails (or "takes no steps"). If a "totally correct in spite of one fault" algorithm exists, it needs to tolerate the scenario where node p fails after configuration C0. Of course, p might just not fail, and the algorithm must be correct in both scenarios: p failing and p not failing.

Since the algorithm must decide on some value, if p is failing it has to proceed through some run called sigma and arrive at a deciding configuration A.

The problem here is that if p is actually still functional, it will accept e and e' and decide on its own. As we know from case 1, node p can decide on 0 (e arriving before e') or decide on 1 (e' arriving before e). So node p is making a decision on its own with no regard for what the rest of the nodes (through sigma) are deciding and ultimately there will be a case where the 2 set of nodes decide on different values.

As for your proof, since p takes no steps (it dies after C0), it invalidates your definition of a 0-admissible run.

Context

StackExchange Computer Science Q#68781, answer score: 4

Revisions (0)

No revisions yet.