debugCritical
Proof that dead code cannot be detected by compilers
Viewed 0 times
cannotcompilersdeaddetectedthatcodeproof
Problem
I'm planning to teach a winter course on a varying number of topics, one of which is going to be compilers. Now, I came across this problem while thinking of assignments to give throughout the quarter, but it has me stumped so I might use it as an example instead.
In the program above, it's obvious that the print statement will never execute because of the
public class DeadCode {
public static void main(String[] args) {
return;
System.out.println("This line won't print.");
}
}In the program above, it's obvious that the print statement will never execute because of the
return. Compilers sometimes give warnings or errors about dead code. For example, the above code will not compile in Java. The javac compiler, however, will not detect all instances of dead code in every program. How would I prove that no compiler can do so?Solution
It all comes from undecidability of the halting problem. Suppose we have a "perfect" dead code function, some Turing Machine M, and some input string x, and a procedure that looks something like this:
If M runs forever, then we delete the print statement, since we will never reach it. If M doesn't run forever, then we need to keep the print statement. Thus, if we have a dead-code remover, it also lets us solve the Halting Problem, so we know there can be no such dead-code remover.
The way we get around this is by "conservative approximation." So, in my Turing Machine example above, we can assume that running M on x might finish, so we play it safe and don't remove the print statement. In your example, we know that no matter which functions do or don't halt, that there's no way we will reach that print statement.
Usually, this is done by constructing a "control-flow graph". We make simplifying assumptions, such as "the end of a while loop is connected to the beginning and the statement after", even if it runs forever or runs only once and doesn't visit both. Similarly, we assume that an if-statement can reach all of its branches, even if in reality some are never used. These kinds of simplifications allow us to remove "obviously dead code" like the example you give, while remaining decidable.
To clarify a few confusions from the comments:
-
Nitpick: for fixed M, this is always decidable. M has to be the input
As Raphael says, in my example, we consider the Turing Machine as an input. The idea is that, if we had a perfect DCE algorithm, we would be able to construct the code snippet I give for any Turing Machine, and having a DCE would solve the halting problem.
-
not convinced. return as a blunt statement in a no-branch straight forward execution is not hard to decide. (and my compiler tells me it is capable of figuring this out)
For the issue njzk2 raises: you are absolutely right, in this case you can determine that there is no way a statement after the return can be reached. This is because it's simple enough that we can describe its unreachability using control-flow graph constraints (i.e. there are no outgoing edges out of a return statement). But there is no perfect dead code eliminator, which eliminates all unused code.
-
I don't take input-dependent proof for a proof. If there exists such kind of user input that can allow the code to be finite, it's correct for the compiler to assume that following branch is not dead. I can't see what are all these upvotes for, it's both obvious (eg. endless stdin) and wrong.
For TomášZato: it's not really an input dependent proof. Rather, interpret it as a "forall". It works as follows: assume we have a perfect DCE algorithm. If you give me an arbitrary Turing Machine M and input x, I can use my DCE algorithm to determine whether M halts, by constructing the code snippet above and seeing if the print-statement is removed. This technique, of leaving a parameter arbitrary to prove a forall-statement, is common in math and logic.
I don't fully understand TomášZato's point about code being finite. Surely the code is finite, but a perfect DCE algorithm must apply to all code, which is an infinte set. Likewise, while the code-itself is finite, the potential sets of input are infinte, as is the potential running-time of the code.
As for considering the final branch not-dead: it is safe in terms of the "conservative approximation" I talk about, but it's not enough to detect all instances of dead code as the OP asks for.
Consider code like this:
Clearly we can remove
Note that I am not coming up with this on my own. It is a well known result in the theory of compilers. It's discussed in The Tiger Book. (You might be able to see where they talk about in in Google books.
Run M on input x;
print "Finished running input";If M runs forever, then we delete the print statement, since we will never reach it. If M doesn't run forever, then we need to keep the print statement. Thus, if we have a dead-code remover, it also lets us solve the Halting Problem, so we know there can be no such dead-code remover.
The way we get around this is by "conservative approximation." So, in my Turing Machine example above, we can assume that running M on x might finish, so we play it safe and don't remove the print statement. In your example, we know that no matter which functions do or don't halt, that there's no way we will reach that print statement.
Usually, this is done by constructing a "control-flow graph". We make simplifying assumptions, such as "the end of a while loop is connected to the beginning and the statement after", even if it runs forever or runs only once and doesn't visit both. Similarly, we assume that an if-statement can reach all of its branches, even if in reality some are never used. These kinds of simplifications allow us to remove "obviously dead code" like the example you give, while remaining decidable.
To clarify a few confusions from the comments:
-
Nitpick: for fixed M, this is always decidable. M has to be the input
As Raphael says, in my example, we consider the Turing Machine as an input. The idea is that, if we had a perfect DCE algorithm, we would be able to construct the code snippet I give for any Turing Machine, and having a DCE would solve the halting problem.
-
not convinced. return as a blunt statement in a no-branch straight forward execution is not hard to decide. (and my compiler tells me it is capable of figuring this out)
For the issue njzk2 raises: you are absolutely right, in this case you can determine that there is no way a statement after the return can be reached. This is because it's simple enough that we can describe its unreachability using control-flow graph constraints (i.e. there are no outgoing edges out of a return statement). But there is no perfect dead code eliminator, which eliminates all unused code.
-
I don't take input-dependent proof for a proof. If there exists such kind of user input that can allow the code to be finite, it's correct for the compiler to assume that following branch is not dead. I can't see what are all these upvotes for, it's both obvious (eg. endless stdin) and wrong.
For TomášZato: it's not really an input dependent proof. Rather, interpret it as a "forall". It works as follows: assume we have a perfect DCE algorithm. If you give me an arbitrary Turing Machine M and input x, I can use my DCE algorithm to determine whether M halts, by constructing the code snippet above and seeing if the print-statement is removed. This technique, of leaving a parameter arbitrary to prove a forall-statement, is common in math and logic.
I don't fully understand TomášZato's point about code being finite. Surely the code is finite, but a perfect DCE algorithm must apply to all code, which is an infinte set. Likewise, while the code-itself is finite, the potential sets of input are infinte, as is the potential running-time of the code.
As for considering the final branch not-dead: it is safe in terms of the "conservative approximation" I talk about, but it's not enough to detect all instances of dead code as the OP asks for.
Consider code like this:
while (true)
print "Hello"
print "goodbye"Clearly we can remove
print "goodbye" without changing the behavior of the program. Thus, it is dead code. But if there's a different function call instead of (true) in the while condition, then we don't know if we can remove it or not, leading to the undecidability.Note that I am not coming up with this on my own. It is a well known result in the theory of compilers. It's discussed in The Tiger Book. (You might be able to see where they talk about in in Google books.
Code Snippets
Run M on input x;
print "Finished running input";while (true)
print "Hello"
print "goodbye"Context
StackExchange Computer Science Q#49332, answer score: 57
Revisions (0)
No revisions yet.