patternModerate
What are the conditions necessary for a programming language to have no undefined behavior?
Viewed 0 times
necessarythewhatareprogrammingbehaviorundefinedlanguageforconditions
Problem
For context, yesterday I posted Does the first incompleteness theorem imply that any Turing complete programming language must have undefined behavior?. Part of what prompted me to ask that question in the first place is that, awhile ago, someone on the learnprogramming subreddit told me something about the reason C++ in particular having so much undefined behavior is because, for it to not have undefined behavior, it would have to use a much more restrictive language model, but they didn't explain what that means exactly. I had also asked on Quora awhile ago about why C++ compilers don't always throw errors when a program contains undefined behavior and at least one answer mentioned something about it being fundamentally impossible to always detect undefined behavior at compile time and that this was related to the halting problem being undecidable.
Those two things combined have me wondering about models of computing more generally -- my understanding is that all/most popular programming languages, including C++, are Turing complete, and since I was told the problem of detecting UB in C++ is fundamental and related to the halting problem, I thought that perhaps all Turing complete programming languages must have undefined behavior and C++ is just worse at hiding it than others. But judging from the answers to my above-linked question, I was mistaken about that.
So my question now is, what conditions need to be imposed on a Turing complete language in order to guarantee that all possible programs written in the language will have fully defined behavior determined by the language specification? And, on a side note, does the answer have anything to do with the incompleteness theorems? I ask the latter question because the idea of defining a language for which all possible programs have fully defined behavior seems quite similar to the idea of defining an axiom system for which all possible theorems are provable/disprovable.
Those two things combined have me wondering about models of computing more generally -- my understanding is that all/most popular programming languages, including C++, are Turing complete, and since I was told the problem of detecting UB in C++ is fundamental and related to the halting problem, I thought that perhaps all Turing complete programming languages must have undefined behavior and C++ is just worse at hiding it than others. But judging from the answers to my above-linked question, I was mistaken about that.
So my question now is, what conditions need to be imposed on a Turing complete language in order to guarantee that all possible programs written in the language will have fully defined behavior determined by the language specification? And, on a side note, does the answer have anything to do with the incompleteness theorems? I ask the latter question because the idea of defining a language for which all possible programs have fully defined behavior seems quite similar to the idea of defining an axiom system for which all possible theorems are provable/disprovable.
Solution
First off, let's be clear on what "undefined behaviour" is. In just C alone (and this is the understanding inherited by C++), there are two possible meanings, depending on which version of the standard you choose.
The C89 standard, section 3.16, defines:
undefined behavior: Behavior, upon use of a nonportable or erroneous
program construct, or of erroneous data, or of indeterminately-valued
objects, for which this International Standard imposes no requirements.
Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
The 1999 C Rationale explains:
Undefined behavior gives the implementor license not to catch certain
program errors that are difficult to diagnose. It also identifies
areas of possible conforming language extension: the implementor may
augment the language by providing a definition of the officially
undefined behavior.
The C99 standard, section 3.4.3, words it slightly differently, turning the second sentence into an explanatory note:
undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements
NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
EXAMPLE An example of undefined behavior is the behavior on integer overflow.
Note that in the second sentence, the word "permissible" was changed to the word "possible".
The way everyone interpreted the C89 standard was that the second sentence was normative: it described the set of permissible behaviours. So "undefined behaviour" is only "undefined" in the sense that the standard does not require which of the permissible behaviours an implementation may do.
But with C99, following ISO rules, moving the second sentence to an explanatory note and changing the word "permissible" to "possible" means that it is not normative. These are merely possible behaviours, but because the standard imposes no requirements, any behaviour is possible. This is sometimes jokingly known as a nasal demon, because making demons fly out of your nose is a possible behaviour.
This is more than a little controversial, and the reinterpretation has caused customary C programming idioms to become bugs, including one in SPECint, the standard suite of integer benchmarks. Chris Lattner of LLVM put it this way: "huge bodies of C code are land mines just waiting to explode."
But I digress.
C is designed as a systems programming language, and as such, allowing direct manipulation of the underlying platform (e.g. CPU, OS) is a feature. But C is also meant to be portable, which means that differences in the underlying platform manifest as differences in behaviour of the program. And sometimes, this behaviour is a global property of the program, not detectable in any specific line of code.
There are essentially only three ways out of this, for any given situation that would otherwise be undefined behaviour.
The C89 standard, section 3.16, defines:
undefined behavior: Behavior, upon use of a nonportable or erroneous
program construct, or of erroneous data, or of indeterminately-valued
objects, for which this International Standard imposes no requirements.
Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
The 1999 C Rationale explains:
Undefined behavior gives the implementor license not to catch certain
program errors that are difficult to diagnose. It also identifies
areas of possible conforming language extension: the implementor may
augment the language by providing a definition of the officially
undefined behavior.
The C99 standard, section 3.4.3, words it slightly differently, turning the second sentence into an explanatory note:
undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements
NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
EXAMPLE An example of undefined behavior is the behavior on integer overflow.
Note that in the second sentence, the word "permissible" was changed to the word "possible".
The way everyone interpreted the C89 standard was that the second sentence was normative: it described the set of permissible behaviours. So "undefined behaviour" is only "undefined" in the sense that the standard does not require which of the permissible behaviours an implementation may do.
But with C99, following ISO rules, moving the second sentence to an explanatory note and changing the word "permissible" to "possible" means that it is not normative. These are merely possible behaviours, but because the standard imposes no requirements, any behaviour is possible. This is sometimes jokingly known as a nasal demon, because making demons fly out of your nose is a possible behaviour.
This is more than a little controversial, and the reinterpretation has caused customary C programming idioms to become bugs, including one in SPECint, the standard suite of integer benchmarks. Chris Lattner of LLVM put it this way: "huge bodies of C code are land mines just waiting to explode."
But I digress.
C is designed as a systems programming language, and as such, allowing direct manipulation of the underlying platform (e.g. CPU, OS) is a feature. But C is also meant to be portable, which means that differences in the underlying platform manifest as differences in behaviour of the program. And sometimes, this behaviour is a global property of the program, not detectable in any specific line of code.
There are essentially only three ways out of this, for any given situation that would otherwise be undefined behaviour.
- Define the behaviour. For example, you could mandate two's complement arithmetic so that integer overflow has a defined behaviour. In some situations this may require extra code; see point 3. For example, shifting a 64-bit word left by 64 bits might seem to have a reasonable "answer" (i.e. zero), but the shift left instruction on many CPUs doesn't do this, so this would require extra code to detect it.
- Provide enough static guarantees such that UB never occurs. This could be anything from not having language support for the kind of activity that might cause UB (e.g. unsafe references or pointer arithmetic, which many languages simply don't have) to having a strong static type system which the programmer cannot subvert (e.g. to disallow modifying static strings).
- Generate code to detect the situation at run-time and take action then (e.g. raise an exception, terminate the program, substitute defined behaviour).
Context
StackExchange Computer Science Q#161663, answer score: 12
Revisions (0)
No revisions yet.