patternMinor

Why do BNF specifications of C-like languages define expressions in terms of seemingly unrelated parent expressions?

Submitted by: @import:stackexchange-cs·Mar 10, 2026·

Viewed 0 times

programming-languages formal-grammars cs stackoverflow

whybnfunrelatedlanguagesparentlikeexpressionsdefinespecificationsseemingly

Problem

Backus-Naur Form specifications for the grammars of languages like like C or C++ build up expressions with counter-intuitive definitions. For instance, a multiplication expression like

5 * 3

is also a logical-or-expression and an equality-expression and a bunch of other things it doesn't actually look like, because the grammar makes it an:

expression

consisting of an assignment-expression

consisting of a conditional-expression

consisting of a logical-or-expression

consisting of a logical-and-expression

consisting of an inclusive-or-expression

consisting of an exclusive-or-expression

consisting of an and-expression

consisting of an equality-expression

consisting of a relational-expression

consisting of a shift-expression

consisting of an additive-expression

consisting a multiplicative-expression.

A snippet from the grammar looks like:

So if I were to write a parser that just followed these productions, I'd end up having to interpret the expression 5 * 3 12 different ways, e.g. by making it an instance of a MultiplicativeExpression class which derives from AdditiveExpression... all the way up to a base Expression class. And that seems very wasteful, since those classes would implement adding, AND-ing, OR-ing, etc. but would simply no-op for the single-term case.

By comparison, the Wikipedia example of BNF makes more sense:

 ::=   
 ::=     
              |  
 ::=  "." | 
...

The Wikipedia example reads like "a postal address consists of a [...]", whereas C-like language grammars read more like "a [...] can be treated as an expression". Why are C-like language grammars so "polymorphic"?

Solution

Why are C-like language grammars so "polymorphic"?

Because that's the way algebra works :-).

The arguments to (for example) a relational expression could be shifts or sums or products (or, for that matter, variables or constants, to say nothing of a variety of unary operator expressions), and all combinations are possible. So you could write out all nine possibilities (and then sixteen possibilities for equality expressions, etc.) (not counting the unary/base possibilities) but the quadratic explosion gets annoying. It's easier to consider each precedence level to include all the more tightly-binding levels as well.

I wouldn't call this polymorphism, really, although you might implement the expression type from a base Expression and individual derived types for each operator. The semantic value of every *-expression production is then a derived type of Expression, and it is not semantically necessary to know which one; the production labels are no longer needed.

Context

StackExchange Computer Science Q#68827, answer score: 2

Revisions (0)

No revisions yet.