patternMinor

Indentation based Grammars

Submitted by: @import:stackexchange-cs·Mar 10, 2026·

Viewed 0 times

grammarsbasedindentation

Problem

Considering programming languages with significant whitespace for indentation, such as Python or Haskell. How does this whitespace fit into the grand schemes of programming language grammars.

I can see that a "preprocessor" can convert indent changes into tokens which are then handled in grammars, yet can one also perform the parsing based upon a single grammar without changing the intrinsic complexity of the 2nd grammer?

Solution

Any modern compiler consists of multiple phases, at least

lexing,

(context-free) parsing and

target code generation.

Additional phases are static analysis (names, types, ...) and optimisations.
Each phase uses the result of the prior as input.

Now, the lexer processes the actual program text and transforms it into a token stream. For instance, the (sub)string "abc" becomes a token StringLiteral("abc"), if becomes IF and myMethod becomes Identifier("myMethod"). Tokens typically hold additional information, e.g. their position in the original program text. The context-free grammar used for syntax parsing is expressed in terms of these tokens, usually ignoring their parameters.

At this point, it should be clear that handling whitespace is nothing crazy: instead of dropping all whitespace (like a Java compiler would do), you could as easily lex the strings " " or \t to a token Indentation used in the grammar.

Context

StackExchange Computer Science Q#20181, answer score: 3

Revisions (0)

No revisions yet.