patternMinor
Indentation based Grammars
Viewed 0 times
grammarsbasedindentation
Problem
Considering programming languages with significant whitespace for indentation, such as Python or Haskell. How does this whitespace fit into the grand schemes of programming language grammars.
I can see that a "preprocessor" can convert indent changes into tokens which are then handled in grammars, yet can one also perform the parsing based upon a single grammar without changing the intrinsic complexity of the 2nd grammer?
I can see that a "preprocessor" can convert indent changes into tokens which are then handled in grammars, yet can one also perform the parsing based upon a single grammar without changing the intrinsic complexity of the 2nd grammer?
Solution
Any modern compiler consists of multiple phases, at least
Additional phases are static analysis (names, types, ...) and optimisations.
Each phase uses the result of the prior as input.
Now, the lexer processes the actual program text and transforms it into a token stream. For instance, the (sub)string
At this point, it should be clear that handling whitespace is nothing crazy: instead of dropping all whitespace (like a Java compiler would do), you could as easily lex the strings
- lexing,
- (context-free) parsing and
- target code generation.
Additional phases are static analysis (names, types, ...) and optimisations.
Each phase uses the result of the prior as input.
Now, the lexer processes the actual program text and transforms it into a token stream. For instance, the (sub)string
"abc" becomes a token StringLiteral("abc"), if becomes IF and myMethod becomes Identifier("myMethod"). Tokens typically hold additional information, e.g. their position in the original program text. The context-free grammar used for syntax parsing is expressed in terms of these tokens, usually ignoring their parameters.At this point, it should be clear that handling whitespace is nothing crazy: instead of dropping all whitespace (like a Java compiler would do), you could as easily lex the strings
" " or \t to a token Indentation used in the grammar.Context
StackExchange Computer Science Q#20181, answer score: 3
Revisions (0)
No revisions yet.