patternModerate
Is Python a context-free language?
Viewed 0 times
contextpythonlanguagefree
Problem
From Wikipedia: Off-side_rule#Implementation, there is a statement:
...This requires that the lexer hold state, namely the current
indentation level, and thus can detect changes in indentation when
this changes, and thus the lexical grammar is not context-free –
INDENT/DEDENT depend on the contextual information of previous
indentation level.
I'm a bit confused. Some lexical grammars of Python is not context-free but INDENT/DEDENT themselves can be expressed in CFG. So lastly, Python is a context-free language or not?
...This requires that the lexer hold state, namely the current
indentation level, and thus can detect changes in indentation when
this changes, and thus the lexical grammar is not context-free –
INDENT/DEDENT depend on the contextual information of previous
indentation level.
I'm a bit confused. Some lexical grammars of Python is not context-free but INDENT/DEDENT themselves can be expressed in CFG. So lastly, Python is a context-free language or not?
Solution
Context-free grammars cannot express the rules of INDENT/DEDENT and so Python (which we use today in practice with INDENTs/DEDENTs)is not pure CF. Parsers (or lexical analyzers or lexers) for these languages use additional techniques to handle those structures. For example keep track of indentation levels, or tokenizer (scanners) may count number whitespaces and store that info in a table for later use. This is similar to something like trying to write a parser for the language $a^nb^{2n}c^n$, where you must keep track of value of $n$.
A simple example of adding additional power to a lexer is removing redundant whitespace symbols (space/tabs/newlines) or removing comments which has no meaning for the parser (parsing a CFL).
As another example, roughly it can be described as following. Assume a grammar $a^nb^mc^k$ which is clearly CFG. You use BISON/YACC like tools to generate a parser for this grammar. Then you can introduce an additional rule like number of $a$s must be equal to the number of $c$ which makes the language non CF. So you just add extra logic (manually coding) to your parser to handle that rule by means of, say, annotated trees. Your original language is still context free but extra piece of code you added manually allows it to accept non CF subset of the original language.
I don't think that such parsers are generated automatically by compiler tools (at least I don't know) and so that ("non context free") part must be coded manually.
A simple example of adding additional power to a lexer is removing redundant whitespace symbols (space/tabs/newlines) or removing comments which has no meaning for the parser (parsing a CFL).
As another example, roughly it can be described as following. Assume a grammar $a^nb^mc^k$ which is clearly CFG. You use BISON/YACC like tools to generate a parser for this grammar. Then you can introduce an additional rule like number of $a$s must be equal to the number of $c$ which makes the language non CF. So you just add extra logic (manually coding) to your parser to handle that rule by means of, say, annotated trees. Your original language is still context free but extra piece of code you added manually allows it to accept non CF subset of the original language.
I don't think that such parsers are generated automatically by compiler tools (at least I don't know) and so that ("non context free") part must be coded manually.
Context
StackExchange Computer Science Q#77989, answer score: 19
Revisions (0)
No revisions yet.