patternMinor
Bottom-Up Parser With Leftmost Derivation
Viewed 0 times
parserwithbottomleftmostderivation
Problem
I'm reading the book Parsing Techniques by Dick Grune et al. and in section 3.1.3 "Linearization of the Parse Tree" they introduce the notion of linearization:
[...] a parser can produce a list of rule numbers instead, which means that it linearizes the parse tree. There are three main ways to linearize a tree, prefix, postfix and infix.
Those are explained using the following grammar:
and this particular production tree (the grammar is spuriously ambiguous):
(The numbers at the nodes are semantic attributes.)
Prefix notation is explained first:
In prefix notation, each node is listed by listing its number followed by prefix listings of the subnodes in left-to-right order; this gives us the leftmost derivation [...]:
leftmost: 2 2 1 3c 1 3e 1 3a
Postfix notation follows:
In postfix notation, each node is listed by listing in postfix notation all the subnodes in left-to-right order, followed by the number of the rule in the node itself; this gives us the rightmost derivation [...]:
rightmost: 3c 1 3e 1 2 3a 1 2
Before the linearizations, there is a "leftmost" and "rightmost," respectively. Does that mean prefix notation only works with top-down and postfix notation with bottom-up parsing? But then why? If we start from the rightmost non-terminal and build the parsing tree from the top down, wouldn't the linearization with the "leftmost" in front be the result? Aren't the notations semantically independent?
Furthermore, I've read about LL and LR parsers, which yield a leftmost or rightmost derivation and use a top-down or bottom-up algorithm, respectively. Does that mean a bottom-up parser can only work with a rightmost derivation? Why not with a leftmost one? I don't see the problem with that, similarly to how prefix notation seems to imply a leftmost derivation.
[...] a parser can produce a list of rule numbers instead, which means that it linearizes the parse tree. There are three main ways to linearize a tree, prefix, postfix and infix.
Those are explained using the following grammar:
and this particular production tree (the grammar is spuriously ambiguous):
(The numbers at the nodes are semantic attributes.)
Prefix notation is explained first:
In prefix notation, each node is listed by listing its number followed by prefix listings of the subnodes in left-to-right order; this gives us the leftmost derivation [...]:
leftmost: 2 2 1 3c 1 3e 1 3a
Postfix notation follows:
In postfix notation, each node is listed by listing in postfix notation all the subnodes in left-to-right order, followed by the number of the rule in the node itself; this gives us the rightmost derivation [...]:
rightmost: 3c 1 3e 1 2 3a 1 2
Before the linearizations, there is a "leftmost" and "rightmost," respectively. Does that mean prefix notation only works with top-down and postfix notation with bottom-up parsing? But then why? If we start from the rightmost non-terminal and build the parsing tree from the top down, wouldn't the linearization with the "leftmost" in front be the result? Aren't the notations semantically independent?
Furthermore, I've read about LL and LR parsers, which yield a leftmost or rightmost derivation and use a top-down or bottom-up algorithm, respectively. Does that mean a bottom-up parser can only work with a rightmost derivation? Why not with a leftmost one? I don't see the problem with that, similarly to how prefix notation seems to imply a leftmost derivation.
Solution
"Linearizing" here is equivalent to superimposing a total ordering on what is a partial order. If your grammar is context-free, this partial order can always be conveniently represented by a tree. Other types of grammars will be able to produce derivations orders that are not tree-like, and that is why we generally avoid them.
This total order is meant to allow for sequential algorithms to be written, because the input will be read sequentially. That's all, no magic here.
What happens is that different criteria for linearization will serve different algorithms: if you are going to reduce the input, you will go "from the bottom up", metaphorically, so the linear order that suits you (if you are reading the input left-to-right - because your language has an indo-european flavor), is postfix. If not, prefix.
This total order is meant to allow for sequential algorithms to be written, because the input will be read sequentially. That's all, no magic here.
What happens is that different criteria for linearization will serve different algorithms: if you are going to reduce the input, you will go "from the bottom up", metaphorically, so the linear order that suits you (if you are reading the input left-to-right - because your language has an indo-european flavor), is postfix. If not, prefix.
Context
StackExchange Computer Science Q#66234, answer score: 2
Revisions (0)
No revisions yet.