patternMinor
Dragon book exercise 4.4.5, why is aaaaaa not recognized by the recursive descent parser?
Viewed 0 times
whythebookparserexerciseaaaaaarecognizedrecursivedragonnot
Problem
S → aSa | aa
generates all even-length strings of a’s
We can devise a recursive descent parser with [backtrack] for this grammar.
If we choose to expand by production S → aa first,
then we shall only recognize the string aa.
Thus, try S → aSa first.
a) Show that this recursive descent parser recognizes
inputs aa, aaaa, and aaaaaaaa, but not aaaaaa.The matching goes as follows:
aaaaaa → aSa
aaaaa → aaSaa
aaaa → aaaSaaa
... until matching against
aSa fails whenaa → aaaaSaaaa
If we try
S → aa right now, it matches, but the extra aaaa after S have nothing to match against anymore and thus fail.So we backtrack a step and try to match
S → aa one step earlier, but again we have leftover aa.Backtrack even further, back to
aaSaa
When we try
S → aa at this level, the routine finishes with a successful match.Why then does the book say that
aaaaaa is not recognized by this parser?Solution
When you look at the parser's process, you see the entire text to be parsed, and naturally make judgements based on that knowledge. But that's not the way the parser works. So your strategy makes it easy for you to parse utterances, but not easy for you to understand the parser.
Remember that the question is about a recursive descent parser with backtracking. In a RD parser, each non-terminal has a parsing procedure; that procedure which attempts each possible production for that non-terminal in order. If a production matches, the procedure returns success. Otherwise, it resets the input cursor and tries the next production. If no production succeeds, it returns failure, and its caller will perform the next fallback.
A good way to actually understand a parser (in my opinion) is to write one in some concrete programming language. Or, if you are trying to educate a class, to encourage your students to do that. In this simple case, you should be able to write the parser in a few minutes. (Instrumenting it so that you can see what's going on takes only a little bit longer, but I'll leave that as an exercise.)
Here's a quick Python implementation which uses a seekable I/O object to allow fallback. (
So we can give it a quick try and see if the Dragon Book's question is valid:
As advertised, it works with $aa$, $aaaa$ and $aaaaaaaa$, and fails on $aaaaaa$. Since I ran it a bit extra, there's actually a pretty good hint about what language is being recognised here. (Interestingly, it's not a context-free language.)
OK, what's going on? Again, you could get some more information by instrumenting the parser and examining the log of calls and returns. I'm going to try to explain the failure to parse $aaaaaa$ in narrative, but if you have any doubts, do try adding logging to the sample code. (The call and result log is at the end of the question, but it is quite long.)
First of all, note that at no point does the parser know anything about what's coming in the input stream, not even how long it is. It only knows about the next character. So the first thing it does (since it always tries $ S\to a S a$ first) is to recursively call itself until that production fails, which will happen on the seventh recursion, not the third one. That's the point where the $S$ recogniser is called at the end of input. After the first alternative fails, that instance will then fall back to trying $S\to a a$, but since the parser has reached the end of input, that one will fail, too.
So then the seventh recursion reports failure back to the sixth recursion, which falls back to $S\to a a$. That also fails, because there is now only one available $a$. So now it falls back to the fifth recursion, which attempts its next alternative ($S\to a a$), which this time works because there are two $a$s at the current input.
That allows the fifth recursion to return success to its caller, the fourth recursion, which can then try to complete the $S\to a S a$ production by reading the last $a$. Alas, that doesn't work because after the $S$ succeeded, it left the input buffer at the end. So now the fourth recursion again resets the input cursor and tries its next alternative ($S\to a a$), which of course will work, so the fourth recursion can report success back to the third recursion.
Let's take stock here. The current call stack contains three recursions, each one waiting for $S$ to return, and a fourth recursion, which is about to report that it matched $S\to a a$ with the fourth and fifth input character. You and I can see that that's not the correct parse -- there aren't enough input characters left to satisfy the pending recursive calls -- but the parser is unaware of that.
The third recursion can read the last $a$, letting it satisfy $S\to a S a$ (covering the third through the sixth input character) so it returns success. And that's wrong, because the correct parse needs to match $S\to a a$ at the third position. But that correct parse will never be found, because once a matching production is found, the rest of the alternatives will never be tried.
Thus, when the third recursion returns to the second recursion, the second recursion finds that it can't complete $S\to a S a$. But it has no way of telling the third recursion to try some other alternatives. It doesn't even know that the third recursion had some untried alternatives. All it ca
Remember that the question is about a recursive descent parser with backtracking. In a RD parser, each non-terminal has a parsing procedure; that procedure which attempts each possible production for that non-terminal in order. If a production matches, the procedure returns success. Otherwise, it resets the input cursor and tries the next production. If no production succeeds, it returns failure, and its caller will perform the next fallback.
A good way to actually understand a parser (in my opinion) is to write one in some concrete programming language. Or, if you are trying to educate a class, to encourage your students to do that. In this simple case, you should be able to write the parser in a few minutes. (Instrumenting it so that you can see what's going on takes only a little bit longer, but I'll leave that as an exercise.)
Here's a quick Python implementation which uses a seekable I/O object to allow fallback. (
io.StringIO will work, for example.)def match_a(inp):
return inp.read(1) == 'a'
def match_end(inp):
return inp.read(1) == ''
def match_S(inp):
fallback = inp.tell()
if match_a(inp) and match_S(inp) and match_a(inp): return True
inp.seek(fallback)
if match_a(inp) and match_a(inp): return True
return False
def match(inp):
return match_s(inp) and match_end(inp)
So we can give it a quick try and see if the Dragon Book's question is valid:
for n in range(17):
print(n, match(io.StringIO('a' * n)))
0 False
1 False
2 True
3 False
4 True
5 False
6 False
7 False
8 True
9 False
10 False
11 False
12 False
13 False
14 False
15 False
16 True
As advertised, it works with $aa$, $aaaa$ and $aaaaaaaa$, and fails on $aaaaaa$. Since I ran it a bit extra, there's actually a pretty good hint about what language is being recognised here. (Interestingly, it's not a context-free language.)
OK, what's going on? Again, you could get some more information by instrumenting the parser and examining the log of calls and returns. I'm going to try to explain the failure to parse $aaaaaa$ in narrative, but if you have any doubts, do try adding logging to the sample code. (The call and result log is at the end of the question, but it is quite long.)
First of all, note that at no point does the parser know anything about what's coming in the input stream, not even how long it is. It only knows about the next character. So the first thing it does (since it always tries $ S\to a S a$ first) is to recursively call itself until that production fails, which will happen on the seventh recursion, not the third one. That's the point where the $S$ recogniser is called at the end of input. After the first alternative fails, that instance will then fall back to trying $S\to a a$, but since the parser has reached the end of input, that one will fail, too.
So then the seventh recursion reports failure back to the sixth recursion, which falls back to $S\to a a$. That also fails, because there is now only one available $a$. So now it falls back to the fifth recursion, which attempts its next alternative ($S\to a a$), which this time works because there are two $a$s at the current input.
That allows the fifth recursion to return success to its caller, the fourth recursion, which can then try to complete the $S\to a S a$ production by reading the last $a$. Alas, that doesn't work because after the $S$ succeeded, it left the input buffer at the end. So now the fourth recursion again resets the input cursor and tries its next alternative ($S\to a a$), which of course will work, so the fourth recursion can report success back to the third recursion.
Let's take stock here. The current call stack contains three recursions, each one waiting for $S$ to return, and a fourth recursion, which is about to report that it matched $S\to a a$ with the fourth and fifth input character. You and I can see that that's not the correct parse -- there aren't enough input characters left to satisfy the pending recursive calls -- but the parser is unaware of that.
The third recursion can read the last $a$, letting it satisfy $S\to a S a$ (covering the third through the sixth input character) so it returns success. And that's wrong, because the correct parse needs to match $S\to a a$ at the third position. But that correct parse will never be found, because once a matching production is found, the rest of the alternatives will never be tried.
Thus, when the third recursion returns to the second recursion, the second recursion finds that it can't complete $S\to a S a$. But it has no way of telling the third recursion to try some other alternatives. It doesn't even know that the third recursion had some untried alternatives. All it ca
Context
StackExchange Computer Science Q#143480, answer score: 4
Revisions (0)
No revisions yet.