patternMinor
Leo's deterministic reduction for Earley Parsing
Viewed 0 times
leoreductionfordeterministicparsingearley
Problem
I am trying to build an Earley parser using the Wikipedia pesudocode as a base, with Aycock's fix for epsilon rules as follows:
And Leo's optimization for right recursion using the explanation here for further insight.
I am able to understand how to parse Right Recursive grammars with it. However, I am confused when it comes to using it to parse non-right recursive grammars.
For example, here is a grammar that can parse an example string
Without the deterministic reduction from Leo, the Earley items constructed (pipe indicates the dot) are as follows:
The parenthesis is in this format (starting column, earley set).
According to Leo,
Definition 2.1: An item is said to be on the deterministic reduction path above $ [A \rightarrow \gamma., i] $ if it is $ [B
> \rightarrow \alpha A ., k] $ with $ [B \rightarrow \alpha . A, k] $
being the only item in $ I_i $ with the dot in front of A, or if it is
on the deterministic reduction path above $ [B \rightarrow \alpha A .,
> k] $. An item on such a path is called topmost one if there is no
item on the deterministic reduction path above it.
(I am assuming t
def predict(self, column, symbol, state):
# original predict
for alternative in grammar[symbol]:
column.add(a new state corresponding to alternative)
# Aycock fix
if nullable(symbol):
column.add(state.advance())And Leo's optimization for right recursion using the explanation here for further insight.
I am able to understand how to parse Right Recursive grammars with it. However, I am confused when it comes to using it to parse non-right recursive grammars.
For example, here is a grammar that can parse an example string
aa :=
:= + 'a'
:= \epsilonWithout the deterministic reduction from Leo, the Earley items constructed (pipe indicates the dot) are as follows:
The parenthesis is in this format (starting column, earley set).
char: _ column[0]
:= | (0,0) #1) init
:= | a(0,0) #2) predict from 1
:= |(0,0) #3) predict from 1
:= |(0,0) #4) epsilon complete from 3
:= | a(0,0) #5) epsilon complete from 3
char: a column[1]
:= a |(0,1) #6) scan a from 5
:= |(0,1) #7) complete A from 1 (start col of 6)
:= | a(0,1) #8) complete A from 2 (start col of 6)
char: a column[2]
:= a |(0,2) #9) scan a from 8
:= |(0,2) #10) complete A from 1 (start col of 9)
:= | a(0,2) #11) complete A from 2 (start col of 9)According to Leo,
Definition 2.1: An item is said to be on the deterministic reduction path above $ [A \rightarrow \gamma., i] $ if it is $ [B
> \rightarrow \alpha A ., k] $ with $ [B \rightarrow \alpha . A, k] $
being the only item in $ I_i $ with the dot in front of A, or if it is
on the deterministic reduction path above $ [B \rightarrow \alpha A .,
> k] $. An item on such a path is called topmost one if there is no
item on the deterministic reduction path above it.
(I am assuming t
Solution
I found the problem: Essentially, I misunderstood the definition. It says that
it is [B→αA.,k] with [B→α.A,k] being the only item in Ii with the dot
in front of A, or if it is on the deterministic reduction path above
[B→αA.,k].
Which should be interpreted as
(The "Parsing Techniques A Practical Guide" by Grune et al. from 2008 is the book I refer to here.)
it is [B→αA.,k] with [B→α.A,k] being the only item in Ii with the dot
in front of A, or if it is on the deterministic reduction path above
[B→αA.,k].
Which should be interpreted as
[B→αA.,k] should be considered to be created corresponding to [B→α.A,k]. The basic idea which seems to be elided in Leo's paper but given in Grune et al. (2008) is that, the deterministic reduction is basically simply an eager completion with all intermediate results thrown away. With this change, the parser works correctly.(The "Parsing Techniques A Practical Guide" by Grune et al. from 2008 is the book I refer to here.)
Context
StackExchange Computer Science Q#101770, answer score: 2
Revisions (0)
No revisions yet.