HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Leo's deterministic reduction for Earley Parsing

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
leoreductionfordeterministicparsingearley

Problem

I am trying to build an Earley parser using the Wikipedia pesudocode as a base, with Aycock's fix for epsilon rules as follows:

def predict(self, column, symbol, state):
    # original predict
    for alternative in grammar[symbol]:
        column.add(a new state corresponding to alternative)
    # Aycock fix
    if nullable(symbol):
        column.add(state.advance())


And Leo's optimization for right recursion using the explanation here for further insight.
I am able to understand how to parse Right Recursive grammars with it. However, I am confused when it comes to using it to parse non-right recursive grammars.
For example, here is a grammar that can parse an example string aa

 := 
         :=  + 'a'
         := \epsilon


Without the deterministic reduction from Leo, the Earley items constructed (pipe indicates the dot) are as follows:
The parenthesis is in this format (starting column, earley set).

char: _ column[0]
:= | (0,0)     #1) init
:= |  a(0,0)       #2) predict from 1
:= |(0,0)             #3) predict from 1
:=  |(0,0)     #4) epsilon complete from 3
:=  | a(0,0)       #5) epsilon complete from 3 

char: a column[1]
:=  a |(0,1)       #6) scan a from 5
:=  |(0,1)     #7) complete A from 1 (start col of 6)
:=  | a(0,1)       #8) complete A from 2 (start col of 6)

char: a column[2]
:=  a |(0,2)       #9) scan a from 8
:=  |(0,2)     #10) complete A from 1 (start col of 9)
:=  | a(0,2)       #11) complete A from 2 (start col of 9)


According to Leo,


Definition 2.1: An item is said to be on the deterministic reduction path above $ [A \rightarrow \gamma., i] $ if it is $ [B
> \rightarrow \alpha A ., k] $ with $ [B \rightarrow \alpha . A, k] $
being the only item in $ I_i $ with the dot in front of A, or if it is
on the deterministic reduction path above $ [B \rightarrow \alpha A .,
> k] $. An item on such a path is called topmost one if there is no
item on the deterministic reduction path above it.

(I am assuming t

Solution

I found the problem: Essentially, I misunderstood the definition. It says that


it is [B→αA.,k] with [B→α.A,k] being the only item in Ii with the dot
in front of A, or if it is on the deterministic reduction path above
[B→αA.,k].

Which should be interpreted as [B→αA.,k] should be considered to be created corresponding to [B→α.A,k]. The basic idea which seems to be elided in Leo's paper but given in Grune et al. (2008) is that, the deterministic reduction is basically simply an eager completion with all intermediate results thrown away. With this change, the parser works correctly.

(The "Parsing Techniques A Practical Guide" by Grune et al. from 2008 is the book I refer to here.)

Context

StackExchange Computer Science Q#101770, answer score: 2

Revisions (0)

No revisions yet.