HiveBrain v1.2.0
Get Started
← Back to all entries
snippetMinor

How to define at least one occurrence of a string between two tokens in bottom up LALR(1) parser grammar

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
howparserlalrtokensbottomonebetweentwodefineleast

Problem

I am trying to define a non terminal symbol in a LALR(1) grammar (with CUP parser). It is requested that

the  token must appear exactly twice, 
while the  token must appear at least once.


In the end I came up with this definition:

section     ::= hour_l CODE SC hour_l CODE SC hour_l ;
hour_l      ::= /* epsilon */ 
            | hour_l HOUR SC ;


where SC is a separator (semicolon) between tokens and hour_l is the non terminal symbol for hour's list.
This solution has a problem: HOUR can be absent, because epsilon can be reduced from hour_l.
Is there a clever solution other than specifying all possibilities or using the semantic capabilities of CUP (ie. putting a counter of how many times HOUR is present in section)? I'd prefer not to use semantics in order to achieve this; in fact, it seems to me this is syntax related.

Thanks

Solution

My solution, suggested by a friend, is to use a Finite State Machine. I drew a Deterministic Finite Automata, and $C$ is the final state accepted by this machine:

I then transformed it into a right regular grammar:

section     ::= c ;
a           ::= CODE SC ;
b           ::= a CODE SC ;
c           ::= c HOUR SC | b HOUR SC | e CODE SC ;
d           ::= HOUR SC | d HOUR SC ;
e           ::= e HOUR SC | a HOUR SC | d CODE SC ;


Hope it helps.

Code Snippets

section     ::= c ;
a           ::= CODE SC ;
b           ::= a CODE SC ;
c           ::= c HOUR SC | b HOUR SC | e CODE SC ;
d           ::= HOUR SC | d HOUR SC ;
e           ::= e HOUR SC | a HOUR SC | d CODE SC ;

Context

StackExchange Computer Science Q#9961, answer score: 2

Revisions (0)

No revisions yet.