patternMinor
Computing follow sets conservatively for a PEG grammar
Viewed 0 times
grammarforconservativelyfollowsetspegcomputing
Problem
Given a parsing expression grammar (PEG) grammar and the name of the start production, I would like to label each node with the set of characters that can follow it. I would be happy with a good approximation that is conservative -- if a character can follow a node then it must appear in the follower set.
The grammar is represented as a tree of named productions whose bodies contain nodes representing
So given a grammar in ABNF style syntax:
where adjacent nodes are concatenated,
If the grammar's start production is
I want this so that I can do some simplification on a PEG grammar. Since order is important in unions in PEG grammars, I want to partition the members of unions based on which ones could accept the same character so that I can ignore order between partition elements.
I'm using OMeta's grow-the-seed scheme for handling direct left-recursion in PEG grammars, so I need something that handles that. I expect that any scheme for handling scannerless CF grammars with order-independent unions that is conservative or correct would be conservative for my purposes.
Pointers to algorithms or source code would be much appreciated.
The grammar is represented as a tree of named productions whose bodies contain nodes representing
- Character
- Concatenation
- Union
- Non-terminal references
So given a grammar in ABNF style syntax:
A := B ('a' | 'b');
B := ('c' | 'd') (B | ());where adjacent nodes are concatenated,
| indicates union, single quoted characters match the character they represent, and upper case names are non-terminals.If the grammar's start production is
A, the annotated version might look likeA :=
(
(B /* [ab] */)
(
('a' /* eof */)
|
('b' /* eof */)
/* eof */
)
/* eof */
);
B :=
(
(
('c' /* [abcd] */)
|
('d' /* [abcd] */)
/* [abcd] */
)
(
(B /* [ab] */)
|
( /* [ab] */)
/* [ab] */
)
);I want this so that I can do some simplification on a PEG grammar. Since order is important in unions in PEG grammars, I want to partition the members of unions based on which ones could accept the same character so that I can ignore order between partition elements.
I'm using OMeta's grow-the-seed scheme for handling direct left-recursion in PEG grammars, so I need something that handles that. I expect that any scheme for handling scannerless CF grammars with order-independent unions that is conservative or correct would be conservative for my purposes.
Pointers to algorithms or source code would be much appreciated.
Solution
Follow or lookahead sets are a standard concept in compiler theory. I would expect every textbook on the topic to contain an algorithm for them; as for online resources, try this or that one. It seems as if the choice of algorithm depended on what kind of grammar -- LL, LR, LALR, ... -- you have.
Context
StackExchange Computer Science Q#2292, answer score: 5
Revisions (0)
No revisions yet.