HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Computing follow sets conservatively for a PEG grammar

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
grammarforconservativelyfollowsetspegcomputing

Problem

Given a parsing expression grammar (PEG) grammar and the name of the start production, I would like to label each node with the set of characters that can follow it. I would be happy with a good approximation that is conservative -- if a character can follow a node then it must appear in the follower set.

The grammar is represented as a tree of named productions whose bodies contain nodes representing

  • Character



  • Concatenation



  • Union



  • Non-terminal references



So given a grammar in ABNF style syntax:

A := B ('a' | 'b');
B := ('c' | 'd') (B | ());


where adjacent nodes are concatenated, | indicates union, single quoted characters match the character they represent, and upper case names are non-terminals.

If the grammar's start production is A, the annotated version might look like

A := 
  (
    (B /* [ab] */)
    (
      ('a' /* eof */)
    | 
      ('b' /* eof */)
    /* eof */
    )
  /* eof */
  );

B :=
  (
    (
      ('c' /* [abcd] */)
      |
      ('d' /* [abcd] */)
    /* [abcd] */
    )
    (
      (B /* [ab] */)
      |
      ( /* [ab] */)
    /* [ab] */
    )
  );


I want this so that I can do some simplification on a PEG grammar. Since order is important in unions in PEG grammars, I want to partition the members of unions based on which ones could accept the same character so that I can ignore order between partition elements.

I'm using OMeta's grow-the-seed scheme for handling direct left-recursion in PEG grammars, so I need something that handles that. I expect that any scheme for handling scannerless CF grammars with order-independent unions that is conservative or correct would be conservative for my purposes.

Pointers to algorithms or source code would be much appreciated.

Solution

Follow or lookahead sets are a standard concept in compiler theory. I would expect every textbook on the topic to contain an algorithm for them; as for online resources, try this or that one. It seems as if the choice of algorithm depended on what kind of grammar -- LL, LR, LALR, ... -- you have.

Context

StackExchange Computer Science Q#2292, answer score: 5

Revisions (0)

No revisions yet.