HiveBrain v1.2.0
Get Started
← Back to all entries
snippetMinor

Parser theory: How to systematically compute FOLLOW sets?

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
computeparsertheorysystematicallyfollowhowsets

Problem

Forgive me for my ignorance as I am self-teaching myself some of this theory... I am having some trouble understanding how to systematically/algorithmically compute FOLLOW sets, given that I have computed the FIRST sets.

For the following grammar:

E -> TE'
E' -> +TE' | ε
T -> FT'
T' -> *FT' |  ε
F -> (E) | id


I found that the FIRST sets are:

FIRST(E) = { (, id }
FIRST(E') = { + ,  ε }
FIRST(T) = { (, id }
FIRST(T') = { *,  ε }
FIRST(F) = { (, id }


When it comes to computing the FOLLOW sets, I am trying to use the following rules:

1. $ goes into the set FOLLOW(S) where S is the start symbol.
2. If A -> aBb, then (FIRST(b) - { ε }) is in FOLLOW(B).
3. If A -> aB then everything in FOLLOW(A) is in FOLLOW(B)
4. If A -> aBb and ε is in FIRST(b), then everything in FOLLOW(A) is in FOLLOW(B).


Obviously, the step I am most certain here is that

FOLLOW(E) contains $, based on rule 1. But everything after that, I seem to have a hard time following a pattern on how to systematically compute the FOLLOW sets. For example, to compute FOLLOW(E), how are we supposed to approach this?

E -> TE', looks like A -> aB , so everything in FOLLOW(E) is in FOLLOW(E'). Where to go from here? Do I compute FOLLOW(E')? Then going to FOLLOW(E'):

E' -> + TE'... looks like A -> aBb. Then that means FIRST(E') - { ε } is in FOLLOW(T). So I do know that { + } is in FOLLOW(T).

As you can see, with the way I have interpreted this, I am going around all over the place computing the FOLLOW of another production. Ultimately, I end up losing track on what terminal symbols go into which follow set. :(

I am for sure misunderstanding something here.

What is the right approach/ordering to compute the FOLLOW sets?

Solution

The simple way to do it is the same as the simple way to compute FIRST sets, using a least fixed-point algorithm in which a set of inclusion rules are applied repeatedly until a fixed-point is reached:

  • Initialise all the sets to empty.



  • For every production in order, use the rules to add elements into the sets.


(As a slight efficiency, rule 2 only needs to be applied on the first pass. I didn't apply that to the trace below.)

  • If any set was modified during step 2, do it again.



So, here are the productions in order (I added the missing $*$ to $T'$, and inserted an augmented start rule $S$, which avoids the need for your special case rule 1):

$$\begin{align}
S &\to E\; $ \\
E &\to TE' \\
E' &\to +TE' \mid \epsilon \\
T &\to FT' \\
T' &\to *FT' \mid \epsilon \\
F &\to (E) \mid id
\end{align}$$

  • Set all the $FOLLOW$ sets to $\{\}$



  • First pass. I left out productions like $F\to id$ which don't match any of the rules.


$$ \begin{array}{c|l|l}
S \to E\; $ &\text{Add }\$\text{ to }FOLLOW(E) &FOLLOW(E) = \{ \$ \}\\
E \to TE' &\text{Add }FIRST(E')-\{\epsilon\}\text{ to }FOLLOW(T)
&FOLLOW(T) = \{ + \}\\
&\text{Add }FOLLOW(E)\text{ to }FOLLOW(E')
&FOLLOW(E') = \{ \$ \}\\
&\text{Add }FOLLOW(E)\text{ to }FOLLOW(T)
&FOLLOW(T) = \{ + \$ \}\\
E' \to +TE' &\text{Add }FIRST(E')-\{\epsilon\}\text{ to }FOLLOW(T)
&\text{no change}\\
&\text{Add }FOLLOW(E')\text{ to }FOLLOW(E')
&\text{no change}\\
&\text{Add }FOLLOW(E')\text{ to }FOLLOW(T)
&\text{no change}\\
T \to FT' &\text{Add }FIRST(T')-\{\epsilon\}\text{ to }FOLLOW(F)
&FOLLOW(F) = \{ * \}\\
&\text{Add }FOLLOW(T)\text{ to }FOLLOW(T')
&FOLLOW(T') = \{ + \$ \}\\
T' \to *FT' &\text{Add }FIRST(T')-\{\epsilon\}\text{ to }FOLLOW(F)
&\text{no change}\\
&\text{Add }FOLLOW(T')\text{ to }FOLLOW(T')
&\text{no change}\\
&\text{Add }FOLLOW(T')\text{ to }FOLLOW(F)
&FOLLOW(F) = \{ * + \$\}\\
F \to (E) &\text{Add })\text{ to }FOLLOW(E) &FOLLOW(E) = \{ ) \$\}\\
\end{array} $$

  • There were various changes, so we repeat step 2.



  • (Step 2, second pass)


$$\begin{array}{c|l|l}
S \to E\; $ &\text{Add }\$\text{ to }FOLLOW(E) &\text{no change}\\
E \to TE' &\text{Add }FIRST(E')-\{\epsilon\}\text{ to }FOLLOW(T)
&\text{no change}\\
&\text{Add }FOLLOW(E)\text{ to }FOLLOW(E')
&FOLLOW(E') = \{ )\$ \}\\
&\text{Add }FOLLOW(E)\text{ to }FOLLOW(T)
&FOLLOW(T) = \{ + )\$ \}\\
E' \to +TE' &\text{Add }FIRST(E')-\{\epsilon\}\text{ to }FOLLOW(T)
&\text{no change}\\
&\text{Add }FOLLOW(E')\text{ to }FOLLOW(E')
&\text{no change}\\
&\text{Add }FOLLOW(E')\text{ to }FOLLOW(T)
&\text{no change}\\
T \to FT' &\text{Add }FIRST(T')-\{\epsilon\}\text{ to }FOLLOW(F)
&\text{no change}\\
&\text{Add }FOLLOW(T)\text{ to }FOLLOW(T')
&FOLLOW(T') = \{ + ) \$ \}\\
T' \to *FT' &\text{Add }FIRST(T')-\{\epsilon\}\text{ to }FOLLOW(F)
&\text{no change}\\
&\text{Add }FOLLOW(T')\text{ to }FOLLOW(T')
&\text{no change}\\
&\text{Add }FOLLOW(T')\text{ to }FOLLOW(F)
&FOLLOW(F) = \{ * + ) \$\}\\
F \to (E) &\text{Add })\text{ to }FOLLOW(E) &\text{no change}\\
\end{array} $$

  • (step 3, second pass) There were some more changes. Once again.



  • (Step 2, third pass)


$$\begin{array}{c|l|l}
S \to E\; $ &\text{Add }\$\text{ to }FOLLOW(E) &\text{no change}\\
E \to TE' &\text{Add }FIRST(E')-\{\epsilon\}\text{ to }FOLLOW(T)
&\text{no change}\\
&\text{Add }FOLLOW(E)\text{ to }FOLLOW(E')
&\text{no change}\\
&\text{Add }FOLLOW(E)\text{ to }FOLLOW(T)

Context

StackExchange Computer Science Q#101450, answer score: 4

Revisions (0)

No revisions yet.