HiveBrain v1.2.0

How much bigger can an LR(1) automaton for a language be than the corresponding LR(0) automaton?

Submitted by: @import:stackexchange-cs

Problem

In an LR(0) parser, each state consists of a collection of LR(0) items, which are productions annotated with a position. In an LR(1) parser, each state consists of a collection of LR(1) items, which are productions annotated with a position and a lookahead character.

It's known that given a state in an LR(1) automaton, the configurating set formed by dropping the lookahead tokens from each LR(1) item yields a configurating set corresponding to some state in the LR(0) automaton. In that sense, the main difference between an LR(1) automaton and an LR(0) automaton is that the LR(1) automaton has more copies of the states in the LR(0) automaton, each annotated with lookahead information. For this reason, the LR(1) automaton for a given CFG is typically larger than the corresponding LR(0) automaton for that CFG.
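Concretely, the "drop the lookaheads" projection can be sketched as follows (a toy Python illustration; the tuple encoding of items is my own, not from any particular parser generator):

```python
# An LR(1) item here is (lhs, rhs, dot, lookahead); an LR(0) item
# is the same tuple without the lookahead component.

def drop_lookaheads(lr1_state):
    # Project an LR(1) state to its LR(0) configurating set (its "core").
    return frozenset((lhs, rhs, dot) for (lhs, rhs, dot, _) in lr1_state)

# Two LR(1) states that differ only in lookahead information ...
s1 = frozenset({("T", ("t",), 1, "$")})
s2 = frozenset({("T", ("t",), 1, "x"), ("T", ("t",), 1, "$")})
# ... project to the same LR(0) state:
assert drop_lookaheads(s1) == drop_lookaheads(s2) == frozenset({("T", ("t",), 1)})
```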

My question is how much larger the LR(1) automaton can be. If there are $n$ distinct terminal symbols in the alphabet of the grammar, then in principle we might need one copy of each LR(0) state per subset of those $n$ terminal symbols, potentially leading to an LR(1) automaton that's $2^n$ times larger than the LR(0) automaton. And since each individual state in the LR(0) automaton consists of a set of different LR(0) items, we might get an even larger blowup.

That said, I can't seem to find a way to construct a family of grammars for which the LR(1) automaton is significantly larger than the corresponding LR(0) automaton. Everything I've tried has led to a modest increase in size (usually around 2-4x), but I can't seem to find a pattern that leads to a large blowup.

Are there known families of context-free grammars whose LR(1) automata are exponentially larger than the corresponding LR(0) automata? Or is it known that in the worst case, you can't actually get an exponential blowup?

Thanks!

Solution

The grammar (with the $T_n$ productions instantiated for each $0 \le n < N$)

$$\begin{array}{l}
S \rightarrow T_0 \\
T_n \rightarrow a \; T_{n+1} \\
T_n \rightarrow b \; T_{n+1} \\
T_n \rightarrow b \; T_{n+1} \; t_n \\
T_N \rightarrow t_N
\end{array}
$$

has the LR(0) state
$$T_N \rightarrow t_N \;\cdot$$
expanded to $2^N$ variants in the LR(1) automaton, since every subset of $\{t_0 \dots t_{N-1}\}$ occurs as a lookahead set for that item in some context. The number of states in the LR(0) automaton, on the other hand, is linear in $N$. Thus an expansion factor on the order of $2^N/N$ is possible.
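One way to sanity-check this is to build the canonical LR(1) collection for small $N$ and count states directly. The sketch below (plain Python; the names `build_grammar`, `lr1_states`, and `core` are mine) uses the standard textbook closure/goto construction, simplified by the fact that this grammar family has no nullable symbols, so the FIRST computation needs no epsilon handling. The LR(0) state count is recovered as the number of distinct cores of the LR(1) states, since the cores of the canonical LR(1) states are exactly the LR(0) states.

```python
def build_grammar(N):
    # Family from the answer: S -> T0; for 0 <= n < N,
    # Tn -> a T{n+1} | b T{n+1} | b T{n+1} tn; and TN -> tN.
    g = {"S": [("T0",)]}
    for n in range(N):
        g[f"T{n}"] = [("a", f"T{n+1}"), ("b", f"T{n+1}"), ("b", f"T{n+1}", f"t{n}")]
    g[f"T{N}"] = [(f"t{N}",)]
    return g

def lr1_states(g, start="S"):
    # Canonical LR(1) collection; an item is (lhs, rhs, dot, lookahead).
    # FIRST sets: no symbol in this family derives the empty string.
    first = {A: set() for A in g}
    changed = True
    while changed:
        changed = False
        for A, rhss in g.items():
            for rhs in rhss:
                s = first[rhs[0]] if rhs[0] in g else {rhs[0]}
                if not s <= first[A]:
                    first[A] |= s
                    changed = True

    def closure(kernel):
        items, work = set(kernel), list(kernel)
        while work:
            lhs, rhs, dot, la = work.pop()
            if dot < len(rhs) and rhs[dot] in g:
                beta = rhs[dot + 1:]
                las = (first[beta[0]] if beta[0] in g else {beta[0]}) if beta else {la}
                for r in g[rhs[dot]]:
                    for x in las:
                        item = (rhs[dot], r, 0, x)
                        if item not in items:
                            items.add(item)
                            work.append(item)
        return frozenset(items)

    init = closure({("S'", (start,), 0, "$")})
    states, work = {init}, [init]
    while work:
        st = work.pop()
        for X in {rhs[dot] for (_, rhs, dot, _) in st if dot < len(rhs)}:
            nxt = closure({(l, r, d + 1, la) for (l, r, d, la) in st
                           if d < len(r) and r[d] == X})
            if nxt not in states:
                states.add(nxt)
                work.append(nxt)
    return states

def core(state):
    # Drop lookaheads: the set of cores of the canonical LR(1) states
    # is exactly the set of LR(0) states.
    return frozenset((l, r, d) for (l, r, d, _) in state)

for N in range(1, 5):
    s1 = lr1_states(build_grammar(N))
    lr0 = {core(s) for s in s1}
    splits = [s for s in s1 if (f"T{N}", (f"t{N}",), 1) in core(s)]
    print(f"N={N}: LR(0) states={len(lr0)}, LR(1) states={len(s1)}, "
          f"copies of the TN -> tN . state={len(splits)}")
```

For this family the LR(0) automaton works out to $5N + 4$ states, while the single reduce state for $T_N \rightarrow t_N$ splits into $2^N$ LR(1) copies, one per realized lookahead set, matching the answer's analysis.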

Edit: I'll have to check later when I have more time, but I think adding $T_N \rightarrow T_0$ would give the exponential factor on nearly all the LR(0) states. That rule does introduce a shift-reduce conflict, though.

Context

StackExchange Computer Science Q#44096, answer score: 4
