snippetMajor

How to prove that a grammar is unambiguous?

Submitted by: @import:stackexchange-cs·Mar 10, 2026·

Viewed 0 times

unambiguousprovegrammarthathow

Problem

My problem is how can I prove that a grammar is unambiguous?
I have the following grammar:
$$S
→ statement
∣ \mbox{if } expression \mbox{ then } S
∣ \mbox{if } expression \mbox{ then } S \mbox{ else } S$$

and make this to an unambiguous grammar, I think its correct:

-
$ S → S_1 ∣ S_2 $

-
$S_1
→ \mbox{if } expression \mbox{ then } S
∣ \mbox{if } expression \mbox{ then } S_2 \mbox{ else } S_1$

-
$S_2
→ \mbox{if } expression \mbox{ then } S_2 \mbox{ else } S_2
∣ statement$

I know that a unambiguous grammar has one parse tree for every term.

Solution

There is (at least) one way to prove unambiguity of a grammar $G = (N,T,\delta,S)$ for language $L$. It consists of two steps:

Prove $L \subseteq \mathcal{L}(G)$.

Prove $[z^n]S_G(z) = |L_n|$.

The first step is pretty clear: show that the grammar generates (at least) the words you want, that is correctness.

The second step shows that $G$ has as many syntax trees for words of length $n$ as $L$ has words of length $n$ -- with 1. this implies unambiguity. It uses the structure function of $G$ which goes back to Chomsky and Schützenberger [1], namely

$\qquad \displaystyle S_G(z) = \sum_{n=0}^\infty t_nz^n$

with $t_n = [z^n]S_G(z)$ the number of syntax trees $G$ has for words of length $n$. Of course you need to have $|L_n|$ for this to work.

The nice thing is that $S_G$ is (usually) easy to obtain for context-free languages, although finding a closed form for $t_n$ can be difficult. Transform $G$ into an equation system of functions with one variable per nonterminal:

$\qquad \displaystyle \left[ A(z) = \sum\limits_{(A, a_0 \dots a_k) \in \delta} \
\prod\limits_{i=0}^{k} \
\tau(a_i)\
: A \in N \right] \text{ with } \tau(a) = \begin{cases}
a(z) &, a \in N \\
z &, a \in T \\
\end{cases}.$

This may look daunting but is really only a syntactical transformation as will become clear in the example. The idea is that generated terminal symbols are counted in the exponent of $z$ and because the system has the same form as $G$, $z^n$ occurs as often in the sum as $n$ terminals can be generated by $G$. Check Kuich [2] for details.

Solving this equation system (computer algebra!) yields $S(z) = S_G(z)$; now you "only" have to pull the coefficient (in closed, general form). The TCS Cheat Sheet and computer algebra can often do so.

Example

Consider the simple grammar $G$ with rules

$\qquad \displaystyle S \to aSa \mid bSb \mid \varepsilon$.

It is clear that $\mathcal{L}(G) = \{ww^R \mid w \in \{a,b\}^*\}$ (step 1, proof by induction). There are $2^{\frac{n}{2}}$ palindromes of length $n$ if $n$ is even, $0$ otherwise.

Setting up the equation system yields

$\qquad \displaystyle S(z) = 2z^2S(z) + 1$

whose solution is

$\qquad \displaystyle S_G(z) = \frac{1}{1-2z^2}$.

The coefficients of $S_G$ coincide with the numbers of palindromes, so $G$ is unambiguous.

The Algebraic Theory of Context-Free Languages by Chomsky, Schützenberger (1963)

On the entropy of context-free languages by Kuich (1970)

Context

StackExchange Computer Science Q#2320, answer score: 27

Revisions (0)

No revisions yet.