snippetMinor

How to convert a non-embedding context free grammar to regular grammar?

Submitted by: @import:stackexchange-cs·Mar 10, 2026·

Viewed 0 times

formal-languages formal-grammars cs stackoverflow regular-languages context-free

nonconvertfreeregulargrammarembeddinghowcontext

Problem

A context-free grammar is said to be self-embedding if and only
if contains a derivation of the form $\xi\stackrel*\Rightarrow u\xi v$, where $\xi$ is a non-terminal and $u$, $v$ are some non-empty words.

Please note that I am aware the undecidability of the conversion of context-free grammar to regular grammar. But given an input context-free grammar that is not self-embedding, is there any algorithm to convert it to regular grammar, or DFA directly?

Solution

Check out this link.
This is a simpler version of a proof by Chomsky that NSE grammars represent regular languages. Fortunately, the proof technique illustrates how to construct a left-regular grammar from a given NSE grammar. Here's my explanation:

For each of the $\lvert V \rvert (\lvert V \rvert +1)/2$ pairs $(v_1, v_2)$ of elements of $V$, decide whether $v_1 \le v_2$ based on the definition given: $v_1 \le v_2$ if $v_2 := V^ v_1 V^*$.

Construct equivalence classes such that if $v_1 \le v_2$ and $v_2 \le v_1$, $v_1$ and $v_2$ are in the same equivalence class. This will be a partition of all elements of $V$ into one or more equivalence classes.

Now, for each pair $(VE_1, VE_2)$ of equivalence classes of $V$ as described above, determine whether $VE_1 \le VE_2$ by checking whether $v_1$ in $VE_1 \le v_2$ in $VE_2$.

Construct sets $UE$ corresponding to equivalence classes $VE$ such that if $VE_i \le VE_k$, $VE_i$ is a subset of $UE_k$. Each $UE_k$ must also contain the alphabet set $E$. You will have one $UE$ for each $VE$, and each $UE$ will contain variables in equivalence classes "less than" the corresponding $VE$, in addition to all the alphabet symbols.

Determine $P(v)$ for each variable $v$ as follows: $P(v)$ is the set of all productions in $P$ whose left-hand-sides belong to the equivalence class containing variable $v$.

For each variable $v$, construct a grammar as follows: $G(v) = (VE \cup UE, UE, P(v), v)$, where $VE$ is the equivalence class containing $v$ and the $UE$ is the one corresponding to the $VE$.

The authors have proven a lemma which claims that $G(v)$ is a linear grammar. From this, we can write regular expressions over $UE$ for each variable $v$. Note that for the $UE$ corresponding to the "smallest" $VE$, this regular expression will contain only symbols from the original alphabet $E$.

Iteratively substitute regular expressions containing only alphabet symbols into more complicated regular expressions obtained from step 7. Eventually, you will have a regular expression corresponding to the language generated from the original start symbol $S$, and this regular expression will contain only alphabet symbols from the original alphabet.

You now have a regular expression for the NSE grammar, and can obtain a minimal DFA using Kleene's theorem, the subset construction, and a DFA minimization algorithm.

If you would like an example, I can try to provide one later. Try to do a few yourself, read the paper (it's short), and we can talk about complexity later on.

Context

StackExchange Computer Science Q#2774, answer score: 5

Revisions (0)

No revisions yet.