HiveBrain v1.2.0
Get Started
← Back to all entries
snippetMinor

How to convert a non-embedding context free grammar to regular grammar?

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
nonconvertfreeregulargrammarembeddinghowcontext

Problem

A context-free grammar is said to be self-embedding if and only
if contains a derivation of the form $\xi\stackrel*\Rightarrow u\xi v$, where $\xi$ is a non-terminal and $u$, $v$ are some non-empty words.

Please note that I am aware the undecidability of the conversion of context-free grammar to regular grammar. But given an input context-free grammar that is not self-embedding, is there any algorithm to convert it to regular grammar, or DFA directly?

Solution

Check out this link.
This is a simpler version of a proof by Chomsky that NSE grammars represent regular languages. Fortunately, the proof technique illustrates how to construct a left-regular grammar from a given NSE grammar. Here's my explanation:

  • For each of the $\lvert V \rvert (\lvert V \rvert +1)/2$ pairs $(v_1, v_2)$ of elements of $V$, decide whether $v_1 \le v_2$ based on the definition given: $v_1 \le v_2$ if $v_2 := V^ v_1 V^*$.



  • Construct equivalence classes such that if $v_1 \le v_2$ and $v_2 \le v_1$, $v_1$ and $v_2$ are in the same equivalence class. This will be a partition of all elements of $V$ into one or more equivalence classes.



  • Now, for each pair $(VE_1, VE_2)$ of equivalence classes of $V$ as described above, determine whether $VE_1 \le VE_2$ by checking whether $v_1$ in $VE_1 \le v_2$ in $VE_2$.



  • Construct sets $UE$ corresponding to equivalence classes $VE$ such that if $VE_i \le VE_k$, $VE_i$ is a subset of $UE_k$. Each $UE_k$ must also contain the alphabet set $E$. You will have one $UE$ for each $VE$, and each $UE$ will contain variables in equivalence classes "less than" the corresponding $VE$, in addition to all the alphabet symbols.



  • Determine $P(v)$ for each variable $v$ as follows: $P(v)$ is the set of all productions in $P$ whose left-hand-sides belong to the equivalence class containing variable $v$.



  • For each variable $v$, construct a grammar as follows: $G(v) = (VE \cup UE, UE, P(v), v)$, where $VE$ is the equivalence class containing $v$ and the $UE$ is the one corresponding to the $VE$.



  • The authors have proven a lemma which claims that $G(v)$ is a linear grammar. From this, we can write regular expressions over $UE$ for each variable $v$. Note that for the $UE$ corresponding to the "smallest" $VE$, this regular expression will contain only symbols from the original alphabet $E$.



  • Iteratively substitute regular expressions containing only alphabet symbols into more complicated regular expressions obtained from step 7. Eventually, you will have a regular expression corresponding to the language generated from the original start symbol $S$, and this regular expression will contain only alphabet symbols from the original alphabet.



  • You now have a regular expression for the NSE grammar, and can obtain a minimal DFA using Kleene's theorem, the subset construction, and a DFA minimization algorithm.



If you would like an example, I can try to provide one later. Try to do a few yourself, read the paper (it's short), and we can talk about complexity later on.

Context

StackExchange Computer Science Q#2774, answer score: 5

Revisions (0)

No revisions yet.