HiveBrain v1.2.0
Get Started
← Back to all entries
snippetMajor

How to create DFA from regular expression without using NFA?

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
expressiondfawithoutnfacreateregularusinghowfrom

Problem

Objective is to create DFA from a regular expression and using "Regular exp>NFA>DFA conversion" is not an option. How should one go about doing that?

I asked this question to our professor but he told me that we can use intuition and kindly refused to provide any explanation. So I wanted to ask you.

"Regular exp>NFA>DFA conversion" is not an option because such a conversion takes a lot of time to convert a rather complex regular expression. For example, for a certain regex "regex>NFA>DFA" takes 1 hour for a human being. I need to convert regex to DFA in less than 30 minutes.

Solution

Since you want "to convert regex to DFA in less than 30 minutes", I suppose you are working by hand on relatively small examples.

In this case you can use Brzozowski's algorithm $[1]$, which computes directly the Nerode automaton of a language (which is known to be equal to its minimal deterministic automaton). It is based on a direct computation of the derivatives and it also works for extended regular expressions allowing intersection and complementation. The drawback of this algorithm is that it requires to check the equivalence of the expressions computed along the way, an expensive process. But in practice, and for small examples, it is very efficient.

Left quotients. Let $L$ be a language of $A^*$ and let $u$ be a word. Then
$$
u^{-1}L = \{v \in A^* \mid uv \in L \}
$$
The language $u^{-1}L$ is called a left quotient (or left derivative) of $L$.

Nerode automaton. The Nerode automaton of $L$ is the deterministic automaton $\mathcal{A}(L) = (Q, A, \cdot, L, F)$ where $Q = \{u^{-1}L \mid u \in A^*\}$, $F = \{u^{-1}L \mid u \in L\}$ and the transition function is defined, for each $a \in A$, by the formula
$$
(u^{-1}L)\cdot a = a^{-1}(u^{-1}L)=(ua)^{-1}L
$$
Beware of this rather abstract definition. Each state of $\mathcal{A}$ is a left quotient of $L$ by a word, and hence is a language of $A^*$. The initial state is the language $L$, and the set of final states is the set of all left quotients of $L$ by a word of $L$.

Brzozowski's algorithm. Let $a, b$ be letters. One can compute the left quotients using the following formulas:
\begin{align*}
a^{-1}1 &= 0 & a^{-1}b &= \begin{cases}
1 &\text{if $a = b$}\\
0 &\text{if $a \not= b$}\\
\end{cases}\\
a^{-1}(L_1 \cup L_2) &= a^{-1}L_1 \cup u^{-1}L_2,&
a^{-1}(L_1 \setminus L_2) &= a^{-1}L_1 \setminus u^{-1}L_2,\\
a^{-1}(L_1 \cap L_2) &= a^{-1}L_1 \cap u^{-1}L_2, &
a^{-1}L^ &= (a^{-1}L)L^
\end{align*}
\begin{align*}
a^{-1}(L_1L_2) &=
\begin{cases}
(a^{-1}L_1)L_2 &\text{si $1 \notin L_1$,}\\
(a^{-1}L_1)L_2 \cup a^{-1}L_2 &\text{si $1 \in L_1$}\\
\end{cases}\\
%\\v^{-1}(u^{-1}L) &= (uv)^{-1}L.
\end{align*}

Example.
For $L = (a(ab)^)^ \cup (ba)^*$, we get successively:
\begin{align*}
1^{-1}L &= L=L_1\\
a^{-1}L_1 &=(ab)^(a(ab)^)^*=L_2\\
b^{-1}L_1 &= a(ba)^*=L_3\\
a^{-1}L_2 &= b(ab)^(a(ab)^)^ \cup (ab)^(a(ab)^)^=bL_2 \cup L_2=L_4\\
b^{-1}L_2 &=\emptyset \\
a^{-1}L_3 &=(ba)^*=L_5\\
b^{-1}L_3 &=\emptyset \\
a^{-1}L_4 &= a^{-1}(bL_2 \cup L_2)=a^{-1}L_2=L_4 \\
b^{-1}L_4 &= b^{-1}(bL_2 \cup L_2)= L_2\cup b^{-1}L_2 = L_2 \\
a^{-1}L_5 &= \emptyset\\
b^{-1}L_5 &=a(ba)^*=L_3
\end{align*}
which gives the following minimal automaton.

$[1]$ J. Brzozowski, Derivatives of Regular Expressions, J.ACM 11(4), 481–494, 1964.

Edit. (April 5, 2015) I just discovered that a similar question: What algorithms exist for construction a DFA that recognizes the language described by a given regex? was asked on cstheory. The answer partly addresses complexity issues.

Context

StackExchange Computer Science Q#40819, answer score: 21

Revisions (0)

No revisions yet.