patternMinor
On computing the factor-free basis of a regular language
Viewed 0 times
thebasisfreeregularlanguagefactorcomputing
Problem
The problem is as follows.
Background: Given an alphabet $\varSigma$. If $u,v,w,x \in \varSigma^$ and $w=uxv$ then $x$ is a factor (substring or infix) of $w$. A language $L$ is factor-free if, for all distinct words $x, y \in \varSigma^$, $x \in L$ and $y \in L$ imply that $x$ and $y$ are not factors of each other. Given a regular language $L$, we define $basis(L)$ as $L' \subseteq L$ such that $L'$ is factor-free and for every word $x \in L$ there exists a word $y \in L'$ such that $y$ is a factor of $x$.
Question: Is $basis(L)$ efficiently computable, given (say) a DFA for $L$? If so, do you have any ideas about the algorithm that computes it?
Example: Consider the language $L=\{ab,abab,ababab,abababab,\ldots\}=\{(ab)^i|i>0\}$. Then, $basis(L)=\{ab\}$.
Let me give you my insights on the subject, it may help.
It has been proven that checking factor-freeness is tractable for regular languages1.
Also, in a trivial way, $L=basis(L)$ if $L$ is factor-free. This leads to the fact that $basis(L)$ is not always finite. Because for an infinite factor-free language the basis is also infinite. Here is an example from the Handbook of Formal Language Theory (vol. 1, p. 19): $L=\{ba^ib | i > 0\}$.
For finite languages, it seems that a basis always exists because a language $L$ is either factor-free thus $basis(L)=L$ or not factor-free. The latter necessarily means that $L$ contains a sublanguage $L'$ for which some words are factors of each others, consequently $basis(L)=basis(L') \cup (L \setminus L')$.
1 Han YS, Wang, YJ , Wood, D. Infix-free regular expressions and languages. International journal of foundations of computer science. 2006.
Background: Given an alphabet $\varSigma$. If $u,v,w,x \in \varSigma^$ and $w=uxv$ then $x$ is a factor (substring or infix) of $w$. A language $L$ is factor-free if, for all distinct words $x, y \in \varSigma^$, $x \in L$ and $y \in L$ imply that $x$ and $y$ are not factors of each other. Given a regular language $L$, we define $basis(L)$ as $L' \subseteq L$ such that $L'$ is factor-free and for every word $x \in L$ there exists a word $y \in L'$ such that $y$ is a factor of $x$.
Question: Is $basis(L)$ efficiently computable, given (say) a DFA for $L$? If so, do you have any ideas about the algorithm that computes it?
Example: Consider the language $L=\{ab,abab,ababab,abababab,\ldots\}=\{(ab)^i|i>0\}$. Then, $basis(L)=\{ab\}$.
Let me give you my insights on the subject, it may help.
It has been proven that checking factor-freeness is tractable for regular languages1.
Also, in a trivial way, $L=basis(L)$ if $L$ is factor-free. This leads to the fact that $basis(L)$ is not always finite. Because for an infinite factor-free language the basis is also infinite. Here is an example from the Handbook of Formal Language Theory (vol. 1, p. 19): $L=\{ba^ib | i > 0\}$.
For finite languages, it seems that a basis always exists because a language $L$ is either factor-free thus $basis(L)=L$ or not factor-free. The latter necessarily means that $L$ contains a sublanguage $L'$ for which some words are factors of each others, consequently $basis(L)=basis(L') \cup (L \setminus L')$.
1 Han YS, Wang, YJ , Wood, D. Infix-free regular expressions and languages. International journal of foundations of computer science. 2006.
Solution
A language $L$ is infix-free if
$$ L \cap (\Sigma^+L\Sigma^ \cup \Sigma^L\Sigma^+) = \emptyset. $$
(This deep observation, which I made independently, appears in reference [1] to the paper you cite.) This gives a simple algorithm for deciding whether a regular language $L$ (given as a DFA, NFA, or regular expression) is infix-free.
Similarly, the basis of $L$ is
$$ L \setminus (\Sigma^+L\Sigma^ \cup \Sigma^L\Sigma^+). $$
This gives a simple algorithm for computing the basis of a regular language (as a DFA, NFA, or regular expression).
$$ L \cap (\Sigma^+L\Sigma^ \cup \Sigma^L\Sigma^+) = \emptyset. $$
(This deep observation, which I made independently, appears in reference [1] to the paper you cite.) This gives a simple algorithm for deciding whether a regular language $L$ (given as a DFA, NFA, or regular expression) is infix-free.
Similarly, the basis of $L$ is
$$ L \setminus (\Sigma^+L\Sigma^ \cup \Sigma^L\Sigma^+). $$
This gives a simple algorithm for computing the basis of a regular language (as a DFA, NFA, or regular expression).
Context
StackExchange Computer Science Q#66814, answer score: 3
Revisions (0)
No revisions yet.