patternMinor

On computing the factor-free basis of a regular language

Submitted by: @import:stackexchange-cs·Mar 10, 2026·

Viewed 0 times

thebasisfreeregularlanguagefactorcomputing

Problem

The problem is as follows.

Background: Given an alphabet $\varSigma$. If $u,v,w,x \in \varSigma^$ and $w=uxv$ then $x$ is a factor (substring or infix) of $w$. A language $L$ is factor-free if, for all distinct words $x, y \in \varSigma^$, $x \in L$ and $y \in L$ imply that $x$ and $y$ are not factors of each other. Given a regular language $L$, we define $basis(L)$ as $L' \subseteq L$ such that $L'$ is factor-free and for every word $x \in L$ there exists a word $y \in L'$ such that $y$ is a factor of $x$.

Question: Is $basis(L)$ efficiently computable, given (say) a DFA for $L$? If so, do you have any ideas about the algorithm that computes it?

Example: Consider the language $L=\{ab,abab,ababab,abababab,\ldots\}=\{(ab)^i|i>0\}$. Then, $basis(L)=\{ab\}$.

Let me give you my insights on the subject, it may help.

It has been proven that checking factor-freeness is tractable for regular languages1.

Also, in a trivial way, $L=basis(L)$ if $L$ is factor-free. This leads to the fact that $basis(L)$ is not always finite. Because for an infinite factor-free language the basis is also infinite. Here is an example from the Handbook of Formal Language Theory (vol. 1, p. 19): $L=\{ba^ib | i > 0\}$.

For finite languages, it seems that a basis always exists because a language $L$ is either factor-free thus $basis(L)=L$ or not factor-free. The latter necessarily means that $L$ contains a sublanguage $L'$ for which some words are factors of each others, consequently $basis(L)=basis(L') \cup (L \setminus L')$.

1 Han YS, Wang, YJ , Wood, D. Infix-free regular expressions and languages. International journal of foundations of computer science. 2006.

Solution

A language $L$ is infix-free if
$$ L \cap (\Sigma^+L\Sigma^ \cup \Sigma^L\Sigma^+) = \emptyset. $$
(This deep observation, which I made independently, appears in reference [1] to the paper you cite.) This gives a simple algorithm for deciding whether a regular language $L$ (given as a DFA, NFA, or regular expression) is infix-free.

Similarly, the basis of $L$ is
$$ L \setminus (\Sigma^+L\Sigma^ \cup \Sigma^L\Sigma^+). $$
This gives a simple algorithm for computing the basis of a regular language (as a DFA, NFA, or regular expression).

Context

StackExchange Computer Science Q#66814, answer score: 3

Revisions (0)

No revisions yet.