HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Compute 'insertable' letters in a regular language

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
computeregularlanguageinsertableletters

Problem

Let $L$ a regular language and define the subsequence closure of $L$ as

$\qquad \displaystyle S(L) = \{ w \mid \exists w' \in L.\ w \text{ subsequence of } w'\}$.

The problem I want to solve is to find for such subsequences $w \in S(L)$ which letters can be inserted into them so that the result is also in $S(L)$. Formally:


Given $w_1\dots w_n \in S(L)$, output all pairs $(i,a) \in \{0,\dots,n\} \times \Sigma$ for which $w_1 \dots w_{i} a w_{i+1} \dots w_n \in S(L)$.

Consider, for instance, the language$\{ab, abc, abcc\}$. The string $b$ is in $S(L)$ and inserting $a$ at the beginning -- corresponding to $(0,a)$ -- yields $ab \in S(L)$. On the other hand, the string $cb$ is not in $S(L)$; there is no way to convert it to a language string by insertion.

Using this language, if the input string is $b$ the possible insertions I am looking for are $(0,a)$ and $(1,c)$ at the end. If the input string is $bc$ the possible insertions are $(0,a), (1,c)$ and $(2,c)$.

The use of this algorithm is in a user interface: the user builds strings belonging to the language starting from an empty string and adding one character at a time in different positions. At each step the UI prompts the user with all the possible valid letters in all the possible insertion positions.

I have a working naive algorithm that involves a lot of back-tracking, and it is way too slow even in relatively simple cases. I was wondering if there is something better, or -- failing that -- if there are any available studies of this problem.

Solution

Edit: The OP changed the question. What follows works for one interpretation of the original question.

I'm assuming you are given a DFA $A = (Q,\Sigma,\delta,q_0,F)$. The method can be adapted to handle NFAs.

Let the word be $w = w_1 \ldots w_n$. For each $i \in \{0,\ldots,n\}$, calculate the state $s_i$ that the automaton reaches after reading the prefix of length $i$ of $w$. For each $i \in \{0,\ldots,n\}$ and each accepting state $q \in F$, calculate the state $t_{i,q}$ that the reverse automaton reaches when starting at $q$ and reading the reverse of the suffix of length $i$ of $w$. Let $T_i = \{ t_{i,q} : q \in Q \}$.

For each $i \in \{0,\ldots,n\}$ and each symbol $a \in \Sigma$, check whether $\delta(s_i,a) \in T_{n-i}$. If so, $a$ can be inserted at position $i$: $w_1 \ldots w_i a w_{i+1} \ldots w_n \in L(A)$.

The running time is $O(n(|\Sigma| + |Q|))$, and the method requires space $O(|Q|)$ (if implemented by going over $i = 0,\ldots,n$, in this order).

Context

StackExchange Computer Science Q#3404, answer score: 3

Revisions (0)

No revisions yet.