patternMinor
Compute 'insertable' letters in a regular language
Viewed 0 times
computeregularlanguageinsertableletters
Problem
Let $L$ a regular language and define the subsequence closure of $L$ as
$\qquad \displaystyle S(L) = \{ w \mid \exists w' \in L.\ w \text{ subsequence of } w'\}$.
The problem I want to solve is to find for such subsequences $w \in S(L)$ which letters can be inserted into them so that the result is also in $S(L)$. Formally:
Given $w_1\dots w_n \in S(L)$, output all pairs $(i,a) \in \{0,\dots,n\} \times \Sigma$ for which $w_1 \dots w_{i} a w_{i+1} \dots w_n \in S(L)$.
Consider, for instance, the language$\{ab, abc, abcc\}$. The string $b$ is in $S(L)$ and inserting $a$ at the beginning -- corresponding to $(0,a)$ -- yields $ab \in S(L)$. On the other hand, the string $cb$ is not in $S(L)$; there is no way to convert it to a language string by insertion.
Using this language, if the input string is $b$ the possible insertions I am looking for are $(0,a)$ and $(1,c)$ at the end. If the input string is $bc$ the possible insertions are $(0,a), (1,c)$ and $(2,c)$.
The use of this algorithm is in a user interface: the user builds strings belonging to the language starting from an empty string and adding one character at a time in different positions. At each step the UI prompts the user with all the possible valid letters in all the possible insertion positions.
I have a working naive algorithm that involves a lot of back-tracking, and it is way too slow even in relatively simple cases. I was wondering if there is something better, or -- failing that -- if there are any available studies of this problem.
$\qquad \displaystyle S(L) = \{ w \mid \exists w' \in L.\ w \text{ subsequence of } w'\}$.
The problem I want to solve is to find for such subsequences $w \in S(L)$ which letters can be inserted into them so that the result is also in $S(L)$. Formally:
Given $w_1\dots w_n \in S(L)$, output all pairs $(i,a) \in \{0,\dots,n\} \times \Sigma$ for which $w_1 \dots w_{i} a w_{i+1} \dots w_n \in S(L)$.
Consider, for instance, the language$\{ab, abc, abcc\}$. The string $b$ is in $S(L)$ and inserting $a$ at the beginning -- corresponding to $(0,a)$ -- yields $ab \in S(L)$. On the other hand, the string $cb$ is not in $S(L)$; there is no way to convert it to a language string by insertion.
Using this language, if the input string is $b$ the possible insertions I am looking for are $(0,a)$ and $(1,c)$ at the end. If the input string is $bc$ the possible insertions are $(0,a), (1,c)$ and $(2,c)$.
The use of this algorithm is in a user interface: the user builds strings belonging to the language starting from an empty string and adding one character at a time in different positions. At each step the UI prompts the user with all the possible valid letters in all the possible insertion positions.
I have a working naive algorithm that involves a lot of back-tracking, and it is way too slow even in relatively simple cases. I was wondering if there is something better, or -- failing that -- if there are any available studies of this problem.
Solution
Edit: The OP changed the question. What follows works for one interpretation of the original question.
I'm assuming you are given a DFA $A = (Q,\Sigma,\delta,q_0,F)$. The method can be adapted to handle NFAs.
Let the word be $w = w_1 \ldots w_n$. For each $i \in \{0,\ldots,n\}$, calculate the state $s_i$ that the automaton reaches after reading the prefix of length $i$ of $w$. For each $i \in \{0,\ldots,n\}$ and each accepting state $q \in F$, calculate the state $t_{i,q}$ that the reverse automaton reaches when starting at $q$ and reading the reverse of the suffix of length $i$ of $w$. Let $T_i = \{ t_{i,q} : q \in Q \}$.
For each $i \in \{0,\ldots,n\}$ and each symbol $a \in \Sigma$, check whether $\delta(s_i,a) \in T_{n-i}$. If so, $a$ can be inserted at position $i$: $w_1 \ldots w_i a w_{i+1} \ldots w_n \in L(A)$.
The running time is $O(n(|\Sigma| + |Q|))$, and the method requires space $O(|Q|)$ (if implemented by going over $i = 0,\ldots,n$, in this order).
I'm assuming you are given a DFA $A = (Q,\Sigma,\delta,q_0,F)$. The method can be adapted to handle NFAs.
Let the word be $w = w_1 \ldots w_n$. For each $i \in \{0,\ldots,n\}$, calculate the state $s_i$ that the automaton reaches after reading the prefix of length $i$ of $w$. For each $i \in \{0,\ldots,n\}$ and each accepting state $q \in F$, calculate the state $t_{i,q}$ that the reverse automaton reaches when starting at $q$ and reading the reverse of the suffix of length $i$ of $w$. Let $T_i = \{ t_{i,q} : q \in Q \}$.
For each $i \in \{0,\ldots,n\}$ and each symbol $a \in \Sigma$, check whether $\delta(s_i,a) \in T_{n-i}$. If so, $a$ can be inserted at position $i$: $w_1 \ldots w_i a w_{i+1} \ldots w_n \in L(A)$.
The running time is $O(n(|\Sigma| + |Q|))$, and the method requires space $O(|Q|)$ (if implemented by going over $i = 0,\ldots,n$, in this order).
Context
StackExchange Computer Science Q#3404, answer score: 3
Revisions (0)
No revisions yet.