patternMinor
For regular languages A and B, determine whether B might match early in (A B)
Viewed 0 times
mightlanguagesregularmatchearlyfordetermineandwhether
Problem
I have two regular languages A and B, and I want to determine whether there is any pair of strings, a in A and b in B, such that (a b) is a prefix of a string in (A B) and the left-most match of B in (a b) includes one or more characters from a.
Raphael's formulation is good:
Given two regular language A, B, is there a (non-empty) prefix of a word b in B that is a suffix of a word in A so that the rest of b is a prefix of another word in B?
Example
For example, let's say I have two regular languages, one which describes some properly escaped HTML text, and one which describes an end tag:
By inspection, I can tell that there is no string (a b) in (A B) such that the first match of B includes characters from a because
Motivation
A in my situation describes the output of an encoder/sanitizer that is derived from a grammar, so an untrusted input is fed to the encoder/sanitizer and I know the output matches A by construction.
B is a limit condition in a larger grammar that describes how parsers determine where a chunk of an embedded language ends so they can hand it off to a parser for the embedded language.
My end goal is to be able to determine when I can optimize away runtime checks that ensure that it is safe to embed a particular encoded string. For these examples, it would be safe to optimize out the first check, but not the second.
Is this a solved problem? Does it have a name? Any pointers appreciated.
Raphael's formulation is good:
Given two regular language A, B, is there a (non-empty) prefix of a word b in B that is a suffix of a word in A so that the rest of b is a prefix of another word in B?
Example
For example, let's say I have two regular languages, one which describes some properly escaped HTML text, and one which describes an end tag:
A := ([^&<>] | [&] [a-z] [a-z0-9]+ [;])*;
B := "</title";By inspection, I can tell that there is no string (a b) in (A B) such that the first match of B includes characters from a because
""' and the left-most match of B is '"' which includes a non-empty suffix of a : '"'.Motivation
A in my situation describes the output of an encoder/sanitizer that is derived from a grammar, so an untrusted input is fed to the encoder/sanitizer and I know the output matches A by construction.
B is a limit condition in a larger grammar that describes how parsers determine where a chunk of an embedded language ends so they can hand it off to a parser for the embedded language.
My end goal is to be able to determine when I can optimize away runtime checks that ensure that it is safe to embed a particular encoded string. For these examples, it would be safe to optimize out the first check, but not the second.
Is this a solved problem? Does it have a name? Any pointers appreciated.
Solution
If I translate your description properly, here is what you want to ask:
Given two regular language $A$, $B$, is there a prefix of a word $b$ in $B$ that is a suffix of a word in $A$ so that the rest of $b$ is a prefix of another word in $B$?
Formally:
Given regular $A$ and $B$, are there $a = wx \in A$ and $b = yz \in B$ so that $xy \in B$?
Deciding this boils down to deciding whether
$\qquad B \cap (\operatorname{suff}(A) \cdot \operatorname{pref}(B) ) = \emptyset$.
Now we employ some closure properties: $\mathrm{REG}$ is closed against right and left quotient, concatenation and intersection, so with
$\qquad \operatorname{suff}(A) = A \backslash \Sigma^*$ and
$\operatorname{pref}(B) = B / \Sigma^*$
the set we want to check for emptiness is regular, and therefore the check is indeed decidable.
In order to actually do this, I suggest you build automata for $\operatorname{suff}(A)$ and $\operatorname{pref}(B)$, concatenate them, intersect the result with the automaton for $B$ and use the standard check for emptiness (is a final state reachable from the initial state?).
Note that you never need DFA, all of this works nicely with NFA. Therefore, no exponential state explosion occurs; intersection multiplies the state numbers, though.
Given two regular language $A$, $B$, is there a prefix of a word $b$ in $B$ that is a suffix of a word in $A$ so that the rest of $b$ is a prefix of another word in $B$?
Formally:
Given regular $A$ and $B$, are there $a = wx \in A$ and $b = yz \in B$ so that $xy \in B$?
Deciding this boils down to deciding whether
$\qquad B \cap (\operatorname{suff}(A) \cdot \operatorname{pref}(B) ) = \emptyset$.
Now we employ some closure properties: $\mathrm{REG}$ is closed against right and left quotient, concatenation and intersection, so with
$\qquad \operatorname{suff}(A) = A \backslash \Sigma^*$ and
$\operatorname{pref}(B) = B / \Sigma^*$
the set we want to check for emptiness is regular, and therefore the check is indeed decidable.
In order to actually do this, I suggest you build automata for $\operatorname{suff}(A)$ and $\operatorname{pref}(B)$, concatenate them, intersect the result with the automaton for $B$ and use the standard check for emptiness (is a final state reachable from the initial state?).
Note that you never need DFA, all of this works nicely with NFA. Therefore, no exponential state explosion occurs; intersection multiplies the state numbers, though.
Context
StackExchange Computer Science Q#10852, answer score: 4
Revisions (0)
No revisions yet.