HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

For regular languages A and B, determine whether B might match early in (A B)

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
mightlanguagesregularmatchearlyfordetermineandwhether

Problem

I have two regular languages A and B, and I want to determine whether there is any pair of strings, a in A and b in B, such that (a b) is a prefix of a string in (A B) and the left-most match of B in (a b) includes one or more characters from a.

Raphael's formulation is good:


Given two regular language A, B, is there a (non-empty) prefix of a word b in B that is a suffix of a word in A so that the rest of b is a prefix of another word in B?

Example

For example, let's say I have two regular languages, one which describes some properly escaped HTML text, and one which describes an end tag:

A := ([^&<>] | [&] [a-z] [a-z0-9]+ [;])*;
B := "</title";


By inspection, I can tell that there is no string (a b) in (A B) such that the first match of B includes characters from a because ""' and the left-most match of B is '"' which includes a non-empty suffix of a : '"'.

Motivation

A in my situation describes the output of an encoder/sanitizer that is derived from a grammar, so an untrusted input is fed to the encoder/sanitizer and I know the output matches A by construction.

B is a limit condition in a larger grammar that describes how parsers determine where a chunk of an embedded language ends so they can hand it off to a parser for the embedded language.

My end goal is to be able to determine when I can optimize away runtime checks that ensure that it is safe to embed a particular encoded string. For these examples, it would be safe to optimize out the first check, but not the second.

Is this a solved problem? Does it have a name? Any pointers appreciated.

Solution

If I translate your description properly, here is what you want to ask:


Given two regular language $A$, $B$, is there a prefix of a word $b$ in $B$ that is a suffix of a word in $A$ so that the rest of $b$ is a prefix of another word in $B$?

Formally:


Given regular $A$ and $B$, are there $a = wx \in A$ and $b = yz \in B$ so that $xy \in B$?

Deciding this boils down to deciding whether

$\qquad B \cap (\operatorname{suff}(A) \cdot \operatorname{pref}(B) ) = \emptyset$.

Now we employ some closure properties: $\mathrm{REG}$ is closed against right and left quotient, concatenation and intersection, so with

$\qquad \operatorname{suff}(A) = A \backslash \Sigma^*$ and
$\operatorname{pref}(B) = B / \Sigma^*$

the set we want to check for emptiness is regular, and therefore the check is indeed decidable.

In order to actually do this, I suggest you build automata for $\operatorname{suff}(A)$ and $\operatorname{pref}(B)$, concatenate them, intersect the result with the automaton for $B$ and use the standard check for emptiness (is a final state reachable from the initial state?).

Note that you never need DFA, all of this works nicely with NFA. Therefore, no exponential state explosion occurs; intersection multiplies the state numbers, though.

Context

StackExchange Computer Science Q#10852, answer score: 4

Revisions (0)

No revisions yet.