HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Block matching pattern matching problem

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
patternblockmatchingproblem

Problem

I've struggling with this problem for a while.

Given text $T=t_1t_2\ldots t_n$ and pattern $P=p_1p_2\ldots p_m$ over alphabet $\Sigma=\mathbb{N}$, we say there's a block matching of $P$ in index $i$ if there exist $m$ successive blocks $S_1,\ldots,S_m$ such that $$p_1=\sum_{s_1 \in S_1}s_1 \land p_2=\sum_{s_2 \in S_2}s_2 \land \cdots \land p_m=\sum_{s_m \in S_m}s_m. $$

For example:

$T=\text{1 2 3 5 1 2 3 2}$

$P=\text{6 5}$

Then we have block matching in index 1 because $1+2+3=6$ and $5=5$.

Schematically it looks like: $\text{|1 2 3||5| 1 2 3 2}$

We also have a block matching in index 4 because $5+1=6$ and $2+3=5$.

Schematically it looks like: $\text{ 1 2 3 |5 1||2 3| 2}$

The task is to find all indices where we have block matching.

I couldn't do better then a $\mathcal{O}(mn)$ but I believe there's a FFT application here.

Any hints?

Solution

If the numbers are all integers at most $k$, then you can apply a fast algorithm for substring matching with don't-care symbols (see below) to solve the problem in $O(kn(\log m + \log k))$ time by applying the following transformation:

  • Map each text character $t$ to $t-1$ zeros followed by a 1.



  • Map each pattern character $p$ to $p-1$ don't-care symbols followed by a 1, and also prepend a 1 to the overall pattern.



Basically, 1s represent "boundaries" in both strings, and we look for a way to make each boundary in the pattern line up with some boundary in the text.

This paper describes an $O(n+m+\alpha)$-time algorithm, where $\alpha$ is "the total number of occurrences of the subpatterns" (I didn't read the full paper, so I'm not sure exactly what that means). In any case it mentions the 1974 algorithm of Fischer and Paterson, "String matching and other products", which solves the problem in $O(n\log m\log \Sigma)$ time -- though that link is behind a paywall, googling should find other information on it.

Context

StackExchange Computer Science Q#79533, answer score: 2

Revisions (0)

No revisions yet.