HiveBrain v1.2.0
Get Started
← Back to all entries
patternModerate

Count number of non-contiguous occurrences in string

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
numberoccurrencesnoncontiguouscountstring

Problem

Given strings $S,T$ such that $n=|T|>|S|$ , I'd like an algorithm to count number of occurrences of $S$ in $T$ (as a subsequence), not necessarily contiguous.

Example:
if $T=aababc, S=abc$, the algorithm should return $5$.

I've tried taking a suffix tree/ suffix array approach, only to realize that these tools are good for contiguous substrings but not for non-contiguous subsequences.

aababc , aababc , aababc , aababc , aababc

A problem that (might) kind of relate to this problem from what I know would be the maximal non-contiguous subarray problem, yet I do not know how to connect between these.

Solution

A dynamic programming algorithm in $\mathcal{O}(|S| |T|)$ should do the trick.

Let's denote $S = s_1…s_m$ and $T = t_1…t_n$. For $0\leqslant i \leqslant m$, $0\leqslant j \leqslant n$, let $N(i, j)$ be the number of occurrences of $s_1…s_i$ in $t_1…t_j$. We have $N(0, j) = 1$ (by convention, for computation purposes), $N(i, j) = 0$ if $i > j$ and if $i, j>0$:

$$N(i, j) = \left\{\begin{array}{ll}
N(i, j-1) & \text{if }s_i \neq t_j\\
N(i, j-1) + N(i-1,j-1) & \text{otherwise}
\end{array}\right.$$
The idea is that a subsequence $s_1…s_i$ appears in $t_1…t_j$ if it appears in $t_1…t_{j-1}$ or if $s_1…s_{i-1}$ is a subsequence of $t_1…t_{j-1}$ and $s_i = t_j$.

Now you just have to compute $N(m, n)$ to get the answer you want.

Context

StackExchange Computer Science Q#155611, answer score: 11

Revisions (0)

No revisions yet.