patternMinor

Dynamic programming table for finding similar substrings is too large

Submitted by: @import:stackexchange-cs·Mar 10, 2026·

Viewed 0 times

substringsprogrammingdynamictoolargeforfindingsimilartable

Problem

Substring Diff

Given two strings of length $n$, $P = p_1\dots p_n$ and $Q = q_1 \dots q_n$, we define $M(i, j, L)$ as the number of
mismatches between $p_i \dots p_{i+L-1}$ and $q_j \dots q_{j+L-1}$. In set
notation, $M(i, j, L)$ refers to the size of the set $\{0 \leq x

Both $P$ & $Q$ would have the same length

The size of each of the string would be at the max 1500

All characters in $P$ and $Q$ are lower-case English letters.

The recursive function will have the form:

longest(string1, string2, allowed_mismatches) = 
    {
        ... (something :P )
    }

The state space then has size $K^3$. With an upper bound on $K$ of 1500, the running time and space usage will be terrible... So direct dynamic programming will not work without some additional property to reduce the state space.

Ideas?

UPDATE

Using the ideas suggested by both Yuval and Vor, I came up with the following solution that works like a charm, running in $O(K^2)$ time and using $K$ space.

```
def longest_range_min_sum(str1, str2, start1, start2, slice_size, max_sum):
longest = 0
i = 0
running_sum = 0
while i + longest max_sum:
if str1[start1 + i] != str2[start2 + i]:
running_sum -= 1
i += 1
else:
longest += 1
return longest

import sys

data = sys.stdin.readlines()
num_cases = int(data.pop(0))
for ignore in xrange(num_cases):
max_mismatches, str1, str2 = data.pop(0).split()
max_mismatches = int(max_mismatches)
m = n = len(str1)
longest = 0
for i in xrange(m + n + 1):
if i > n:
slice_size = m - (i - n)
else:
slice_size = min(i, m)
if slice_size == 0:
continue
end1 = max(m, m - i)
if i > n:
end1 = m - (i - n)
start1 = end1 - slice_size
end2 = min(i, n)
start2 = end2 - slice_size
#print zeros_and_ones
#print str1[start1:end1], ' - ', str2[

Solution

One can reduce your problem to the following. Given a sequence of $N$ numbers, find a contiguous subsequence of length at most $K+1$ having maximal sum. This problem, in turn, is solvable in time $O(N)$.

What's the connection between your problem and mine? Let the positions of the mistakes be $I_1,\ldots,I_t$, and add $I_0 = 0$, $I_{t+1} = N+1$. The sequence in question is $J_1 = I_1 - I_0,\ldots,J_{t+1}=I_{t+1}-I_t$. Every $K+1$ consecutive numbers $J_a,\ldots,J_{a+K}$ correspond to a maximal solution for your problem of length $J_a + \cdots + J_{a+K} - 1$. The entire algorithm takes linear time.

Edit: This calculates the maximal $L$ such that $M(i,i,L) \leq K$ for some $i$. The actual problem wanted to find the maximal $L$ such that $M(i,j,L) \leq K$ for some $i,j$. By considering all possible shifts, we can solve this in $O(N^2)$ time and $O(K)$ space.

Context

StackExchange Computer Science Q#4994, answer score: 5

Revisions (0)

No revisions yet.