HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Is there a linear-time solution to the minimum window substring problem, provided the characters in the substring must be in order?

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
problemlinearthesolutionordermustsubstringminimumprovidedtime

Problem

Suppose there are two strings, $S$ and $T$, and we want to find the length $l$ of the shortest substring of $S$ which contains all the characters in $T$, in order. (Assume the length of $T$ is bounded, so it doesn't need to be considered for determining time complexity.)

Here are some examples:

  • $S$ = "abcabc", $T$ = "abc", $l = 3$, "[abc]abc".



  • $S$ = "abcabc", $T$ = "acb", $l = 5$, "[abcab]c".



  • $S$ = "ben thinks bananas are the best", $T$ = "bans", $l=7$, "ben thinks [bananas] are the best"



I found a solution to this using recursion which I believe is $O(n^2)$.
def ordered_moving_window(S, T, include_prefix=False):
# Recursion base case
if len(T) == 0:
return 0

shortest_length = float("inf")

for i in range(len(S)):
if S[i] == T[0]:
# Recursively find all the other chars in the string
length = ordered_moving_window(S[i+1:], T[1:], include_prefix=True) + 1

# If this is being called recursively, then we need to include
# all the characters before the first character we found
# in our length calculation
if include_prefix:
length += i

shortest_length = min(length, shortest_length)

return shortest_length

print(ordered_moving_window("abcabc", "abc")) # 3
print(ordered_moving_window("abcabc", "acb")) # 5
print(ordered_moving_window("ben thinks bananas are the best", "bans")) # 7


I believe the worst case for this particular solution is something like $S$ = "aaaaaaaab", $T$ = "ab", where the solution would need to traverse down the list (which is $O(n)$) for each "a" in $S$ (of which there are $n$), leading to a total time complexity of $O(n^2)$.

A few years ago, my professor assigned me a version of this problem, and I submitted this $O(n^2)$ solution. In my solution writeup, I also mentioned that I didn't believe an $O(n)$ solution was possible. My justification was similar to:

Consider a hypothetical $O(n)$ algorit

Solution

Yes, if the length of T can be considered as a constant.

Here is an efficient algorithm.
def shortest_super_subsequence(S, T):
if len(T) == 0: return 0
shortest_length = float("inf")

indices = [-1] * len(T)
while True:
for i in range(0, len(T)):
if i == 0:
indices[0] = S.find(T[0], indices[0] + 1)
else:
indices[i] = S.find(T[i], max(indices[i - 1] + 1, indices[i]))
if indices[i] == -1:
return shortest_length
shortest_length = min(shortest_length,
indices[-1] - indices[0] + 1)


The algorithm above uses indices[i] to track the position of T[i] found in each search of a subsequence that contains T. Each time we will move to right by one position to search for T[0]. For other character T[i], we will start search at the later position of its current position and one position later than the current position of T[i-1].

Let len(S) be $n$. The algorithm above runs in time $O(n)$ if the length of T is considered as a constant. If len(T) is denoted by variable $m$, it runs in time $O(nm)$, since all updates on indices[j] need $O(n)$ time for each j. The worst case happens in situations like S="abdabdabdabdabdabdabdc" and T="abababc".

The algorithm can be speed-up by breaking the inner loop if indices[j] stays the same. Although significant, this improvement does not change the asymptotic time-complexity.

The algorithm can be thought as sliding a window (of varying size) from index indices[0] to indices[-1].

Context

StackExchange Computer Science Q#136268, answer score: 4

Revisions (0)

No revisions yet.