patternMinor
Is there a linear-time solution to the minimum window substring problem, provided the characters in the substring must be in order?
Viewed 0 times
problemlinearthesolutionordermustsubstringminimumprovidedtime
Problem
Suppose there are two strings, $S$ and $T$, and we want to find the length $l$ of the shortest substring of $S$ which contains all the characters in $T$, in order. (Assume the length of $T$ is bounded, so it doesn't need to be considered for determining time complexity.)
Here are some examples:
I found a solution to this using recursion which I believe is $O(n^2)$.
I believe the worst case for this particular solution is something like $S$ = "aaaaaaaab", $T$ = "ab", where the solution would need to traverse down the list (which is $O(n)$) for each "a" in $S$ (of which there are $n$), leading to a total time complexity of $O(n^2)$.
A few years ago, my professor assigned me a version of this problem, and I submitted this $O(n^2)$ solution. In my solution writeup, I also mentioned that I didn't believe an $O(n)$ solution was possible. My justification was similar to:
Consider a hypothetical $O(n)$ algorit
Here are some examples:
- $S$ = "abcabc", $T$ = "abc", $l = 3$, "[abc]abc".
- $S$ = "abcabc", $T$ = "acb", $l = 5$, "[abcab]c".
- $S$ = "ben thinks bananas are the best", $T$ = "bans", $l=7$, "ben thinks [bananas] are the best"
I found a solution to this using recursion which I believe is $O(n^2)$.
def ordered_moving_window(S, T, include_prefix=False):
# Recursion base case
if len(T) == 0:
return 0
shortest_length = float("inf")
for i in range(len(S)):
if S[i] == T[0]:
# Recursively find all the other chars in the string
length = ordered_moving_window(S[i+1:], T[1:], include_prefix=True) + 1
# If this is being called recursively, then we need to include
# all the characters before the first character we found
# in our length calculation
if include_prefix:
length += i
shortest_length = min(length, shortest_length)
return shortest_length
print(ordered_moving_window("abcabc", "abc")) # 3
print(ordered_moving_window("abcabc", "acb")) # 5
print(ordered_moving_window("ben thinks bananas are the best", "bans")) # 7
I believe the worst case for this particular solution is something like $S$ = "aaaaaaaab", $T$ = "ab", where the solution would need to traverse down the list (which is $O(n)$) for each "a" in $S$ (of which there are $n$), leading to a total time complexity of $O(n^2)$.
A few years ago, my professor assigned me a version of this problem, and I submitted this $O(n^2)$ solution. In my solution writeup, I also mentioned that I didn't believe an $O(n)$ solution was possible. My justification was similar to:
Consider a hypothetical $O(n)$ algorit
Solution
Yes, if the length of
Here is an efficient algorithm.
The algorithm above uses
Let
The algorithm can be speed-up by breaking the inner loop if
The algorithm can be thought as sliding a window (of varying size) from index
T can be considered as a constant.Here is an efficient algorithm.
def shortest_super_subsequence(S, T):
if len(T) == 0: return 0
shortest_length = float("inf")
indices = [-1] * len(T)
while True:
for i in range(0, len(T)):
if i == 0:
indices[0] = S.find(T[0], indices[0] + 1)
else:
indices[i] = S.find(T[i], max(indices[i - 1] + 1, indices[i]))
if indices[i] == -1:
return shortest_length
shortest_length = min(shortest_length,
indices[-1] - indices[0] + 1)
The algorithm above uses
indices[i] to track the position of T[i] found in each search of a subsequence that contains T. Each time we will move to right by one position to search for T[0]. For other character T[i], we will start search at the later position of its current position and one position later than the current position of T[i-1].Let
len(S) be $n$. The algorithm above runs in time $O(n)$ if the length of T is considered as a constant. If len(T) is denoted by variable $m$, it runs in time $O(nm)$, since all updates on indices[j] need $O(n)$ time for each j. The worst case happens in situations like S="abdabdabdabdabdabdabdc" and T="abababc".The algorithm can be speed-up by breaking the inner loop if
indices[j] stays the same. Although significant, this improvement does not change the asymptotic time-complexity.The algorithm can be thought as sliding a window (of varying size) from index
indices[0] to indices[-1].Context
StackExchange Computer Science Q#136268, answer score: 4
Revisions (0)
No revisions yet.