patternMinor
Algorithm Request: "Shortest non-existing substring over given alphabet"
Viewed 0 times
alphabetshortestnonsubstringrequestgivenalgorithmexistingover
Problem
I'm looking for an (efficient) algorithm to solve the following problem:
Given a string $S$ and a set of characters $M$, find the shortest string composed only of characters in $M$ that is not contained in $S$.
Try as I might, I can't seem to map this problem to any of the standard CS string problems.
Given a string $S$ and a set of characters $M$, find the shortest string composed only of characters in $M$ that is not contained in $S$.
Try as I might, I can't seem to map this problem to any of the standard CS string problems.
Solution
Here is the clean presentation (after going in circles around it).
First you consider all substrings of S that are in M*. From that you
build a trie, which may be understood as a tree structured FA that
recognizes these substrings. You build it so that you complete first the transitions of nodes that are closest to the root. As soon as you have a node from which
there is no arc for a given character in M, you have your answer which
is the string associated with that node concatenated with the missing
character. Complexity is $O(n^2)$ where $n$ is the length of the
string S, because that is the maximum number of characters you may have to consider while building the trie.
Note regarding complexity: In the trie construction you have to
consider only the longest substring in $M^*$ starting at each position
in $S$, since shorter ones are automatically taken care of. Each state
thus created is an accepting state recognizing one substring. There
are at most $n$ substrings in $M^*$, each having at most $n$ characters. Each is considered in constant time.
First you consider all substrings of S that are in M*. From that you
build a trie, which may be understood as a tree structured FA that
recognizes these substrings. You build it so that you complete first the transitions of nodes that are closest to the root. As soon as you have a node from which
there is no arc for a given character in M, you have your answer which
is the string associated with that node concatenated with the missing
character. Complexity is $O(n^2)$ where $n$ is the length of the
string S, because that is the maximum number of characters you may have to consider while building the trie.
Note regarding complexity: In the trie construction you have to
consider only the longest substring in $M^*$ starting at each position
in $S$, since shorter ones are automatically taken care of. Each state
thus created is an accepting state recognizing one substring. There
are at most $n$ substrings in $M^*$, each having at most $n$ characters. Each is considered in constant time.
Context
StackExchange Computer Science Q#21896, answer score: 6
Revisions (0)
No revisions yet.