patternMinor
Sorting array of strings (with repetitions) according to a given ordering
Viewed 0 times
sortingarraywithorderingrepetitionsaccordingstringsgiven
Problem
We get two arrays:
and
We want to find the array
The strings (with possible repetitions) should be sorted as in the
The $n^{2}$ solution is obvious, can we do better? The memory doesn't matter and it doesn't have to be an in-place algorithm.
ordering = ["one", "two", "three"]and
input = ["zero", "one", "two", "two", "three", "three", "three", "four"];We want to find the array
output so thatoutput = ["one", "two", "two", "three", "three", "three", "zero", "four"]
// or
output = ["one", "two", "two", "three", "three", "three", "four", "zero"]The strings (with possible repetitions) should be sorted as in the
ordering array. Not found/contained strings should be put at the end of the new array and their order doesn't matter.The $n^{2}$ solution is obvious, can we do better? The memory doesn't matter and it doesn't have to be an in-place algorithm.
Solution
The other answers provide correct solutions. However, you can do a bit better considering multivariate complexity. Note that the provided running times in the other answers are not quite specific since they ignore either the size of the first array or the lengths of the strings.
Here are 2 different methods with their specific running times. The first of which is a brief description of the method in the other answers with specific analysis of the running time. The second of which has better running time and is almost optimal.
Let us call the first (ordered) list in the input $A$ and the second (to be ordered) $S$. Let $n$ be the size of $A$ and $m$ the size of $S$. Let $l$ be the maximum length of a string in the input.
The first method.
This span the methods in the other answers. Note that in both we need to find for each string the whether it lays in the original ordered array and if so at what position exactly. We need either apply a linear search which will cost something like $O(ml)$ per element or we can build some kind of a dictionary data-structure such as search trees and hashing tables or a "trie" (prefix-tree) data structure to achieve each look-up operation in linear time in the size of the strings. Using this we can achieve a running time of $O((m+n)l)$ for finding the positions over all strings. Afterwards we need to sort all strings in $O(n \log n)$ if we assign the indices directly to the strings (using pairs) or $O(l n \log n)$ by applying the look-ups while searching. So the best running time we can achieve using the methods in the other answers would be $O(l (n+m) + n \log n)$.
The second method. Note that we did not make use of the original sorted array and that is why we had to sort the strings again. We can solve the problem more efficiently by using this extra informaiton. To achieve this, let us bild a dictionary $D$ (for the sake of the theoretical bounds let us define it as a trie but in you can implement it as a hash-table as well). Let us define a new list $L$ (a linked list) and a function $C$ that counts how many times each string appeared in the first array. Assume $D$ assigns to each string a distinct number from $1$ to $m$ and $C$ is an array of length $m$.
The algorithm goes as follow:
This algorithm runs in $O(l(n+m)$ which is linear (optimal) if the lengths of the strings are bounded by a constant.
Here are 2 different methods with their specific running times. The first of which is a brief description of the method in the other answers with specific analysis of the running time. The second of which has better running time and is almost optimal.
Let us call the first (ordered) list in the input $A$ and the second (to be ordered) $S$. Let $n$ be the size of $A$ and $m$ the size of $S$. Let $l$ be the maximum length of a string in the input.
The first method.
This span the methods in the other answers. Note that in both we need to find for each string the whether it lays in the original ordered array and if so at what position exactly. We need either apply a linear search which will cost something like $O(ml)$ per element or we can build some kind of a dictionary data-structure such as search trees and hashing tables or a "trie" (prefix-tree) data structure to achieve each look-up operation in linear time in the size of the strings. Using this we can achieve a running time of $O((m+n)l)$ for finding the positions over all strings. Afterwards we need to sort all strings in $O(n \log n)$ if we assign the indices directly to the strings (using pairs) or $O(l n \log n)$ by applying the look-ups while searching. So the best running time we can achieve using the methods in the other answers would be $O(l (n+m) + n \log n)$.
The second method. Note that we did not make use of the original sorted array and that is why we had to sort the strings again. We can solve the problem more efficiently by using this extra informaiton. To achieve this, let us bild a dictionary $D$ (for the sake of the theoretical bounds let us define it as a trie but in you can implement it as a hash-table as well). Let us define a new list $L$ (a linked list) and a function $C$ that counts how many times each string appeared in the first array. Assume $D$ assigns to each string a distinct number from $1$ to $m$ and $C$ is an array of length $m$.
The algorithm goes as follow:
For all i in {1, .., m}
Set C[i] := 0
// Now build D
Set index := 0
For each string s in A do
Set D[s] = index
Increase index by one
// Now find how many times each string occurs
Let L be an empty list
For each string s in S do
If s in D then
Find index = D[s]
Increase C[index] by one
Else
Append s to L
//Now output
For each string s in S do
Find index := D[s]
Find c := C[index]
while c > 0
output s
decrease c by one
For s in L
output sThis algorithm runs in $O(l(n+m)$ which is linear (optimal) if the lengths of the strings are bounded by a constant.
Code Snippets
For all i in {1, .., m}
Set C[i] := 0
// Now build D
Set index := 0
For each string s in A do
Set D[s] = index
Increase index by one
// Now find how many times each string occurs
Let L be an empty list
For each string s in S do
If s in D then
Find index = D[s]
Increase C[index] by one
Else
Append s to L
//Now output
For each string s in S do
Find index := D[s]
Find c := C[index]
while c > 0
output s
decrease c by one
For s in L
output sContext
StackExchange Computer Science Q#120986, answer score: 4
Revisions (0)
No revisions yet.