patternMinor
$k$-way merge but without an absolute order
Viewed 0 times
withoutordermergebutwayabsolute
Problem
I have $k$ lists that I would like to combine to a single list. Each list's elements are unique and are sorted in a particular order, but there is no notion of an absolute order, and different items across lists can only be compared if they are equal. To give a specific example (here I use $2$ is that the duplicate may not be in every list to do the above (and this process cannot be done sequentially, i.e.
It should be easy to extend the above to any $k$. Use a hash to find which lists each unique element belongs to. Then traverse along the first list using the above logic - writing down all elements until you hit first duplicate, in which case write down all elements (that haven't been written down yet) in all other lists before that duplicate and keep doing that until you reach the end of first list. If haven't reached the end of second list, continue same algorithm there and so on. Each list will be traversed twice - once to get the duplicates, and second time to do the above, making it $O(n)$. A compatibility error will be discovered if you iterate along a list but can't find the duplicate (because it was already written down in an earlier pass).
l1+l2+l3 is generally not the same as (l1+l2)+l3, where + denotes this operation of finding the superset order list).It should be easy to extend the above to any $k$. Use a hash to find which lists each unique element belongs to. Then traverse along the first list using the above logic - writing down all elements until you hit first duplicate, in which case write down all elements (that haven't been written down yet) in all other lists before that duplicate and keep doing that until you reach the end of first list. If haven't reached the end of second list, continue same algorithm there and so on. Each list will be traversed twice - once to get the duplicates, and second time to do the above, making it $O(n)$. A compatibility error will be discovered if you iterate along a list but can't find the duplicate (because it was already written down in an earlier pass).
Solution
Edit: You recently revised the question to clarify that you are allowed to hash items. That changes the question: it lets us build much more efficient algorithms.
With this change, your question becomes: given $k$ partial orders, find a total order that is consistent with all of the $k$ partial orders. This problem can be solved in linear time, i.e., in $O(n)$ time, where $n$ is the total number of items across all of the sublists.
Here's how to solve it. We'll build a directed graph, where each item (in any sublist) is a vertex is the graph. Traverse the sublists to extract all consecutive pairs of items that appear in any sublist. If you have
Once you've built the entire graph, topologically sort it, and output the vertices in topologically sorted order. That will be a valid output list that solves your problem. If the topological sort fails and reports that the graph has a cycle, then your sublists are inconsistent and you should output an error.
This solves your problem in linear time, i.e., $O(n)$ time.
My original answer to your original question:
You don't list any requirements on what order the combined list elements must appear in. Therefore, concatenating/interleaving all of the list items in an arbitrary order appears to meet all of the explicitly stated requirements. If that's not an acceptable solution, you'll need to think through your requirements more carefully and then edit your question to state the requirements explicitly.
I suspect you meant to require that the elements in the combined list must appear in an order that is consistent with the order in each sublist, and any item should appear only once in the combined list. If that's what you meant, then you can't do better than $O(n^2)$ time, where $n$ is the total number of items across all of the sublists: there's no way to avoid comparing each item of each sublist with each other. (If there is any pair of items which you have not compared explicitly, you have no way of knowing whether they are equal or not: both are possible.)
It's easy to achieve an $O(n^2)$-time algorithm. You simply compare each item to all other items, to first find all duplicated items (items that appear in more than one sublist). Then, you do a straightforward $k$-way merge. At each step of the $k$-way merge, you look at the head of each of the $k$ sublists and classify each of those $k$ items as duplicated or not. If any of them are non-duplicated, you can remove it from the head of its sublist and append it to the combined list. If one of them is duplicated, and all of its duplicates appear among these $k$ items, then you remove each of them from their corresponding sublists and append it to the combined list. Repeat this until done. If no items appears twice in any sublist, and if all of the sublists are sorted in a compatible way (e.g., you don't have
The running time of this algorithm is $O(n^2)$, and it's not possible to do better than that (assuming the only operation available on items is to test two items for equality), so this algorithm is optimal up to a constant factor.
With this change, your question becomes: given $k$ partial orders, find a total order that is consistent with all of the $k$ partial orders. This problem can be solved in linear time, i.e., in $O(n)$ time, where $n$ is the total number of items across all of the sublists.
Here's how to solve it. We'll build a directed graph, where each item (in any sublist) is a vertex is the graph. Traverse the sublists to extract all consecutive pairs of items that appear in any sublist. If you have
a < b in a sublist (i.e., item a immediately precedes b in some sublist), add an edge $a \to b$ to the graph. (Notice that you can have a hashtable that maps the item a to the corresponding vertex $a$ in the graph, so this operation involves two lookups in the hashtable and then adding an edge: $O(1)$ time.)Once you've built the entire graph, topologically sort it, and output the vertices in topologically sorted order. That will be a valid output list that solves your problem. If the topological sort fails and reports that the graph has a cycle, then your sublists are inconsistent and you should output an error.
This solves your problem in linear time, i.e., $O(n)$ time.
My original answer to your original question:
You don't list any requirements on what order the combined list elements must appear in. Therefore, concatenating/interleaving all of the list items in an arbitrary order appears to meet all of the explicitly stated requirements. If that's not an acceptable solution, you'll need to think through your requirements more carefully and then edit your question to state the requirements explicitly.
I suspect you meant to require that the elements in the combined list must appear in an order that is consistent with the order in each sublist, and any item should appear only once in the combined list. If that's what you meant, then you can't do better than $O(n^2)$ time, where $n$ is the total number of items across all of the sublists: there's no way to avoid comparing each item of each sublist with each other. (If there is any pair of items which you have not compared explicitly, you have no way of knowing whether they are equal or not: both are possible.)
It's easy to achieve an $O(n^2)$-time algorithm. You simply compare each item to all other items, to first find all duplicated items (items that appear in more than one sublist). Then, you do a straightforward $k$-way merge. At each step of the $k$-way merge, you look at the head of each of the $k$ sublists and classify each of those $k$ items as duplicated or not. If any of them are non-duplicated, you can remove it from the head of its sublist and append it to the combined list. If one of them is duplicated, and all of its duplicates appear among these $k$ items, then you remove each of them from their corresponding sublists and append it to the combined list. Repeat this until done. If no items appears twice in any sublist, and if all of the sublists are sorted in a compatible way (e.g., you don't have
a before b in one sublist but b before a in another sublist), then this process will never get stuck and will yield the desired output.The running time of this algorithm is $O(n^2)$, and it's not possible to do better than that (assuming the only operation available on items is to test two items for equality), so this algorithm is optimal up to a constant factor.
Context
StackExchange Computer Science Q#16411, answer score: 5
Revisions (0)
No revisions yet.