patternpythonMinor
Merge two list and discarding duplicates
Viewed 0 times
mergediscardingtwoandlistduplicates
Problem
I am trying to implement a function that merges two ordered lists into a third (ordered) one, but duplicates have to be discarded (basically the last step of a mergesort).
I think that this code can be generalized to an arbitrary number or lists, by keeping a list of indices, rather than 2 explicit ones.
Possible improvements: factor out the repetitive parts, for example by using something similar to this helper function:
Unfortunately this won't work (
I think that this code can be generalized to an arbitrary number or lists, by keeping a list of indices, rather than 2 explicit ones.
def merge_no_duplicates(list_1, list_2):
i = j = 0
import collections
result = collections.deque()
while i 1 and list_1[i] == result[-1]:
i += 1
continue
if len(result) > 1 and list_2[j] == result[-1]:
j += 1
continue
if list_1[i] <= list_2[j]:
result.append(list_1[i])
i += 1
elif list_2[j] < list_1[i]:
result.append(list_2[j])
j += 1
# we still need to consume part of list_2
if i == len(list_1):
while j < len(list_2):
if list_2[j] == result[-1]:
j += 1
continue
result.append(list_2[j])
j += 1
# still need to consume part of list_1
if j == len(list_2):
while i < len(list_1):
if list_1[i] == result[-1]:
i += 1
continue
result.append(list_1[i])
i += 1
return resultPossible improvements: factor out the repetitive parts, for example by using something similar to this helper function:
check_duplicates(my_list, index):
if my_list[index] == result[-1]:
index += 1Unfortunately this won't work (
index is a parameter, and will not influence the behavior of i or j). Is this possible at all?Solution
Pick a good return type
A deque is a Double Ended QUEue. Deques are great for when you have to insert and erase from the front as well as the back. You never need this for your problem - all you ever need is to insert at the end. You just need a normal list.
Generate when you can
Rather than necessarily giving the entire result all at once, it's better to just
Use the standard when you can
There is a function that already merges sorted inputs into a single sorted output:
If you don't want to use
A deque is a Double Ended QUEue. Deques are great for when you have to insert and erase from the front as well as the back. You never need this for your problem - all you ever need is to insert at the end. You just need a normal list.
Generate when you can
Rather than necessarily giving the entire result all at once, it's better to just
yield the next element as you go. This is a simple change in your algorithm (just yield x instead of result.append(x)), but could have serious performance implications down the line if you have lots of large iterables. If the caller wants a full list, then can always explicitly write list(merge_no_duplicates(a, b, c)). Use the standard when you can
There is a function that already merges sorted inputs into a single sorted output:
heapq.merge. It will give you duplicates, but that seems like a much better starting point than writing everything from scratch:def merge_no_duplicates(*iterables):
last = object()
for val in heapq.merge(*iterables):
if val != last:
last = val
yield valIf you don't want to use
heapq.merge, then you can at least use this framework to separate the "merge sorted iterables" concern from the "remove duplicates" concern.Code Snippets
def merge_no_duplicates(*iterables):
last = object()
for val in heapq.merge(*iterables):
if val != last:
last = val
yield valContext
StackExchange Code Review Q#108171, answer score: 5
Revisions (0)
No revisions yet.