HiveBrain v1.2.0
Get Started
← Back to all entries
snippetpythonMinor

Classic merge sort

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
sortclassicmerge

Problem

Here is my code for classic merge sort. My question is, each time I create a new list result to merge sub-parts, is there a way to save the space -- not creating a new list result?

Besides my above major question, any advice for code improvement, bugs, performance (algorithm time complexity perspective) is highly appreciated.

def merge_sort(numbers, start, end):
    if not numbers:
        return None
    if start == end:
        return [numbers[start]]
    mid = (start + end) // 2
    sub_part1 = merge_sort(numbers, start, mid)
    sub_part2 = merge_sort(numbers, mid+1, end)
    result = []
    i = 0
    j = 0
    while i < len(sub_part1) and j < len(sub_part2):
        if sub_part1[i] <= sub_part2[j]:
            result.append(sub_part1[i])
            i += 1
        else:
            result.append(sub_part2[j])
            j += 1

    if i < len(sub_part1) and j == len(sub_part2):
        result.extend(sub_part1[i:])
    if j < len(sub_part2) and i == len(sub_part1):
        result.extend(sub_part2[j:])

    return result

if __name__ == "__main__":
    print merge_sort([1,4,2,5,6,8,3,4,0], 0, 8)

Solution

Step 1:

I would change the computation of mid to:

mid = start + (start - end) // 2


Although python won't give you the wrong answer if you use yours, you theoretically lose performance with lots of elements (>263).

Also, the code when you are doing the final part of the merge can be simplified to:

if i < len(sub_part1):
    result.extend(sub_part1[i:])
else:
    result.extend(sub_part2[j:])


Which is just cleaner. This can be further improved to:

result.extend(sub_part1[i:])
result.extend(sub_part2[j:])


because the cost of the if statement is more than the cost of appending nothing.
Step 2:

One of the tricks most implementations of mergesort use is they call a slightly different version of themselves at the beginning who's signature is:

def merge_sort(numbers, start, end, result):
    ...


The advantage of this is that you can alternate which list is your numbers and which list stores the result to avoid allocation or appending to arrays. (because you merge onto an existing result array).

Finally, if you want a fast implementation that feels like cheating (why I love Python), you can use this version from wikibooks.

from heapq import merge

def mergesort(w):
    if len(w)<2:
        return w
    else:    
        mid=len(w)//2
        return merge(mergesort(w[:mid]), mergesort(w[mid:]))

Code Snippets

mid = start + (start - end) // 2
if i < len(sub_part1):
    result.extend(sub_part1[i:])
else:
    result.extend(sub_part2[j:])
result.extend(sub_part1[i:])
result.extend(sub_part2[j:])
def merge_sort(numbers, start, end, result):
    ...
from heapq import merge

def mergesort(w):
    if len(w)<2:
        return w
    else:    
        mid=len(w)//2
        return merge(mergesort(w[:mid]), mergesort(w[mid:]))

Context

StackExchange Code Review Q#149597, answer score: 4

Revisions (0)

No revisions yet.