HiveBrain v1.2.0
Get Started
← Back to all entries
snippetpythonMinor

Mergesort vs Quicksort Timing in Python

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
pythontimingmergesortquicksort

Problem

I have implemented Mergesort and Quicksort in Python and have noticed that Quicksort runs two times as fast as Mergesort. Is this a property of the algorithm or is Mergesort coded inefficiently? Are there any other suggestions?

(I am aware that the timing includes the creation of the random list.)

Quicksort:

import time, random

def quicksort_benchmark(n=2**24):
    def quicksort(array, low, high):
        if low = pivot:
                    break

            while True:
                j -= 1
                if array[j] = j:
                return j

            temp = array[i]
            array[i] = array[j]
            array[j] = temp

    start = time.time()

    # initialize array
    array = [random.randint(0, n) for i in range(0,n)]
    quicksort(array, 0, len(array)-1)

    end = time.time()
    elapsed = end - start
    return(elapsed)


Mergesort:

def mergesort_benchmark(n=2**24):
    def mergesort(m):
        if len(m) <= 1:
            return m

        left, right = [], []
        for i in range(0, len(m)):
            if i < len(m)//2:
                left.append(m[i])
            else:
                right.append(m[i])

        left = mergesort(left)
        right = mergesort(right)
        return(merge(left,right))

    def merge(left,right):
        result = []

        i, j = 0, 0

        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                result.append(left[i])
                i += 1
            else:
                result.append(right[j])
                j += 1

        for i in range(i, len(left)):
            result.append(left[i])

        for i in range(j, len(right)):
            result.append(right[i])

        return result

    start = time.time()
    m = [random.randint(0,n) for i in range(0,n)]
    m = mergesort(m)
    end = time.time()
    elapsed = end - start
    return(elapsed)

Solution

The tl;dr answer to your questions are: Timing differences are related to your coding, and not the algorithm itself. Mostly due to how you handle arrays in-place, or methods for copying.

With that out of the question, let us review the code and in particular the code smells:

  • In general nicely written code – Let us start with the positive. Your code is mostly clean and easy to read. Good spacing, and reasonable names for variables and methods.



  • Functions within functions – Hiding the actual sorting methods within the benchmark methods, removes the potential for using your methods outside of the actual benchmark methods. Not wise. I could understand hiding the partition() method as that is intrinsic to the quicksort() implementation, and similar for the merge() to be included within mergesort(). As that would still expose the main sorting functions.



  • In-place vs new arrays – Your quicksort() changes the array in-place, and doesn't use extra storage, whilst the mergesort() creates a bunch of extra arrays whilst it is dividing and conquering. This gives the quicksort() and unfair advantage, and renders the quicksort() with a nasty sideeffect of changing the array. This could be a wanted effect in some cases, but mostly you do not want to change the array passed into the function.



-
Try to avoid While True: – In general, a lot of these can be avoided by flipping and turning the statements a little. Especially when used to loop to a certain point, where you use break to exit the loop, it is a lot nicer to flip it around. Instead of this:

i = low - 1
while True:
  i += 1
  if array[i] >= pivot
    break


do this:

i = low
while array[i] < pivot:
  i += 1


Note that the initial values has changed, and the condition has been flipped around. But the end result is the same, and the latter is easier to read and understand.

-
Swapping of variables – The technique you use for swapping the array elements, can be replaced by a neater statement in Python:

array[i], array[j] = array[j], array[i]


Behind the scenes, this will work exactly like your swapping, but it slightly easier on the eye, and avoids the unnecessary temp variable.

-
Avoid multiple len(m) on the same array – There is actually time to spare by not doing this operation multiple times when the array in question doesn't change.

  • range() vs xrange() – In Python 2, there is a bit of a difference between these two. The first generates the list, whilst the latter is a generator giving each number as we move on. This has a memory and performance impact when the range is getting large, like for 2**24...



-
Looping vs slicing – Python has some really nice slice operators which means that stuff like:

for i in range(j, len(left)):
  result.append(left[i])


can be replaced with:

result.extend(left[j:len(left)]


Note that I've also switched to extend, and not append, to proper handle the case if the slice is empty, and not actually add the slice to result.

Speed implications

When trying to compare the various implementations I had to make some slight changes to original code, so here are the simple changes with the new names of the main sorting method:

  • quicksort – Your original code but with a duplication of the array to be sorted, to even out the playing field, and giving all sorters the same array to sort.



  • hquicksort – My implementation flipping the while True statements around, and implementing the slightly simpler swap method. I also removed references to i and j, with low and right. Code included at end.



  • sort – An quicksort implementation as implemented in Quicksort with Python. A rather fast recursive, new array, no indices implementation.



  • SOquicksort – The in-place solution of the same thread (linked above).



  • mergesort – Your implementation of mergesort.



  • hmergesort – Your implementation with replacing of the len(m) with precalculated values, and using xrange() instead of range(). Code not included.



  • imergesort – Taking it even one step further, using slicing instead of the loop appending of the results. Code included at end.



  • sorted – Using the Python default sorting methods of sorted().



The timing of these using timeit for n=2**14 with 10 testruns gave the following (when run using repl.it (Do note that the code given there are messy and more of a sandbox rather than a good example of coding :-)) result:

quicksort : 0.645922899246
hquicksort : 0.563888788223
sort : 0.527070045471
SOquicksort: 0.455115795135

mergesort : 1.35985612869
hmergesort : 0.970368146896
imergesort : 0.714356899261
sorted : 0.0575048923492


As can be seen the quicksort was improved quite a bit with even relatively small changes in your code, but better implementations does exist. For the mergesort I didn't look for alternative implementations, but the changes I made it ran in half the time, resulting in a faster time compared to your

Code Snippets

i = low - 1
while True:
  i += 1
  if array[i] >= pivot
    break
i = low
while array[i] < pivot:
  i += 1
array[i], array[j] = array[j], array[i]
for i in range(j, len(left)):
  result.append(left[i])
result.extend(left[j:len(left)]

Context

StackExchange Code Review Q#162121, answer score: 6

Revisions (0)

No revisions yet.