HiveBrain v1.2.0
Get Started
← Back to all entries
snippetjavaMajor

Quicksort implementation seems too slow with large data

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
withtooslowseemsquicksortlargeimplementationdata

Problem

I've written a program to compare different sorting algorithms with the array size being 10, 1,000, 100,000, 1,000,000 and 10,000,000. I, of course, expect Insertion to win out on 10 and merge heap and quick to really perform at the upper levels.

However, below are the times I'm getting. (Note that quick sort is growing much faster than heap and merge even though they are all \$\theta(n \log n)\$.)

Things I've considered:

  • Each algorithm uses the same seed to initialize each array, so the numbers are the same.



  • I am only timing the algorithm and nothing extra.



  • My professor approves of the code and doesn't know what's wrong, but maybe we're both missing something.



  • I moved the program from my flash drive to the desktop to test possible data transfer problems.



  • The algorithm (except for test 2) has only run at night with nothing else going.



Key: Hours:Minutes:Seconds:Milliseconds:Microseconds


Test 3

n Insertion Merge Heap Quick
10 0:0:0:0:3 0:0:0:0:28 0:0:0:0:35 0:0:0:0:43
1000 0:0:0:11:470 0:0:0:2:358 0:0:0:7:787 0:0:0:3:596
100000 0:0:1:911:865 0:0:0:51:506 0:0:0:24:257 0:0:0:55:519
1000000 0:3:59:769:105 0:0:0:351:129 0:0:0:238:878 0:0:0:916:885
10000000 8:11:44:552:820 0:0:3:521:108 0:0:3:560:178 0:1:13:709:830


Test 2

n Insertion Merge Heap Quick
10 0:0:0:0:5 0:0:0:0:49 0:0:0:0:37 0:0:0:0:50
1000 0:0:0:15:473 0:0:0:2:893 0:0:0:8:402 0:0:0:5:230
100000 0:0:2:518:566 0:0:0:57:845 0:0:0:32:917 0:0:0:71:243
1000000 0:5:38:538:795 0:0:0:460:796 0:0:0:312:66 0:0:1:398:508
10000000 11:48:6:630:6390:0:3:690:329 0:0:3:518:281 0:1:18:180:11


Test 1

` n Insertion Merge Heap Quick
10 2676ns 19626ns 26316ns 33454ns

Solution

This implementation of Quicksort has poor performance for arrays with many repeated elements. From Wikipedia, emphasis mine


With a partitioning algorithm such as the one described above (even
with one that chooses good pivot values), quicksort exhibits poor
performance for inputs that contain many repeated elements. The
problem is clearly apparent when all the input elements are equal: at
each recursion, the left partition is empty (no input values are less
than the pivot), and the right partition has only decreased by one
element (the pivot is removed). Consequently, the algorithm takes
quadratic time to sort an array of equal values.


To solve this quicksort equivalent of the Dutch national flag problem, an alternative linear-time partition routine can be used that separates the values into three groups: values less than the pivot, values equal to the pivot, and values greater than the pivot.

You can see this quadratic behaviour by running it on the input new int[10000], for example. In fact, you'll most likely get a StackOverflowError.

Now in your test data, you have \$10{,}000{,}000\$ elements, but you're only picking random values in the range \$[0,1{,}000)\$. So... you have lots of repeated elements!

Let's run it as-is on my computer (I didn't run insertion sort, as it's too slow)

$ java AlgComp && cat times.txt
Unsorted:   323  653  751   33  350  378  913  280  243  792
 mSorted:    33  243  280  323  350  378  653  751  792  913
 hSorted:    33  243  280  323  350  378  653  751  792  913
 qSorted:    33  243  280  323  350  378  653  751  792  913
Data is being written to times.txt...
-----------------------------------------------
Done.
n        Insertion        Merge            Heap             Quick
10       0:0:0:0:0        0:0:0:0:27       0:0:0:0:35       0:0:0:0:38
1000     0:0:0:0:0        0:0:0:1:852      0:0:0:0:822      0:0:0:3:224
100000   0:0:0:0:0        0:0:0:28:421     0:0:0:20:784     0:0:0:37:872
1000000  0:0:0:0:0        0:0:0:233:576    0:0:0:202:443    0:0:0:905:65
10000000 0:0:0:0:0        0:0:2:359:261    0:0:2:752:239    0:1:20:914:310


Now let's change this one line in data:

list[i] = random.nextInt(1000);


to this

list[i] = random.nextInt(1000000);


Now the results are more in line with our expectations:

$ java AlgComp && cat times.txt
Unsorted: 1662389001535502665295332468356126461089888942823562420254
 mSorted: 2356224683166238420254529533550266561264610898889428900153
 hSorted: 2356224683166238420254529533550266561264610898889428900153
 qSorted: 2356224683166238420254529533550266561264610898889428900153
Data is being written to times.txt...
-----------------------------------------------
Done.
n        Insertion        Merge            Heap             Quick
10       0:0:0:0:0        0:0:0:0:21       0:0:0:0:98       0:0:0:0:56
1000     0:0:0:0:0        0:0:0:1:997      0:0:0:1:14       0:0:0:2:41
100000   0:0:0:0:0        0:0:0:27:223     0:0:0:22:562     0:0:0:21:587
1000000  0:0:0:0:0        0:0:0:283:939    0:0:0:215:551    0:0:0:137:658
10000000 0:0:0:0:0        0:0:2:899:176    0:0:3:681:388    0:0:1:845:255


Of course the real fix is not to change data, but to change the partitioning algorithm.

Code Snippets

$ java AlgComp && cat times.txt
Unsorted:   323  653  751   33  350  378  913  280  243  792
 mSorted:    33  243  280  323  350  378  653  751  792  913
 hSorted:    33  243  280  323  350  378  653  751  792  913
 qSorted:    33  243  280  323  350  378  653  751  792  913
Data is being written to times.txt...
-----------------------------------------------
Done.
n        Insertion        Merge            Heap             Quick
10       0:0:0:0:0        0:0:0:0:27       0:0:0:0:35       0:0:0:0:38
1000     0:0:0:0:0        0:0:0:1:852      0:0:0:0:822      0:0:0:3:224
100000   0:0:0:0:0        0:0:0:28:421     0:0:0:20:784     0:0:0:37:872
1000000  0:0:0:0:0        0:0:0:233:576    0:0:0:202:443    0:0:0:905:65
10000000 0:0:0:0:0        0:0:2:359:261    0:0:2:752:239    0:1:20:914:310
list[i] = random.nextInt(1000);
list[i] = random.nextInt(1000000);
$ java AlgComp && cat times.txt
Unsorted: 1662389001535502665295332468356126461089888942823562420254
 mSorted: 2356224683166238420254529533550266561264610898889428900153
 hSorted: 2356224683166238420254529533550266561264610898889428900153
 qSorted: 2356224683166238420254529533550266561264610898889428900153
Data is being written to times.txt...
-----------------------------------------------
Done.
n        Insertion        Merge            Heap             Quick
10       0:0:0:0:0        0:0:0:0:21       0:0:0:0:98       0:0:0:0:56
1000     0:0:0:0:0        0:0:0:1:997      0:0:0:1:14       0:0:0:2:41
100000   0:0:0:0:0        0:0:0:27:223     0:0:0:22:562     0:0:0:21:587
1000000  0:0:0:0:0        0:0:0:283:939    0:0:0:215:551    0:0:0:137:658
10000000 0:0:0:0:0        0:0:2:899:176    0:0:3:681:388    0:0:1:845:255

Context

StackExchange Code Review Q#67387, answer score: 37

Revisions (0)

No revisions yet.