snippetjavaMajor
Quicksort implementation seems too slow with large data
Viewed 0 times
withtooslowseemsquicksortlargeimplementationdata
Problem
I've written a program to compare different sorting algorithms with the array size being 10, 1,000, 100,000, 1,000,000 and 10,000,000. I, of course, expect Insertion to win out on 10 and merge heap and quick to really perform at the upper levels.
However, below are the times I'm getting. (Note that quick sort is growing much faster than heap and merge even though they are all \$\theta(n \log n)\$.)
Things I've considered:
Test 3
Test 2
Test 1
` n Insertion Merge Heap Quick
10 2676ns 19626ns 26316ns 33454ns
However, below are the times I'm getting. (Note that quick sort is growing much faster than heap and merge even though they are all \$\theta(n \log n)\$.)
Things I've considered:
- Each algorithm uses the same seed to initialize each array, so the numbers are the same.
- I am only timing the algorithm and nothing extra.
- My professor approves of the code and doesn't know what's wrong, but maybe we're both missing something.
- I moved the program from my flash drive to the desktop to test possible data transfer problems.
- The algorithm (except for test 2) has only run at night with nothing else going.
Key: Hours:Minutes:Seconds:Milliseconds:Microseconds
Test 3
n Insertion Merge Heap Quick
10 0:0:0:0:3 0:0:0:0:28 0:0:0:0:35 0:0:0:0:43
1000 0:0:0:11:470 0:0:0:2:358 0:0:0:7:787 0:0:0:3:596
100000 0:0:1:911:865 0:0:0:51:506 0:0:0:24:257 0:0:0:55:519
1000000 0:3:59:769:105 0:0:0:351:129 0:0:0:238:878 0:0:0:916:885
10000000 8:11:44:552:820 0:0:3:521:108 0:0:3:560:178 0:1:13:709:830
Test 2
n Insertion Merge Heap Quick
10 0:0:0:0:5 0:0:0:0:49 0:0:0:0:37 0:0:0:0:50
1000 0:0:0:15:473 0:0:0:2:893 0:0:0:8:402 0:0:0:5:230
100000 0:0:2:518:566 0:0:0:57:845 0:0:0:32:917 0:0:0:71:243
1000000 0:5:38:538:795 0:0:0:460:796 0:0:0:312:66 0:0:1:398:508
10000000 11:48:6:630:6390:0:3:690:329 0:0:3:518:281 0:1:18:180:11
Test 1
` n Insertion Merge Heap Quick
10 2676ns 19626ns 26316ns 33454ns
Solution
This implementation of Quicksort has poor performance for arrays with many repeated elements. From Wikipedia, emphasis mine
With a partitioning algorithm such as the one described above (even
with one that chooses good pivot values), quicksort exhibits poor
performance for inputs that contain many repeated elements. The
problem is clearly apparent when all the input elements are equal: at
each recursion, the left partition is empty (no input values are less
than the pivot), and the right partition has only decreased by one
element (the pivot is removed). Consequently, the algorithm takes
quadratic time to sort an array of equal values.
To solve this quicksort equivalent of the Dutch national flag problem, an alternative linear-time partition routine can be used that separates the values into three groups: values less than the pivot, values equal to the pivot, and values greater than the pivot.
You can see this quadratic behaviour by running it on the input
Now in your test data, you have \$10{,}000{,}000\$ elements, but you're only picking random values in the range \$[0,1{,}000)\$. So... you have lots of repeated elements!
Let's run it as-is on my computer (I didn't run insertion sort, as it's too slow)
Now let's change this one line in
to this
Now the results are more in line with our expectations:
Of course the real fix is not to change
With a partitioning algorithm such as the one described above (even
with one that chooses good pivot values), quicksort exhibits poor
performance for inputs that contain many repeated elements. The
problem is clearly apparent when all the input elements are equal: at
each recursion, the left partition is empty (no input values are less
than the pivot), and the right partition has only decreased by one
element (the pivot is removed). Consequently, the algorithm takes
quadratic time to sort an array of equal values.
To solve this quicksort equivalent of the Dutch national flag problem, an alternative linear-time partition routine can be used that separates the values into three groups: values less than the pivot, values equal to the pivot, and values greater than the pivot.
You can see this quadratic behaviour by running it on the input
new int[10000], for example. In fact, you'll most likely get a StackOverflowError.Now in your test data, you have \$10{,}000{,}000\$ elements, but you're only picking random values in the range \$[0,1{,}000)\$. So... you have lots of repeated elements!
Let's run it as-is on my computer (I didn't run insertion sort, as it's too slow)
$ java AlgComp && cat times.txt
Unsorted: 323 653 751 33 350 378 913 280 243 792
mSorted: 33 243 280 323 350 378 653 751 792 913
hSorted: 33 243 280 323 350 378 653 751 792 913
qSorted: 33 243 280 323 350 378 653 751 792 913
Data is being written to times.txt...
-----------------------------------------------
Done.
n Insertion Merge Heap Quick
10 0:0:0:0:0 0:0:0:0:27 0:0:0:0:35 0:0:0:0:38
1000 0:0:0:0:0 0:0:0:1:852 0:0:0:0:822 0:0:0:3:224
100000 0:0:0:0:0 0:0:0:28:421 0:0:0:20:784 0:0:0:37:872
1000000 0:0:0:0:0 0:0:0:233:576 0:0:0:202:443 0:0:0:905:65
10000000 0:0:0:0:0 0:0:2:359:261 0:0:2:752:239 0:1:20:914:310Now let's change this one line in
data:list[i] = random.nextInt(1000);to this
list[i] = random.nextInt(1000000);Now the results are more in line with our expectations:
$ java AlgComp && cat times.txt
Unsorted: 1662389001535502665295332468356126461089888942823562420254
mSorted: 2356224683166238420254529533550266561264610898889428900153
hSorted: 2356224683166238420254529533550266561264610898889428900153
qSorted: 2356224683166238420254529533550266561264610898889428900153
Data is being written to times.txt...
-----------------------------------------------
Done.
n Insertion Merge Heap Quick
10 0:0:0:0:0 0:0:0:0:21 0:0:0:0:98 0:0:0:0:56
1000 0:0:0:0:0 0:0:0:1:997 0:0:0:1:14 0:0:0:2:41
100000 0:0:0:0:0 0:0:0:27:223 0:0:0:22:562 0:0:0:21:587
1000000 0:0:0:0:0 0:0:0:283:939 0:0:0:215:551 0:0:0:137:658
10000000 0:0:0:0:0 0:0:2:899:176 0:0:3:681:388 0:0:1:845:255Of course the real fix is not to change
data, but to change the partitioning algorithm.Code Snippets
$ java AlgComp && cat times.txt
Unsorted: 323 653 751 33 350 378 913 280 243 792
mSorted: 33 243 280 323 350 378 653 751 792 913
hSorted: 33 243 280 323 350 378 653 751 792 913
qSorted: 33 243 280 323 350 378 653 751 792 913
Data is being written to times.txt...
-----------------------------------------------
Done.
n Insertion Merge Heap Quick
10 0:0:0:0:0 0:0:0:0:27 0:0:0:0:35 0:0:0:0:38
1000 0:0:0:0:0 0:0:0:1:852 0:0:0:0:822 0:0:0:3:224
100000 0:0:0:0:0 0:0:0:28:421 0:0:0:20:784 0:0:0:37:872
1000000 0:0:0:0:0 0:0:0:233:576 0:0:0:202:443 0:0:0:905:65
10000000 0:0:0:0:0 0:0:2:359:261 0:0:2:752:239 0:1:20:914:310list[i] = random.nextInt(1000);list[i] = random.nextInt(1000000);$ java AlgComp && cat times.txt
Unsorted: 1662389001535502665295332468356126461089888942823562420254
mSorted: 2356224683166238420254529533550266561264610898889428900153
hSorted: 2356224683166238420254529533550266561264610898889428900153
qSorted: 2356224683166238420254529533550266561264610898889428900153
Data is being written to times.txt...
-----------------------------------------------
Done.
n Insertion Merge Heap Quick
10 0:0:0:0:0 0:0:0:0:21 0:0:0:0:98 0:0:0:0:56
1000 0:0:0:0:0 0:0:0:1:997 0:0:0:1:14 0:0:0:2:41
100000 0:0:0:0:0 0:0:0:27:223 0:0:0:22:562 0:0:0:21:587
1000000 0:0:0:0:0 0:0:0:283:939 0:0:0:215:551 0:0:0:137:658
10000000 0:0:0:0:0 0:0:2:899:176 0:0:3:681:388 0:0:1:845:255Context
StackExchange Code Review Q#67387, answer score: 37
Revisions (0)
No revisions yet.