patternModerate
QuickSort Dijkstra 3-Way Partitioning: why the extra swapping?
Viewed 0 times
whythedijkstrawayquicksortswappingpartitioningextra
Problem
Given the algorithm above (taken from the slides (p. 35) of the Coursera course “Algorithms Part I” by Robert Sedgewick and Kevin Wayne), look at the scenario where i is at "X", the following happens:
Scenario: i -> "X", "X" > "P"
Why don't we just decrement gt until gt points to a value that is < the value at lt ("P" in this case), then we swap this value with the value at i. This will save swapping operations.
So if we do that for the scenario mentioned above, we'll do:
Is this excessive swapping needed for the algorithm? does it improve performance in some way?
If it does improve performance, how?
If it doesn't affect performance, please give a proper explanation or a proof as to why this it does not affect performance.
Also, would the second method I mentioned affect performance in any way? please explain why.
P.S. "Affect performance" as used above means either improve/degrade performance.
Scenario: i -> "X", "X" > "P"
1. swap("X", "Z"), gt--; // the value at i is now "Z", which is still > "P"
2. swap("Z", "Y"), gt--; // the value at i is now "Y", which is still > "P"
3. swap("Y", "C"), gt--; // Now we finally get a value at i "C" which is < "P"
// Now we can swap values at i and lt, and increrement them
4. swap("P", "C"), i++, lt++;Why don't we just decrement gt until gt points to a value that is < the value at lt ("P" in this case), then we swap this value with the value at i. This will save swapping operations.
So if we do that for the scenario mentioned above, we'll do:
1. gt--
2. gt--
3. swap("X", "C"), gt--;
// Now we can swap values at i and lt, and increrement them
4. swap("P", "C"), i++, lt++;Is this excessive swapping needed for the algorithm? does it improve performance in some way?
If it does improve performance, how?
If it doesn't affect performance, please give a proper explanation or a proof as to why this it does not affect performance.
Also, would the second method I mentioned affect performance in any way? please explain why.
P.S. "Affect performance" as used above means either improve/degrade performance.
Solution
In order to know that you may just decrement
For illustration, here is the relevant part of the code from the slides you took your illustration from:
Your modification would look like this:
If the element is larger, we save a swap, as you illustrated. Furthermore we won't have to compare that element again. If, however, the element is not larger, we will compare it to the pivot twice, once before the swap and once after the swap.
So with your modification we save a swap for each element that is larger than the pivot and on a position that ends up to the right of the pivot, while we introduce an additional comparison for each element in such a position that is not larger than the pivot.
Averaging over all inputs we can expect the number of elements smaller resp. larger than the pivot to be equal. So the performance will depend on the number of elements equal to the pivot (and the cost of a swap vs. a comparison). Your modification will have the better relative performance, the smaller that number is. But 3-way partitioning was developed with inputs with many equal keys in mind, so the original choice seems reasonable.
Taking a second look at the modified algorithm, we note that if we compare an element twice, there is no other comparison happening between these two. So it seems worthwhile to remember the result of the comparison. However, if we want to keep the structure of the code, this means a lot of control structure overhead:
My gut feeling is that even though it combines the advantages of the previous two versions in terms of number of comparisons and swaps, it will perform worse due to the overhead of the additional control structures.
Not only have we introduced additional operations (checking conditions, maintaining the flag), we also introduced (compared to Dijkstra's version) two additional conditional jumps (for if and while), that will often but irregularly switch between taken and not taken. The latter is a problem on modern processors, since they rely on branch predictions for optimal performance and such branches will lead to a high number of mispredictions. (The branch corresponding to the outer loop is an opposite example: Always predicting that the loop will be entered is wrong only once.)
With these considerations in mind, we can try to optimize the code. The following should perform quite well:
This version makes as many comparisons as Dijkstra's version and as many swaps as your suggested version without introducing too much overhead: For each comparison we have one branch based on
gt, you have to compare the element at position gt to the pivot.For illustration, here is the relevant part of the code from the slides you took your illustration from:
while (i 0) exch(a, i, gt--);
else i++;
}Your modification would look like this:
while (i 0)
{
while (a[gt].compareTo(v) > 0) gt--;
exch(a, i, gt--);
}
else i++;
}If the element is larger, we save a swap, as you illustrated. Furthermore we won't have to compare that element again. If, however, the element is not larger, we will compare it to the pivot twice, once before the swap and once after the swap.
So with your modification we save a swap for each element that is larger than the pivot and on a position that ends up to the right of the pivot, while we introduce an additional comparison for each element in such a position that is not larger than the pivot.
Averaging over all inputs we can expect the number of elements smaller resp. larger than the pivot to be equal. So the performance will depend on the number of elements equal to the pivot (and the cost of a swap vs. a comparison). Your modification will have the better relative performance, the smaller that number is. But 3-way partitioning was developed with inputs with many equal keys in mind, so the original choice seems reasonable.
Taking a second look at the modified algorithm, we note that if we compare an element twice, there is no other comparison happening between these two. So it seems worthwhile to remember the result of the comparison. However, if we want to keep the structure of the code, this means a lot of control structure overhead:
Boolean doComp = true;
int cmp;
while (i 0)
{
while (cmp = a[gt].compareTo(v) > 0)
{
gt--;
doComp = false;
}
exch(a, i, gt--);
}
else i++;
}My gut feeling is that even though it combines the advantages of the previous two versions in terms of number of comparisons and swaps, it will perform worse due to the overhead of the additional control structures.
Not only have we introduced additional operations (checking conditions, maintaining the flag), we also introduced (compared to Dijkstra's version) two additional conditional jumps (for if and while), that will often but irregularly switch between taken and not taken. The latter is a problem on modern processors, since they rely on branch predictions for optimal performance and such branches will lead to a high number of mispredictions. (The branch corresponding to the outer loop is an opposite example: Always predicting that the loop will be entered is wrong only once.)
With these considerations in mind, we can try to optimize the code. The following should perform quite well:
while (i 0)
{
while ((cmp = a[gt].compareTo(v)) > 0) gt--;
exch(a, i, gt--);
}
if (cmp < 0) exch(a, lt++, i++);
else i++;
}This version makes as many comparisons as Dijkstra's version and as many swaps as your suggested version without introducing too much overhead: For each comparison we have one branch based on
cmp > 0 and possibly one based on `cmp - The version you suggest removes the unnecessary swaps, but introduces additional comparisons. So it will too perform worse than the final version in this answer.
- As I pointed out, the realitve performance of the original and your version will depend on the input at hand and the relative costs of swaps and compares. (Since usually compares are faster than swaps, I'd expect your version to be faster on average.)
Code Snippets
while (i <= gt)
{
int cmp = a[i].compareTo(v);
if (cmp < 0) exch(a, lt++, i++);
else if (cmp > 0) exch(a, i, gt--);
else i++;
}while (i <= gt)
{
int cmp = a[i].compareTo(v);
if (cmp < 0) exch(a, lt++, i++);
else if (cmp > 0)
{
while (a[gt].compareTo(v) > 0) gt--;
exch(a, i, gt--);
}
else i++;
}Boolean doComp = true;
int cmp;
while (i <= gt)
{
if (doComp) cmp = a[i].compareTo(v);
doComp = true;
if (cmp < 0) exch(a, lt++, i++);
else if (cmp > 0)
{
while (cmp = a[gt].compareTo(v) > 0)
{
gt--;
doComp = false;
}
exch(a, i, gt--);
}
else i++;
}while (i <= gt)
{
int cmp = a[i].compareTo(v);
if (cmp > 0)
{
while ((cmp = a[gt].compareTo(v)) > 0) gt--;
exch(a, i, gt--);
}
if (cmp < 0) exch(a, lt++, i++);
else i++;
}Context
StackExchange Computer Science Q#22389, answer score: 13
Revisions (0)
No revisions yet.