patternMinor
Refine quickselect with Median of Medians to deal with duplicates
Viewed 0 times
refinemedianswithquickselectduplicatesmediandeal
Problem
In slide 17 of this lecture note of a course at UC Berkeley, it is said concerning quickselect with median of medians:
If there are repeated elements
We can reduce to the case of no repetitions by considering the
median of the array $a_0'$, $a_1'$, $\cdots$, $a_n'$ where $a_i' = (n + 1)a_i + i$.
The order is preserved and there are no repetitions.
Alternatively, one has to refine the algorithm and the analysis
(see CLR).
But in my copy of CLRS (2011), the authors assumed that the elements are distinct.
How would one refine the algorithm and the analysis? I'm aware that you can make elements distinct in $O(n)$, but I'm interested in the alternative.
If there are repeated elements
We can reduce to the case of no repetitions by considering the
median of the array $a_0'$, $a_1'$, $\cdots$, $a_n'$ where $a_i' = (n + 1)a_i + i$.
The order is preserved and there are no repetitions.
Alternatively, one has to refine the algorithm and the analysis
(see CLR).
But in my copy of CLRS (2011), the authors assumed that the elements are distinct.
How would one refine the algorithm and the analysis? I'm aware that you can make elements distinct in $O(n)$, but I'm interested in the alternative.
Solution
A common refinement to quicksort could also be used with quickselect to deal with duplicates. Instead of partitioning into two buckets, do so into three. This new partition is called the "fat partition". It holds items that are equal to the current pivot. When there are no duplicates, this adds 2% overhead to the sort, according to the following paper.
See https://cs.fit.edu/~pkc/classes/writing/papers/bentley93engineering.pdf
The paper discusses several memory layouts for partition, like whether to put the fat partition in the middle or the end of the invariant array or even split it and put part at the head and part at the tail, all to reduce the number of memory swaps.
The paper quantifies the huge benefits for some extreme edge cases, like when the data is a random set of ones and zeroes. In that case, their algorithm went from being four times slower than a benchmark Unix qsort to being twice as fast, an 8x speedup by the addition of a fat partition.
See https://cs.fit.edu/~pkc/classes/writing/papers/bentley93engineering.pdf
The paper discusses several memory layouts for partition, like whether to put the fat partition in the middle or the end of the invariant array or even split it and put part at the head and part at the tail, all to reduce the number of memory swaps.
The paper quantifies the huge benefits for some extreme edge cases, like when the data is a random set of ones and zeroes. In that case, their algorithm went from being four times slower than a benchmark Unix qsort to being twice as fast, an 8x speedup by the addition of a fat partition.
Context
StackExchange Computer Science Q#110201, answer score: 2
Revisions (0)
No revisions yet.