snippetMinor
Algorithm to partially sort list into equal-sized buckets
Viewed 0 times
bucketsequalintosizedpartiallyalgorithmsortlist
Problem
Suppose I have a large list of numbers that I want to divide into equal-sized buckets so that every bucket contains only larger numbers than buckets to its left. Numbers within each bucket don't need to be sorted.
For example, for the input
that I want to divide into 5 buckets, some valid outputs would be
One approach to do this would be to sort the list, and then divide it at equal intervals. Can this be done faster?
I would also be happy with a result where the buckets are of only roughly equal size, but note that I can't assume a uniform distribution for the numbers, so bucket sort doesn't help.
For example, for the input
[64, 72, 57, 47, 5, 4, 64, 21, 64, 65, 36, 43, 81, 44, 19, 87, 17, 86,
73, 21, 19, 64, 94, 91, 34, 49, 8, 52, 18, 37]that I want to divide into 5 buckets, some valid outputs would be
[[4, 5, 8, 17, 18], [19, 19, 21, 21, 34], [36, 37, 43, 44, 47],
[49, 52, 57, 64, 64], [64, 64, 65, 72, 73]]
[[8, 17, 18, 5, 4], [21, 19, 21, 34, 19], [43, 37, 44, 36, 47],
[52, 57, 49, 64, 64], [64, 64, 72, 73, 65]]
[[8, 18, 5, 4, 17], [19, 19, 21, 21, 34], [47, 36, 44, 43, 37],
[52, 64, 49, 64, 57], [64, 72, 64, 65, 73]]
[[18, 8, 17, 5, 4], [21, 19, 21, 19, 34], [36, 44, 43, 37, 47],
[57, 64, 64, 49, 52], [64, 72, 64, 65, 73]]One approach to do this would be to sort the list, and then divide it at equal intervals. Can this be done faster?
I would also be happy with a result where the buckets are of only roughly equal size, but note that I can't assume a uniform distribution for the numbers, so bucket sort doesn't help.
Solution
Assuming you want to create $k$ buckets, to the following.
Both steps take time $\Theta(kn)$. All buckets are within $\pm 1$ of the same size.
In the case of duplicate elements, you can "uniquify" them in a $\Theta(n)$ preprocessing run; otherwise bucket sizes might differ more.
Essentially, this is performing only the top-level call of a $k$-way Quicksort with perfect pivots. All variants regarding pivot selection and partitioning apply and will lead to different runtime and bucket size characteristics.
- Obtain elements of rank $\lceil n/k \rceil, 2 \cdot \lceil n/k \rceil \dots, (k-1) \cdot \lceil n/k \rceil$.
- Perform multi-way partitioning (cf. Quicksort) with these elements as pivot.
Both steps take time $\Theta(kn)$. All buckets are within $\pm 1$ of the same size.
In the case of duplicate elements, you can "uniquify" them in a $\Theta(n)$ preprocessing run; otherwise bucket sizes might differ more.
Essentially, this is performing only the top-level call of a $k$-way Quicksort with perfect pivots. All variants regarding pivot selection and partitioning apply and will lead to different runtime and bucket size characteristics.
Context
StackExchange Computer Science Q#28951, answer score: 4
Revisions (0)
No revisions yet.