HiveBrain v1.2.0
Get Started
← Back to all entries
snippetMinor

Algorithm to partially sort list into equal-sized buckets

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
bucketsequalintosizedpartiallyalgorithmsortlist

Problem

Suppose I have a large list of numbers that I want to divide into equal-sized buckets so that every bucket contains only larger numbers than buckets to its left. Numbers within each bucket don't need to be sorted.

For example, for the input

[64, 72, 57, 47, 5, 4, 64, 21, 64, 65, 36, 43, 81, 44, 19, 87, 17, 86, 
 73, 21, 19, 64, 94, 91, 34, 49, 8, 52, 18, 37]


that I want to divide into 5 buckets, some valid outputs would be

[[4, 5, 8, 17, 18], [19, 19, 21, 21, 34], [36, 37, 43, 44, 47], 
 [49, 52, 57, 64, 64], [64, 64, 65, 72, 73]]

[[8, 17, 18, 5, 4], [21, 19, 21, 34, 19], [43, 37, 44, 36, 47], 
 [52, 57, 49, 64, 64], [64, 64, 72, 73, 65]]

[[8, 18, 5, 4, 17], [19, 19, 21, 21, 34], [47, 36, 44, 43, 37], 
 [52, 64, 49, 64, 57], [64, 72, 64, 65, 73]]

[[18, 8, 17, 5, 4], [21, 19, 21, 19, 34], [36, 44, 43, 37, 47], 
 [57, 64, 64, 49, 52], [64, 72, 64, 65, 73]]


One approach to do this would be to sort the list, and then divide it at equal intervals. Can this be done faster?

I would also be happy with a result where the buckets are of only roughly equal size, but note that I can't assume a uniform distribution for the numbers, so bucket sort doesn't help.

Solution

Assuming you want to create $k$ buckets, to the following.

  • Obtain elements of rank $\lceil n/k \rceil, 2 \cdot \lceil n/k \rceil \dots, (k-1) \cdot \lceil n/k \rceil$.



  • Perform multi-way partitioning (cf. Quicksort) with these elements as pivot.



Both steps take time $\Theta(kn)$. All buckets are within $\pm 1$ of the same size.

In the case of duplicate elements, you can "uniquify" them in a $\Theta(n)$ preprocessing run; otherwise bucket sizes might differ more.

Essentially, this is performing only the top-level call of a $k$-way Quicksort with perfect pivots. All variants regarding pivot selection and partitioning apply and will lead to different runtime and bucket size characteristics.

Context

StackExchange Computer Science Q#28951, answer score: 4

Revisions (0)

No revisions yet.