snippetMinor
how does the parallel radix sort work?
Viewed 0 times
theradixparallelworkdoeshowsort
Problem
I'm currently learning computer science, and there is a slide of notes brief described the parallel radix sort under data parallelism.
I roughly know how to sort it from lecturer's explanation, but how is it related to parallel computing to increase the performance?
number 101 110 011 001 111 (1st bit)
order 2 1 3 4 5 (new order)
number 110 101 011 001 111 (2nd bit)
order 3 1 4 2 5 (new order)
number 101 001 110 011 111 (3rd bit)
order 3 1 4 2 5 (new order)
number 001 011 101 110 111I roughly know how to sort it from lecturer's explanation, but how is it related to parallel computing to increase the performance?
Solution
There are many ways to do it. The following approach allows to fairly split work between many cores. I believe that it's used even in GPU implementations of radix sort, such as ones provided by Boost.Compute and CUDA Thrust.
I describe here one pass of LSD radix sort that distributes data into R buckets:
In pseudocode the entire pass looks like:
I describe here one pass of LSD radix sort that distributes data into R buckets:
- First stage: split input block into K parts, where K is number of cores sharing the work. Each core builds histogram for one part of data, counting how many elements from this part of data should go into each bucket:
Cnt[part][bucket]++
- Second stage: Wait till all cores finished the stage one, and then compute partial sums over counts, thus revealing an initial index of each bucket for every part of data. This is sequential algorithm.
- Third stage: each core again process its own part of data, sending each element into position determined by
Cnt[part][bucket]
In pseudocode the entire pass looks like:
parallel_for part in 0..K-1
for i in indexes(part)
bucket = compute_bucket(a[i])
Cnt[part][bucket]++
base = 0
for bucket in 0..R-1
for part in 0..K-1
Cnt[part][bucket] += base
base = Cnt[part][bucket]
parallel_for part in 0..K-1
for i in indexes(part)
bucket = compute_bucket(a[i])
out[Cnt[part][bucket]++] = a[i]Code Snippets
parallel_for part in 0..K-1
for i in indexes(part)
bucket = compute_bucket(a[i])
Cnt[part][bucket]++
base = 0
for bucket in 0..R-1
for part in 0..K-1
Cnt[part][bucket] += base
base = Cnt[part][bucket]
parallel_for part in 0..K-1
for i in indexes(part)
bucket = compute_bucket(a[i])
out[Cnt[part][bucket]++] = a[i]Context
StackExchange Computer Science Q#6871, answer score: 3
Revisions (0)
No revisions yet.