patternMinor
Perfect shuffle in parallel processing
Viewed 0 times
processingperfectparallelshuffle
Problem
How is Perfect shuffle a better interconnect scheme for parallel processing? For example if we consider a problem of sum reduction, I want to understand how this scheme is useful when implementing sum reduction in parallel , for example on a GPU?
Solution
The perfect shuffle alone has never been used as an interconnection network; it always included exchange links, in order to allow for a worst-case $O(\lg n)$ routing algorithm.
Algorithms for reduction, broadcast, parallel prefix, transpose etc are given in
Yosi Ben-Asher , David Egozi , Assaf Schuster, SIMD Algorithms for 2-D Arrays in Shuffle Networks.
Stone was the first researcher to provide algorithms for the perfect shuffle network, including algorithms for the FFT, matrix transpose etc.
This network has been used in several parallel machines, but mostly in multi-stage shuffle-exchange networks such as the Omega network (which uses the perfect shuffle between stages). An example is the IBM SP3 machine.
Note that these are message-passing algorithms, whilst you refer to a GPU, which is instead a shared-memory device.
Algorithms for reduction, broadcast, parallel prefix, transpose etc are given in
Yosi Ben-Asher , David Egozi , Assaf Schuster, SIMD Algorithms for 2-D Arrays in Shuffle Networks.
Stone was the first researcher to provide algorithms for the perfect shuffle network, including algorithms for the FFT, matrix transpose etc.
This network has been used in several parallel machines, but mostly in multi-stage shuffle-exchange networks such as the Omega network (which uses the perfect shuffle between stages). An example is the IBM SP3 machine.
Note that these are message-passing algorithms, whilst you refer to a GPU, which is instead a shared-memory device.
Context
StackExchange Computer Science Q#10572, answer score: 3
Revisions (0)
No revisions yet.