patternMinor

Perfect shuffle in parallel processing

Submitted by: @import:stackexchange-cs·Mar 10, 2026·

Viewed 0 times

computer-architecture cs parallel-computing stackoverflow

processingperfectparallelshuffle

Problem

How is Perfect shuffle a better interconnect scheme for parallel processing? For example if we consider a problem of sum reduction, I want to understand how this scheme is useful when implementing sum reduction in parallel , for example on a GPU?

Solution

The perfect shuffle alone has never been used as an interconnection network; it always included exchange links, in order to allow for a worst-case $O(\lg n)$ routing algorithm.
Algorithms for reduction, broadcast, parallel prefix, transpose etc are given in

Yosi Ben-Asher , David Egozi , Assaf Schuster, SIMD Algorithms for 2-D Arrays in Shuffle Networks.

Stone was the first researcher to provide algorithms for the perfect shuffle network, including algorithms for the FFT, matrix transpose etc.

This network has been used in several parallel machines, but mostly in multi-stage shuffle-exchange networks such as the Omega network (which uses the perfect shuffle between stages). An example is the IBM SP3 machine.

Note that these are message-passing algorithms, whilst you refer to a GPU, which is instead a shared-memory device.

Context

StackExchange Computer Science Q#10572, answer score: 3

Revisions (0)

No revisions yet.