patterncMinor

Parallel accumulated sum in OpenCL

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

openclparallelaccumulatedsum

Problem

This is my first relevant code in OpenCL. Please, let me know if I'm doing something wrong in a logical level (i.e., I guess it can crash in situations?) or accounting for performance (bottlenecks, bad use of resources, misunderstanding some concepts, etc).

void accumulatedSum(
    volatile __global u32* completeBlocks, 
    volatile __global u32* input, 
    volatile __global u32* output, 
    __local u32* loc) {
    // Receives an input such as:    [1, 3, 0, 0, 4,  7,  0,  0,  3,  4,  1,  1]
    // Computes the accumulated sum: [1, 4, 4, 4, 8, 15, 15, 15, 18, 22, 23, 24]
    // Always call it with global_work_size = num_of_elements / 2
    // Works like this: http://imgur.com/3lm3CAq
    unsigned int k, a;
    int lid          = get_local_id(0);
    int gid          = get_global_id(0);
    int groupId      = get_group_id(0);
    int localPasses  = get_local_size(0);
    int globalPasses = get_num_groups(0)*2;
    loc[lid*2]       = input[gid*2];
    loc[lid*2+1]     = input[gid*2+1];
    barrier(CLK_LOCAL_MEM_FENCE);
    for (a=0; localPasses>>=1; ++a){
        k = (lid >> a >=1; ++a){
        barrier(CLK_GLOBAL_MEM_FENCE);
        while (*completeBlocks != 0);
        if (lid == 0) atomic_inc(completeBlocks);
        while (*completeBlocks > a << (a+1)) + (1<<a);
        output[k + (lid&((1<<a)-1))] = output[k - 1] + output[k + (gid&((1<<a)-1))];
    };
};

If you have clever ideas in how to make it more readable please let me know, although that is not the focus.

Solution

-
What are k and a? It looks like a is a loop counter variable, in which case it can be renamed to i and moved to the front of the first for loop. As for k, rename it such that it conveys its purpose. With the exception of loop counter variables such as i, variables shouldn't be a single character.

-
Many of the lines in the for loops are difficult to read, especially with all the bit-shifting. Depending on the type of shift, the optimizing compiler may be able to replace some arithmetic operation with an appropriate bitshift. But for others, it would help to leave comments to explain what's being done for each complex operation.

-
This is quite crammed together:

for (; globalPasses>>=1; ++a){
    barrier(CLK_GLOBAL_MEM_FENCE);
    while (*completeBlocks != 0);
    if (lid == 0) atomic_inc(completeBlocks);
    while (*completeBlocks > a << (a+1)) + (1<<a);
    output[k + (lid&((1<<a)-1))] = output[k - 1] + output[k + (gid&((1<<a)-1))];
};

You should isolate the while loops so that it's easy to see that they're busy-waits. You could also consider using curly braces for the if statements. Finally, consider adding whitespace between the operators for more readability.

for (; globalPasses >>= 1; ++a) {
    barrier(CLK_GLOBAL_MEM_FENCE);

    while (*completeBlocks != 0);

    if (lid == 0) {
        atomic_inc(completeBlocks);
    }

    while (*completeBlocks > a << (a + 1)) + (1 << a);

    output[k + (lid&((1 << a) - 1))] = 
        output[k - 1] + output[k + (gid&((1 < <a) - 1))];
};

Code Snippets

for (; globalPasses>>=1; ++a){
    barrier(CLK_GLOBAL_MEM_FENCE);
    while (*completeBlocks != 0);
    if (lid == 0) atomic_inc(completeBlocks);
    while (*completeBlocks < get_num_groups(0));
    if (lid == 0 && gid == 0) atomic_xchg(completeBlocks,0);
    k = (gid >> a << (a+1)) + (1<<a);
    output[k + (lid&((1<<a)-1))] = output[k - 1] + output[k + (gid&((1<<a)-1))];
};

for (; globalPasses >>= 1; ++a) {
    barrier(CLK_GLOBAL_MEM_FENCE);

    while (*completeBlocks != 0);

    if (lid == 0) {
        atomic_inc(completeBlocks);
    }

    while (*completeBlocks < get_num_groups(0));

    if (lid == 0 && gid == 0) {
        atomic_xchg(completeBlocks, 0);
    }

    k = (gid >> a << (a + 1)) + (1 << a);

    output[k + (lid&((1 << a) - 1))] = 
        output[k - 1] + output[k + (gid&((1 < <a) - 1))];
};

Context

StackExchange Code Review Q#64052, answer score: 6

Revisions (0)

No revisions yet.