patterncppModerate

Return the next power of two

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

codereview cpp stackoverflow c++bitwise performance

thereturnnextpowertwo

Problem

Given an unsigned integer \$N\$, I want to return \$P\$ with \$P\$ being a power of two such that \$P \ge N\$.

Examples:

1 -> 1

2 -> 2

3 -> 4

5 -> 8

The goal is to provide a very fast implementation. This code is supposed to target a specific platform and we can make the assumption that the intrinsics used are supported.

uint32_t nextPowerOfTwo(uint32_t n) {
    uint32_t msb_index = fls(n);
    uint32_t trailing_zeros = std::__ctz(n);
    bool is_pot = (msb_index - 1 == trailing_zeros);
    uint32_t p = (is_pot ? n : 1 << msb_index);
    return p;
}

Solution

If you really care about performance you may also give a chance to a branchless version. Note that compiler may optimize code with explicit branches (if and ?:) if target architecture supports CMOV (& similar) instructions, check generated assembly output. In these examples I will use X86 mnemonics but I suppose meaning is clear whichever your architecture is.

Also you're using non-standard fls() and std::__clz(), they may be implemented with instrinsics or not. If not then performance will greatly decrease because they need a loop or a lookup table instead of BSR (or LZCNT) and BSF (or TZCNT).

uint32_t nextPowerOfTwo(uint32_t n)
{
    --n;

    n |= n >> 1;
    n |= n >> 2;
    n |= n >> 4;
    n |= n >> 8;
    n |= n >> 16;

    return n + 1;
}

Note that 0 gives a wrong result then if you expect 0 as input and you don't want 0 as output you have to handle this special case (just return 1 or add 1 if your compiler will generate a branch for that.) AFAIK also implementation with BSx (or xZCNT) have the same problem (they just set ZF or CF) then special case has to be handled also there (if your implementation specific function does not already do it).

Code is equivalent to 1 << (log2(n - 1) + 1). I don't have a link to original code (algorithm is not mine!) but in my comments authors are W. Lewis and P. Hart) and (thanks Martin) there also is another author who arrived to the same code (with link to their original discussion/implementation!)

Which implementation is faster? It depends (see comments for a discussion with CodeRodde.) Target architecture (possibly revision), compiler vendor, version and settings will all affect results. Exact numbers are meaningless without details about your specific scenario, the point is that CodeRodde's answer is faster than this with MSVC++ but slower with GCC, the absolutely great answers from Jerry and JS1 also add more doubts about which method is best.

The only thing I can suggest is to implement them all and pick the fastest one in your specific case. If this is truly performance critical, because this function is so small, I'd also feel to suggest is to put some raw performance comparison in your unit tests: when current version will be slower than expected (compared to all the others) you will need to inspect again...

Code Snippets

uint32_t nextPowerOfTwo(uint32_t n)
{
    --n;

    n |= n >> 1;
    n |= n >> 2;
    n |= n >> 4;
    n |= n >> 8;
    n |= n >> 16;

    return n + 1;
}

Context

StackExchange Code Review Q#141210, answer score: 14

Revisions (0)

No revisions yet.