patterncppModerate

Gray codes addition

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

codereview c++14 cpp c++stackoverflow algorithm bitwise performance

additioncodesgray

Problem

As some of you know, I have been working with Gray codes and am trying to design a good gray_code class in modern C++. I will end up posting several parts of the implementation, but what I want to be reviewed is mostly the addition function. Anyway, here is the code of the class essentials:

template
struct gray_code
{
    static_assert(std::is_unsigned::value,
                  "gray code only supports built-in unsigned integers");

    // Underlying unsigned integer
    using value_type = Unsigned;

    // Variable containing the gray code
    value_type value;

    ////////////////////////////////////////////////////////////
    // Constructors operations

    // Default constructor
    constexpr gray_code() noexcept:
        value(0u)
    {}

    /**
     * @brief Construction from an unsigned integer.
     *
     * The integer is converted to gray code. The value
     * is preserved. The representation is not.
     *
     * @param value Unsigned integer to convert
     */
    constexpr explicit gray_code(value_type value) noexcept:
        value( (value >> 1) ^ value )
    {}

    ////////////////////////////////////////////////////////////
    // Assignment operations

    /**
     * @brief Assigns an unsigned integer.
     *
     * It works the same way as the equivalent
     * constructor does.
     */
    auto operator=(value_type other) & noexcept
        -> gray_code&
    {
        value = (other >> 1) ^ other;
        return *this;
    }

    ////////////////////////////////////////////////////////////
    // Conversion operations

    /**
     * @brief Conversion to the underlying type.
     *
     * See http://www.dspguru.com/dsp/tricks/gray-code-conversion
     */
    operator value_type() const noexcept
    {
        value_type res = value;
        for (value_type mask = std::numeric_limits::digits / 2
             ; mask ; mask >>= 1)
        {
            res ^= res >> mask;
        }
        return res;
    }
};

As you can see, a `gray_

Solution

Loop invariants code motion

This is the name of an optimization performed by most compilers: when they detect code that actually does not depend on the state of the loop, they move it out of the loop. It's not as obvious as the usual invariants, but let's have a look at this line:

Unsigned res_i = (lhs_p_cpy & rhs_p_cpy) ^ lhs_i ^ rhs_i;

lhs_i and rhs_i correspond to the \$ i_{th} \$ bit of lhs and rhs. These bits do not depend on the loop (the loop does not change them) and can all be computed at once by performing lhs ^ rhs before running the loop. Therefore, we can simplify the function by moving them out of the loop. If we store them directly in res, we can even simplify the assignment to res to a mere res ^= res_i << i:

template
auto operator+(gray_code lhs, gray_code rhs) noexcept
    -> gray_code
{
    // parity of lhs and rhs
    bool lhs_p = is_odd(lhs);
    bool rhs_p = is_odd(rhs);

    gray_code res = lhs ^ rhs;
    for (Unsigned i{} ; i ::digits ; ++i)
    {
        // Get the ith bit of rhs and lhs
        bool lhs_i = (lhs.value >> i) & 1u;
        bool rhs_i = (rhs.value >> i) & 1u;

        // Copy lhs_p and rhs_p (see {in parallel} in the original algorithm)
        bool lhs_p_cpy = lhs_p;
        bool rhs_p_cpy = rhs_p;

        // Set the ith bit of res
        Unsigned res_i = lhs_p_cpy & rhs_p_cpy;
        res ^= res_i << i;

        // Update e and f
        lhs_p = (lhs_p_cpy & (!rhs_p_cpy)) ^ lhs_i;
        rhs_p = ((!lhs_p_cpy) & rhs_p_cpy) ^ rhs_i;
    }
    return res;
}

Unfortunately, lhs_i and rhs_i are also used to compute the new values of lhs_p and rhs_p, so we can't remove them from the body of the loop anyway. If we are to optimize the algorithm even more, we will need something else, for example...

More efficient parity algorithm

It hardly changes anything since is_odd is only used at the very beginning of the algorithm, but its fallback implementation when __builtin_parity cannot be used is suboptimal. Actually, the parity of an unsigned integer can be obtained by xoring together every bit, without having to compute the number of set bits. The overall xoring of bits can be done "in parallel" instead of bit by bit with the following algorithm:

for (std::size_t i = std::numeric_limits::digits / 2u ; i ; i >>= 1u)
{
    code.value ^= (code.value >> i);
}
return bool(code.value & 1u);

This algorithm will xor the first half of an unsigned integer with the second half, then repeat the operation with the result... until there are only two bits left to xor, leading to a more efficient operation than a population count. This page even provides a more subtle implementation of the algorithm to "skip" the last iterations. I didn't really dig the proposed implementations with magic numbers since I want my code to still work with integers of any size.

Getting rid of the temporaries

The part of the Gray addition which is likely to be be the slowest is the body of the for loop. To find a possible optimization, I decided to see wether I could get rid of lhs_p_cpy and rhs_p_cpy in it, which are copies of lhs_p and rhs_p required to compute the new value of rhs_p. While it is easy to get rid of rhs_p_cpy by simply replacing it by rhs_p, getting rid of lhs_p_cpy is harder because lhs_p has already been updated meanwhile. Without any trickery, the following loop body is the best we can get:

Unsigned res_i = lhs_p & rhs_p;
res ^= res_i > i) & 1u;
bool rhs_i = (rhs.value >> i) & 1u;
lhs_p = (lhs_p_cpy & not rhs_p) ^ lhs_i;
rhs_p = (rhs_p & not lhs_p_cpy) ^ rhs_i;

Therefore, I decided to build some truth tables. In the following table, \$ lhs_{old} \$ and \$ rhs_{old} \$ are possible values of lhs_p and rhs_p before the update while \$ lhs_{new} \$ and \$ rhs_{new} \$ are the values of the same variables after the update. I did not take any interest in lhs_i and rhs_i since none of them is used to compute both \$ lhs_{new} \$ and \$ rhs_{new} \$; therefore, I simply ignore them in the rest of this reflection.

\begin{array} {|cc|cc|}
\hline
lhs_{old} & rhs_{old} & lhs_{new} & rhs_{new} \\
\hline
0 & 0 & 0 & 0\\
0 & 1 & 0 & 1\\
1 & 0 & 1 & 0\\
1 & 1 & 0 & 0\\
\hline
\end{array}

We can easily replace lhs_p_cpy by \$ lhs_{old} \$ in the computation of \$ lhs_{new} \$ but we can't use it to compute \$ rhs_{new} \$ since the update already occured. My first thought to get rid of lhs_p_cpy was "can we compute \$ rhs_{new} \$ with only \$ rhs_{old} \$ and \$ lhs_{new} \$?". Looking at the truth table above, it appears that it is not possible. Trying to compute \$ rhs_{new} \$ before \$ lhs_{new} \$ doesn't solve the problem either.

However, we already have computed another value in the loop: res_i. Therefore, I injected \$ res_i \$ in the table and looked at what could be done with it:

\begin{array} {|cc|c|cc|}
\hline
lhs_{old} & rhs_{old} & res_i = lhs_{old} \land rhs_{old} & lhs_{ne

Code Snippets

Unsigned res_i = (lhs_p_cpy & rhs_p_cpy) ^ lhs_i ^ rhs_i;

template<typename Unsigned>
auto operator+(gray_code<Unsigned> lhs, gray_code<Unsigned> rhs) noexcept
    -> gray_code<Unsigned>
{
    // parity of lhs and rhs
    bool lhs_p = is_odd(lhs);
    bool rhs_p = is_odd(rhs);

    gray_code<Unsigned> res = lhs ^ rhs;
    for (Unsigned i{} ; i < std::numeric_limits<Unsigned>::digits ; ++i)
    {
        // Get the ith bit of rhs and lhs
        bool lhs_i = (lhs.value >> i) & 1u;
        bool rhs_i = (rhs.value >> i) & 1u;

        // Copy lhs_p and rhs_p (see {in parallel} in the original algorithm)
        bool lhs_p_cpy = lhs_p;
        bool rhs_p_cpy = rhs_p;

        // Set the ith bit of res
        Unsigned res_i = lhs_p_cpy & rhs_p_cpy;
        res ^= res_i << i;

        // Update e and f
        lhs_p = (lhs_p_cpy & (!rhs_p_cpy)) ^ lhs_i;
        rhs_p = ((!lhs_p_cpy) & rhs_p_cpy) ^ rhs_i;
    }
    return res;
}

for (std::size_t i = std::numeric_limits<Unsigned>::digits / 2u ; i ; i >>= 1u)
{
    code.value ^= (code.value >> i);
}
return bool(code.value & 1u);

Unsigned res_i = lhs_p & rhs_p;
res ^= res_i << i;

bool lhs_p_cpy = lhs_p;
bool lhs_i = (lhs.value >> i) & 1u;
bool rhs_i = (rhs.value >> i) & 1u;
lhs_p = (lhs_p_cpy & not rhs_p) ^ lhs_i;
rhs_p = (rhs_p & not lhs_p_cpy) ^ rhs_i;

Unsigned res_i = lhs_p & rhs_p;
res ^= res_i << i;

bool lhs_i = (lhs.value >> i) & 1u;
bool rhs_i = (rhs.value >> i) & 1u;
lhs_p = (lhs_p & not rhs_p) ^ lhs_i;
rhs_p = (rhs_p & not res_i) ^ rhs_i;

Context

StackExchange Code Review Q#69122, answer score: 13

Revisions (0)

No revisions yet.