patterncppMinor

Optimizing multiplication of two polynomials in GF(2^32)

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

codereview sse cpp c++stackoverflow c++11 bitwise performance

twopolynomialsoptimizingmultiplication

Problem

I have the following method for multiplying two polynomials in GF(232):

#include 
#include 
#include 
#include 
#include 

typedef unsigned short uint16_t;
typedef unsigned int uint32_t;

unsigned get_high16(unsigned x)
{
    return (x >> 16) & 0xffff;
}

unsigned get_low16(unsigned x)
{
    return x & 0xffff;
}

uint32_t mul_reduce_32(uint32_t x, uint32_t y)
{
    const __m128i a = _mm_set_epi32(0, 0, 0, x);
    const __m128i b = _mm_set_epi32(0, 0, 0, y);
    const __m128i c =_mm_clmulepi64_si128(a, b, 0);

    uint16_t D = _mm_extract_epi16(c, 2);

    D ^= _mm_extract_epi32(c, 1) >> 31;
    D ^= _mm_extract_epi32(c, 1) >> 30;
    D ^= _mm_extract_epi32(c, 1) >> 29;
    D ^= _mm_extract_epi32(c, 1) >> 27;
    D ^= _mm_extract_epi32(c, 1) >> 25;

    const uint16_t X3 = _mm_extract_epi16(c, 3);
    const __m128i X3D = _mm_set_epi16(0, 0, 0, 0, 0, 0, X3, D);

    const uint32_t E = _mm_extract_epi32(X3D, 0) << 7;
    const uint32_t F = _mm_extract_epi32(X3D, 0) << 5;
    const uint32_t G = _mm_extract_epi32(X3D, 0) << 3;
    const uint32_t I = _mm_extract_epi32(X3D, 0) << 2;
    const uint32_t J = _mm_extract_epi32(X3D, 0) << 1;

    const uint16_t H1 = X3 ^ get_high16(E) ^ get_high16(F) ^ get_high16(G) ^ get_high16(I) ^ get_high16(J);
    const uint16_t H0 = D ^ get_low16(E) ^ get_low16(F) ^ get_low16(G) ^ get_low16(I) ^ get_low16(J);

    const uint16_t R1 = _mm_extract_epi16(c, 1) ^ H1;
    const uint16_t R0 = _mm_extract_epi16(c, 0) ^ H0;

    return _mm_extract_epi32(_mm_set_epi16(0, 0, 0, 0, 0, 0, R1, R0), 0);
}

The following is a small test program and benchmark for the function.

```
int main()
{
assert(mul_reduce_32(1, 1) == 1);
assert(mul_reduce_32(23523, 34651) == 744709325);
assert(mul_reduce_32(34652346, 5534651) == 1329462203);

// A little benchmark
std::mt19937 rng(std::time(0));
std::uniform_int_distribution uint_dist10(0, std::numeric_limits::max());

const int N = 100000000;
std::vector vec1(N, 0);

Solution

For the time output in main():

Use std::clock_t as opposed to clock_t in C++.

You could have another variable, endTime, for the ending time.

The computed time could also have its own variable, elapsedTime, to be printed.

Only the clock time needs to be cast to a double, not the macro as well. You should also use static_cast<>() as this is C++.

Example with changes:

std::clock_t startTime = std::clock();

// ...

std::clock_t endTime = std::clock();

double elapsedTime = static_cast(endTime - startTime) / CLOCKS_PER_SEC;

std::cout << elapsedTime;

But, since you're using C++11, consider utilizing the ` library instead. Unlike , it is more C++-oriented but still contains some

 aspects.

Side-notes:

For std::time, 0 can be replaced with nullptr in C++11.

Consider renaming N to something more descriptive. Using a constant in place of a magic number is nice, but you can increase readability by using a better name.




Since only your test program uses those first four libraries, they should just be in that file.

You don't need those typedefs in your first file; they are already defined in `.

Specific questions:

Currently, I am using many temporary variables. I could try to help the compiler by removing many of them, but this way the code should be more readable and easier for someone to modify.

I would rely on the compiler to handle this, as it should be able to optimize them out. But you're right; readability and maintainability are also important. They can always be adjusted accordingly.

You can also maintain these two things by:

avoiding single-character variable names

keeping variables as close in scope as possible

You seem to do well with the second point, but not so much the first. I've already addressed this in the side-notes, but this is a problem throughout the first file. By using more descriptive names, it'll be much easier for others to understand your code and make any needed modifications.

Code Snippets

std::clock_t startTime = std::clock();

// ...

std::clock_t endTime = std::clock();

double elapsedTime = static_cast<double>(endTime - startTime) / CLOCKS_PER_SEC;

std::cout << elapsedTime;

Context

StackExchange Code Review Q#48857, answer score: 5

Revisions (0)

No revisions yet.