patterncppMinor
Optimizing multiplication of two polynomials in GF(2^32)
Viewed 0 times
twopolynomialsoptimizingmultiplication
Problem
I have the following method for multiplying two polynomials in GF(232):
The following is a small test program and benchmark for the function.
```
int main()
{
assert(mul_reduce_32(1, 1) == 1);
assert(mul_reduce_32(23523, 34651) == 744709325);
assert(mul_reduce_32(34652346, 5534651) == 1329462203);
// A little benchmark
std::mt19937 rng(std::time(0));
std::uniform_int_distribution uint_dist10(0, std::numeric_limits::max());
const int N = 100000000;
std::vector vec1(N, 0);
#include
#include
#include
#include
#include
typedef unsigned short uint16_t;
typedef unsigned int uint32_t;
unsigned get_high16(unsigned x)
{
return (x >> 16) & 0xffff;
}
unsigned get_low16(unsigned x)
{
return x & 0xffff;
}
uint32_t mul_reduce_32(uint32_t x, uint32_t y)
{
const __m128i a = _mm_set_epi32(0, 0, 0, x);
const __m128i b = _mm_set_epi32(0, 0, 0, y);
const __m128i c =_mm_clmulepi64_si128(a, b, 0);
uint16_t D = _mm_extract_epi16(c, 2);
D ^= _mm_extract_epi32(c, 1) >> 31;
D ^= _mm_extract_epi32(c, 1) >> 30;
D ^= _mm_extract_epi32(c, 1) >> 29;
D ^= _mm_extract_epi32(c, 1) >> 27;
D ^= _mm_extract_epi32(c, 1) >> 25;
const uint16_t X3 = _mm_extract_epi16(c, 3);
const __m128i X3D = _mm_set_epi16(0, 0, 0, 0, 0, 0, X3, D);
const uint32_t E = _mm_extract_epi32(X3D, 0) << 7;
const uint32_t F = _mm_extract_epi32(X3D, 0) << 5;
const uint32_t G = _mm_extract_epi32(X3D, 0) << 3;
const uint32_t I = _mm_extract_epi32(X3D, 0) << 2;
const uint32_t J = _mm_extract_epi32(X3D, 0) << 1;
const uint16_t H1 = X3 ^ get_high16(E) ^ get_high16(F) ^ get_high16(G) ^ get_high16(I) ^ get_high16(J);
const uint16_t H0 = D ^ get_low16(E) ^ get_low16(F) ^ get_low16(G) ^ get_low16(I) ^ get_low16(J);
const uint16_t R1 = _mm_extract_epi16(c, 1) ^ H1;
const uint16_t R0 = _mm_extract_epi16(c, 0) ^ H0;
return _mm_extract_epi32(_mm_set_epi16(0, 0, 0, 0, 0, 0, R1, R0), 0);
}The following is a small test program and benchmark for the function.
```
int main()
{
assert(mul_reduce_32(1, 1) == 1);
assert(mul_reduce_32(23523, 34651) == 744709325);
assert(mul_reduce_32(34652346, 5534651) == 1329462203);
// A little benchmark
std::mt19937 rng(std::time(0));
std::uniform_int_distribution uint_dist10(0, std::numeric_limits::max());
const int N = 100000000;
std::vector vec1(N, 0);
Solution
For the time output in
Example with changes:
But, since you're using C++11, consider utilizing the `
Specific questions:
Currently, I am using many temporary variables. I could try to help the compiler by removing many of them, but this way the code should be more readable and easier for someone to modify.
I would rely on the compiler to handle this, as it should be able to optimize them out. But you're right; readability and maintainability are also important. They can always be adjusted accordingly.
You can also maintain these two things by:
You seem to do well with the second point, but not so much the first. I've already addressed this in the side-notes, but this is a problem throughout the first file. By using more descriptive names, it'll be much easier for others to understand your code and make any needed modifications.
main():- Use
std::clock_tas opposed toclock_tin C++.
- You could have another variable,
endTime, for the ending time.
- The computed time could also have its own variable,
elapsedTime, to be printed.
- Only the clock time needs to be cast to a
double, not the macro as well. You should also usestatic_cast<>()as this is C++.
Example with changes:
std::clock_t startTime = std::clock();
// ...
std::clock_t endTime = std::clock();
double elapsedTime = static_cast(endTime - startTime) / CLOCKS_PER_SEC;
std::cout << elapsedTime;But, since you're using C++11, consider utilizing the `
library instead. Unlike , it is more C++-oriented but still contains some aspects.
Side-notes:
- For
std::time, 0 can be replaced with nullptr in C++11.
- Consider renaming
N to something more descriptive. Using a constant in place of a magic number is nice, but you can increase readability by using a better name.
- Since only your test program uses those first four libraries, they should just be in that file.
- You don't need those
typedefs in your first file; they are already defined in `.Specific questions:
Currently, I am using many temporary variables. I could try to help the compiler by removing many of them, but this way the code should be more readable and easier for someone to modify.
I would rely on the compiler to handle this, as it should be able to optimize them out. But you're right; readability and maintainability are also important. They can always be adjusted accordingly.
You can also maintain these two things by:
- avoiding single-character variable names
- keeping variables as close in scope as possible
You seem to do well with the second point, but not so much the first. I've already addressed this in the side-notes, but this is a problem throughout the first file. By using more descriptive names, it'll be much easier for others to understand your code and make any needed modifications.
Code Snippets
std::clock_t startTime = std::clock();
// ...
std::clock_t endTime = std::clock();
double elapsedTime = static_cast<double>(endTime - startTime) / CLOCKS_PER_SEC;
std::cout << elapsedTime;Context
StackExchange Code Review Q#48857, answer score: 5
Revisions (0)
No revisions yet.