patternjavaMinor
IntModulus long addition and subtraction
Viewed 0 times
additionlongsubtractionandintmodulus
Problem
One of the remarks in this answer to my question concerning
What's more problematic is that you don't have matching argument symmetry for add and subtract. You should have long versions of those.
and the reason for this part missing is that I hoped to get something faster than the trivial
The optimized solution has only one modulus operation (sure, only benchmarking will tell if it was worth it).
Please have a short look at the original question before reviewing. I mean the comments, not the code.
```
public final class IntModulus {
///////// Irrelevant code removed, see the original question if you miss anything.
///////// New code below.
/** Return a non-negative value less than modulus and congruent to the exact sum. */
public int add(long x, long y) {
final long sum = x + y;
// The addition overflowed iff both operands have the same sign are the result's sign differs.
final boolean overflow = (x^y) >= 0 & (x^sum) This means that the sign is surely wrong and that the number would fit into a 65-bit signed integer.
*/
private int overflowedMod(long x) {
// The proper value of the argument divided by two and floored (i.e., rounded towards negative infinity).
// The sign bit got restored by simply flipping it as we know it was wrong.
final long half = (x >> 1) ^ Long.MIN_VALUE;
final long lsb = x & 1; // the least significant bit
long result = 2L * mod(half) + lsb;
// Now the result lies in the range 0 to 2*modulus+1 (both included), so conditionally reduce it twice.
if (result>=modulus) result -= modulus;
if (result>=modulus) result -= modulus;
return (int) result;
}
///////// Old but relevant code below.
/** Return a non-negative value less than modulus and congruent to the operand. */
public int mod(long x) {
// As m
IntModulus saysWhat's more problematic is that you don't have matching argument symmetry for add and subtract. You should have long versions of those.
and the reason for this part missing is that I hoped to get something faster than the trivial
public int add(long x, long y) {
return mod(mod(x) + mod(y));
}The optimized solution has only one modulus operation (sure, only benchmarking will tell if it was worth it).
Please have a short look at the original question before reviewing. I mean the comments, not the code.
```
public final class IntModulus {
///////// Irrelevant code removed, see the original question if you miss anything.
///////// New code below.
/** Return a non-negative value less than modulus and congruent to the exact sum. */
public int add(long x, long y) {
final long sum = x + y;
// The addition overflowed iff both operands have the same sign are the result's sign differs.
final boolean overflow = (x^y) >= 0 & (x^sum) This means that the sign is surely wrong and that the number would fit into a 65-bit signed integer.
*/
private int overflowedMod(long x) {
// The proper value of the argument divided by two and floored (i.e., rounded towards negative infinity).
// The sign bit got restored by simply flipping it as we know it was wrong.
final long half = (x >> 1) ^ Long.MIN_VALUE;
final long lsb = x & 1; // the least significant bit
long result = 2L * mod(half) + lsb;
// Now the result lies in the range 0 to 2*modulus+1 (both included), so conditionally reduce it twice.
if (result>=modulus) result -= modulus;
if (result>=modulus) result -= modulus;
return (int) result;
}
///////// Old but relevant code below.
/** Return a non-negative value less than modulus and congruent to the operand. */
public int mod(long x) {
// As m
Solution
Small improvements
Overall, it seems that what you wrote is correct and is an improvement over the
Add instead of multiply by 2
You can replace this line:
with:
The modified version showed a 1% improvement in the overflow case. (See benchmarks below)
Remove one unnecessary check
Following the code above, you have this code:
But actually, the value of
By removing the second check, I found a 1% speed improvement in the overflow case.
Checking for overflow
I replaced this overflow check for add:
with this very similar one:
Edit: Later I found that this inverted version is faster:
It tests the same thing but uses one extra xor instead of a comparison. For the overflow case I found a 2% speed improvement, and it was about 4% faster in the non-overflow case.
For subtraction I did a similar thing:
became:
Here I didn't even have to add an extra xor.
Benchmarks
I took the suggestion of using JMH to run benchmarks. Here are the results:
Overall, it seems that what you wrote is correct and is an improvement over the
mod(mod(x)+mod(y)) solution, especially in the case where there is no overflow. I have three small improvements:Add instead of multiply by 2
You can replace this line:
long result = 2L * mod(half) + lsb;with:
long halfmod = mod(half);
long result = halfmod + halfmod + lsb;The modified version showed a 1% improvement in the overflow case. (See benchmarks below)
Remove one unnecessary check
Following the code above, you have this code:
// Now the result lies in the range 0 to 2*modulus+1 (both included),
// so conditionally reduce it twice.
if (result>=modulus) result -= modulus;
if (result>=modulus) result -= modulus;But actually, the value of
mod(half) is in the range 0..modulus-1. So if you multiply that by 2 and add 1, result will be in the range 0..2modulus-1, not 0..2modulus+1 as you stated in the comment. Therefore, after the first result -= modulus, the range of result will be 0..modulus-1 and there is no need for the second check.// Now the result lies in the range 0 to 2*modulus-1 (both included),
// so conditionally reduce it once.
if (result>=modulus) result -= modulus;By removing the second check, I found a 1% speed improvement in the overflow case.
Checking for overflow
I replaced this overflow check for add:
final boolean overflow = (x^y) >= 0 & (x^sum) < 0;with this very similar one:
final boolean overflow = ((x^y^Long.MIN_VALUE) & (x^sum)) < 0;Edit: Later I found that this inverted version is faster:
final boolean overflow = ((x^y) | ~(x^sum)) >= 0;It tests the same thing but uses one extra xor instead of a comparison. For the overflow case I found a 2% speed improvement, and it was about 4% faster in the non-overflow case.
For subtraction I did a similar thing:
final boolean overflow = (x^y) < 0 & (x^diff) < 0;became:
final boolean overflow = ((x^y) & (x^diff)) < 0;Here I didn't even have to add an extra xor.
Benchmarks
I took the suggestion of using JMH to run benchmarks. Here are the results:
Overflow add test
-----------------
Methodology: Add 0x4000000000000000L with itself modulo 0x12345678
in a loop of 100000 iterations.
Method Operations per second Speed
------ --------------------- -----
Original code 238.754 +- 0.778 100.0%
Multiply change 241.527 +- 0.884 101.1%
Remove extra check 241.180 +- 1.007 101.0%
Change overflow check 243.335 +- 1.662 101.9%
All changes 256.452 +- 1.325 107.4%
Non-overflow add test
---------------------
Methodology: Add 0x4000000000000000L with 1 modulo 0x12345678
in a loop of 100000 iterations.
Method Operations per second Speed
------ --------------------- -----
Original code 330.298 +- 1.635 100.0%
Change overflow check 342.496 +- 2.037 103.7%
Code Snippets
long result = 2L * mod(half) + lsb;long halfmod = mod(half);
long result = halfmod + halfmod + lsb;// Now the result lies in the range 0 to 2*modulus+1 (both included),
// so conditionally reduce it twice.
if (result>=modulus) result -= modulus;
if (result>=modulus) result -= modulus;// Now the result lies in the range 0 to 2*modulus-1 (both included),
// so conditionally reduce it once.
if (result>=modulus) result -= modulus;final boolean overflow = (x^y) >= 0 & (x^sum) < 0;Context
StackExchange Code Review Q#95173, answer score: 3
Revisions (0)
No revisions yet.