patternjavaMinor

IntModulus long addition and subtraction

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

additionlongsubtractionandintmodulus

Problem

One of the remarks in this answer to my question concerning IntModulus says

What's more problematic is that you don't have matching argument symmetry for add and subtract. You should have long versions of those.

and the reason for this part missing is that I hoped to get something faster than the trivial

public int add(long x, long y) {
    return mod(mod(x) + mod(y));
}

The optimized solution has only one modulus operation (sure, only benchmarking will tell if it was worth it).

Please have a short look at the original question before reviewing. I mean the comments, not the code.

```
public final class IntModulus {
///////// Irrelevant code removed, see the original question if you miss anything.

///////// New code below.

/** Return a non-negative value less than modulus and congruent to the exact sum. */
public int add(long x, long y) {
final long sum = x + y;
// The addition overflowed iff both operands have the same sign are the result's sign differs.
final boolean overflow = (x^y) >= 0 & (x^sum) This means that the sign is surely wrong and that the number would fit into a 65-bit signed integer.
*/
private int overflowedMod(long x) {
// The proper value of the argument divided by two and floored (i.e., rounded towards negative infinity).
// The sign bit got restored by simply flipping it as we know it was wrong.
final long half = (x >> 1) ^ Long.MIN_VALUE;
final long lsb = x & 1; // the least significant bit
long result = 2L * mod(half) + lsb;
// Now the result lies in the range 0 to 2*modulus+1 (both included), so conditionally reduce it twice.
if (result>=modulus) result -= modulus;
if (result>=modulus) result -= modulus;
return (int) result;
}

///////// Old but relevant code below.

/** Return a non-negative value less than modulus and congruent to the operand. */
public int mod(long x) {
// As m

Solution

Small improvements

Overall, it seems that what you wrote is correct and is an improvement over the mod(mod(x)+mod(y)) solution, especially in the case where there is no overflow. I have three small improvements:

Add instead of multiply by 2

You can replace this line:

long result = 2L * mod(half) + lsb;

with:

long halfmod = mod(half);
    long result = halfmod + halfmod + lsb;

The modified version showed a 1% improvement in the overflow case. (See benchmarks below)

Remove one unnecessary check

Following the code above, you have this code:

// Now the result lies in the range 0 to 2*modulus+1 (both included),
    // so conditionally reduce it twice.
    if (result>=modulus) result -= modulus;
    if (result>=modulus) result -= modulus;

But actually, the value of mod(half) is in the range 0..modulus-1. So if you multiply that by 2 and add 1, result will be in the range 0..2modulus-1, not 0..2modulus+1 as you stated in the comment. Therefore, after the first result -= modulus, the range of result will be 0..modulus-1 and there is no need for the second check.

// Now the result lies in the range 0 to 2*modulus-1 (both included),
    // so conditionally reduce it once.
    if (result>=modulus) result -= modulus;

By removing the second check, I found a 1% speed improvement in the overflow case.

Checking for overflow

I replaced this overflow check for add:

final boolean overflow = (x^y) >= 0 & (x^sum) < 0;

with this very similar one:

final boolean overflow = ((x^y^Long.MIN_VALUE) & (x^sum)) < 0;

Edit: Later I found that this inverted version is faster:

final boolean overflow = ((x^y) | ~(x^sum)) >= 0;

It tests the same thing but uses one extra xor instead of a comparison. For the overflow case I found a 2% speed improvement, and it was about 4% faster in the non-overflow case.

For subtraction I did a similar thing:

final boolean overflow = (x^y) < 0 & (x^diff) < 0;

became:

final boolean overflow = ((x^y) & (x^diff)) < 0;

Here I didn't even have to add an extra xor.

Benchmarks

I took the suggestion of using JMH to run benchmarks. Here are the results:

Overflow add test
-----------------
Methodology: Add 0x4000000000000000L with itself modulo 0x12345678
             in a loop of 100000 iterations. 

Method                   Operations per second     Speed
------                   ---------------------     -----
Original code              238.754 +- 0.778        100.0%
Multiply change            241.527 +- 0.884        101.1%
Remove extra check         241.180 +- 1.007        101.0%
Change overflow check      243.335 +- 1.662        101.9%
All changes                256.452 +- 1.325        107.4%

Non-overflow add test
---------------------
Methodology: Add 0x4000000000000000L with 1 modulo 0x12345678
             in a loop of 100000 iterations. 

Method                   Operations per second     Speed
------                   ---------------------     -----
Original code              330.298 +- 1.635        100.0%
Change overflow check      342.496 +- 2.037        103.7%

Code Snippets

long result = 2L * mod(half) + lsb;

long halfmod = mod(half);
    long result = halfmod + halfmod + lsb;

// Now the result lies in the range 0 to 2*modulus+1 (both included),
    // so conditionally reduce it twice.
    if (result>=modulus) result -= modulus;
    if (result>=modulus) result -= modulus;

// Now the result lies in the range 0 to 2*modulus-1 (both included),
    // so conditionally reduce it once.
    if (result>=modulus) result -= modulus;

final boolean overflow = (x^y) >= 0 & (x^sum) < 0;

Context

StackExchange Code Review Q#95173, answer score: 3

Revisions (0)

No revisions yet.