patternjavaMinor

Project Euler problem 530 - GCD function is inefficient

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

problemfunctioninefficientprojecteuler530gcd

Problem

Here is a description of Project Euler problem 530:

Every divisor \$d\$ of a number \$n\$ has a complementary divisor \$n/d\$.

Let \$f(n)\$ be the sum of the greatest common divisor of \$d\$ and \$n/d\$ over all positive divisors \$d\$ of \$n\$, that is \$\displaystyle f(n)=\sum_{d/n}\gcd\left(d, \frac{n}{d}\right)\$.

Let \$F\$ be the summatory function of \$f\$, that is \$\displaystyle F(k) = \sum_{n=1}^{k}f(n)\$.

You are given that \$F(10)=32\$ and \$F(1000)=12776\$.

Find \$F(10^{15})\$.

Yesterday, I asked a question here: Project Euler Problem 530: sum of sum of greatest common divisors

I have improved my solution, which is now as follows:

import java.math.BigInteger;

public class problem530 {

public static void main(String[] args) {

    double start = System.nanoTime();

    BigInteger result = BigInteger.ZERO;
    long target = 100000;

    for (long i = 1; i*i <=  target; i++){
                for(long j = i; i*j <= target; j++) {
                    if (i != j) {
                    result = result.add(BigInteger.valueOf((GCD(i, j))*2));
                }
                    else {
                        result = result.add(BigInteger.valueOf(i));
                    }
                }
            }

    System.out.println(result);

    double duration = (System.nanoTime() - start)/1000000000;

    System.out.println("Your code took " + duration + " seconds to execute.");

}

public static long GCD (long p, long q) {
    if (p == 0) return q;
    else return GCD (q%p,p);
}

This solution uses a simplified version of the summation (it avoids factorisation).

However, the GCD function I'm using is computationally expensive and causes the code to be incapable of finding F(x) for large values of x.

Currently:

It ran F(10^1) = 32 in 0.0027 seconds.

It ran F(10^3) = 12766 in 0.0042 seconds.

It ran F(10^5) = 2907546 in 0.091 seconds.

It ran F(10^7) = 518387613 in 8.034 seconds.

It ran F(10^8) = 6563728768

Solution

Okay, I've taken another look at the problem, and I have another refinement. It's not a complete solution, though; I suspect that Project Euler 530 is expecting you to have some number-theoretical insight, rather than just a lot of refinements on top of brute force.

The brute-force solution is indeed something like this (in C++ because I like C++ better than Java):

template
T f(T n)
{
    T sum = 0;
    const T sqrtn = sqrt(n);
    for (T i = 1; i 
T F(T k)
{
    T sum = 0;
    for (T i = 1; i <= k; ++i) {
        sum += f(i);
    }
    return sum;
}

By the way, gcd itself is going to be very fast no matter what, as long as you're compiling with optimizations turned on. For my C++ testing, I just used #define gcd std::__gcd, which turned out to be just a hair faster than the naive %-based algorithm and much faster than a "Binary GCD" algorithm I copied from Wikipedia.

This version performs F(1e6) in 7.7 seconds.

The answer at Optimize finding GCD of all pairs of divisors provides a very good algorithmic refinement. It reverses the loops so that you have something like this:

template
T F(T k)
{
    T sum = 0;
    const T sqrtk = sqrt(k);
    for (T d = 1; d <= sqrtk; ++d) {
        const T k_over_d = k/d;
        sum += d;
        for (T n = d+1; n <= k_over_d; ++n) {
            const T g = gcd(n, d);
            sum += 2*g;
        }
    }
    return sum;
}

This is basically what you've implemented, except that you compute i*i sqrt(n) times instead of just computing sqrt(n) once, and so on. In other words, you're deliberately pessimizing your code, which is one reason it's hard to identify the real bottlenecks. When doing Project Euler–type speed challenges, you should always eliminate all the inessential inefficiencies that you can, so that if any inefficiencies remain, they'll be essential to your algorithm (and hopefully point the way to a better algorithm).

This version performs F(1e7) in 4.0 seconds.

So, looking at that version, the thing that jumped out at me was that we're computing gcd(n,d) in the inner loop... over a lot of values of n, like, more than d different values of n. And gcd(n,d) is never bigger than d. So we're summing up the same value of gcd(n,d) multiple times (for n equal to n1, and n1+d, and n1+d+d, and so on). So, I thought to myself, let's split that inner loop into d inner loops and compute 1/dth as many GCDs!

template
T F(T k)
{
    T sum = 0;
    const T sqrtk = sqrt(k);
    for (T d = 1; d <= sqrtk; ++d) {
        const T k_over_d = k/d;
        for (T n_mod_d = 0; n_mod_d < d; ++n_mod_d) {
            const T g = gcd(n_mod_d, d);
            for (T n = d + n_mod_d; n <= k_over_d; n += d) {
                sum += 2*g;
            }
        }
        sum -= d;
    }
    return sum;
}

This version performs F(1e8) in 6.5 seconds.

I thought Clang (my C++ compiler of choice) was probably smart enough to optimize that inner-inner loop (the one that just adds 2*g a known number of times). Indeed, manually optimizing it to

const T g = gcd(n_mod_d, d);
            const T iterations = (k_over_d - n_mod_d) / d;
            sum += 2*g*iterations;

didn't produce any further speedup, indicating that the compiler was already doing that optimization for me.

We can rearrange some of the obvious commutative operations, e.g. instead of subtracting d from sum for each d from 1 to sqrtk, we can just subtract (sqrtk+1) * sqrtk / 2 once... that kind of thing. That doesn't change the shape of the nested loops, so it doesn't affect the runtime of the code; but it makes the code a bit shorter and removes some of the inessential inefficiency so that we can see more clearly the essential inefficiency we're still trying to get rid of.

template
T F(T k)
{
    T sum = 0;
    const T sqrtk = sqrt(k);
    for (T d = 1; d <= sqrtk; ++d) {
        const T k_over_d = k/d;
        for (T i = 0; i < d; ++i) {
            sum += gcd(i, d) * ((k_over_d - i) / d);
        }
    }
    return (2 * sum) - ((sqrtk + 1) * sqrtk / 2);
}

Aha! Given that i < d, that subexpression ((k_over_d - i) / d) looks suspicious. It can only ever take on one of two values: either int(k_over_d / d) if i <= k_over_d % d, or that-minus-1 otherwise. So we hoist that multiplication out of the loop, and finally we're back to a couple of loops over gcds.

I got another tiny speedup by using a lookup table for gcd(x,y) where x <= y <= 8192; that version performed F(1e8) in 5.0 seconds.

However, you don't need 10x speedups; you need 10000000x speedups, in order to perform F(1e15) in a reasonable amount of time. This implies that there's another algorithmic, number-theoretic, approach that will be the actual solution Project Euler is looking for.

You might think that another possible solution would be to split up the work and use multithreading to get the answer in parallel. But even supposing that you have a 16-c

Code Snippets

template<typename T>
T f(T n)
{
    T sum = 0;
    const T sqrtn = sqrt(n);
    for (T i = 1; i <= sqrtn; ++i) {
        if (n % i) continue;
        sum += gcd(i, n/i);
        if (i*i != n) sum += gcd(i, n/i);
    }
    return sum;
}

template<typename T>
T F(T k)
{
    T sum = 0;
    for (T i = 1; i <= k; ++i) {
        sum += f(i);
    }
    return sum;
}

template<typename T>
T F(T k)
{
    T sum = 0;
    const T sqrtk = sqrt(k);
    for (T d = 1; d <= sqrtk; ++d) {
        const T k_over_d = k/d;
        sum += d;
        for (T n = d+1; n <= k_over_d; ++n) {
            const T g = gcd(n, d);
            sum += 2*g;
        }
    }
    return sum;
}

template<typename T>
T F(T k)
{
    T sum = 0;
    const T sqrtk = sqrt(k);
    for (T d = 1; d <= sqrtk; ++d) {
        const T k_over_d = k/d;
        for (T n_mod_d = 0; n_mod_d < d; ++n_mod_d) {
            const T g = gcd(n_mod_d, d);
            for (T n = d + n_mod_d; n <= k_over_d; n += d) {
                sum += 2*g;
            }
        }
        sum -= d;
    }
    return sum;
}

const T g = gcd(n_mod_d, d);
            const T iterations = (k_over_d - n_mod_d) / d;
            sum += 2*g*iterations;

template<typename T>
T F(T k)
{
    T sum = 0;
    const T sqrtk = sqrt(k);
    for (T d = 1; d <= sqrtk; ++d) {
        const T k_over_d = k/d;
        for (T i = 0; i < d; ++i) {
            sum += gcd(i, d) * ((k_over_d - i) / d);
        }
    }
    return (2 * sum) - ((sqrtk + 1) * sqrtk / 2);
}

Context

StackExchange Code Review Q#115026, answer score: 5

Revisions (0)

No revisions yet.