patterncCritical

Disproving Euler proposition by brute force in C

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

time-limit-exceeded codereview c stackoverflow beginner mathematics multithreading

forcedisprovingeulerbruteproposition

Problem

I wrote a little bit of code to brute force disprove the Euler proposition that states:

$$a^4 + b^4 + c^4 = d^4$$

has no solution when \$a\$, \$b\$, \$c\$, \$d\$ are positive integers.

I'm no mathematician, but reading around this, at least one solution was found by Noam Elkies in 1987 (a = 95800; b = 217519; c = 414560; d = 422481). I wanted to get an idea of how much firepower it would take to solve by brute force, so I wrote the following in C:

#include 
#include 
#include 

int prop(long int A, long int B, long int C, long int D) {
    return (pow(A, 4) + pow(B, 4) + pow(C, 4) == pow(D, 4));
}

int main() {
    long int a, b, c, d;
    clock_t t;
    t = clock();

    for (a = 1; a < 100000; a++) {
        for (b = 1; b < 300000; b++) {
            for (c = 1; c < 500000; c++) {
                for (d = 1; d < 500000; d++) {
                    if (prop(a, b, c, d))
                        printf("FOUND IT!\na = %ld\nb = %ld\nc = %ld\nd = %ld\n", a, b, c, d);
                }
                if (!(c%1000))
                    printf("a = %ld, b = %ld, c = %ld, time = %fs\n", a, b, c, ((double)(clock() - t))/CLOCKS_PER_SEC);
            }
            printf("a = %ld, b = %ld, time = %fs\n", a, b, ((double)(clock() - t))/CLOCKS_PER_SEC);
        }
        printf("a = %ld, time = %fs\n", a, ((double)(clock() - t))/CLOCKS_PER_SEC);
    }

}

I ran it for a while, and worked out that it would take roughly \$85 \times 10^6\$ years for it to get to the answer above on my current machine.

I mean, maybe if I waited a few thousand years, I'd have a slightly better machine, but my question is (and I am new to C and computer science in general - so please be gentle); what strategies could I take to make the above code run faster? I thought about threading, and using some sort of bit shifting in place of the pow() calls.

Would there be any way (on my current machine) to get this to run in my lifetime?

Solution

Note: At some point, this review drifted into the realm of assembler and GMP. An actual review is at the end of this post, whereas the first section discusses the runtime-problems concerning pow, wrong data types and arbitrary large integers.

No life time for run time

Would there be any way (on my current machine) to get this to run in my lifetime?

There's a great saying in carpentry: measure twice, cut once. It concerns cutting wood or other material, where you have to throw away your resources if you accidentally cut at the wrong place.

A similar saying is there for software engineers: you can't optimize what you can't measure. There are several ways to measure your code, e.g. benchmarking, profiling, or looking at the generated assembler to see how many instructions a certain part of your code will take.

Here, we will take the latter route, start with the assembler, take considerations step by step and see where we end up.

A study in assembly

Lets have a look at your code. Well, not yours, but the assembler the compiler generates. You can use gcc -S -O3. On my platform, this results in the following "hot" section in main:

.L6:
        add     rbx, 1
        cmp     rbx, 500000
        je      .L18
.L8:
        mov     rax, QWORD PTR .LC0[rip]
        movsd   xmm0, QWORD PTR [rsp+40]
        movq    xmm1, rax
        call    pow                        ; (1)
        mov     rax, QWORD PTR .LC0[rip]
        movsd   QWORD PTR [rsp+8], xmm0
        movsd   xmm0, QWORD PTR [rsp+48]
        movq    xmm1, rax
        call    pow                        ; (2)
        mov     rax, QWORD PTR .LC0[rip]
        movsd   QWORD PTR [rsp+16], xmm0
        movsd   xmm0, QWORD PTR [rsp+32]
        movq    xmm1, rax
        call    pow                        ; (3)
        mov     rax, QWORD PTR .LC0[rip]
        movsd   QWORD PTR [rsp+24], xmm0
        pxor    xmm0, xmm0
        cvtsi2sd        xmm0, rbx
        movq    xmm1, rax
        call    pow                        ; (4)
        movsd   xmm2, QWORD PTR [rsp+8]
        addsd   xmm2, QWORD PTR [rsp+16]
        movapd  xmm1, xmm0
        movsd   xmm0, QWORD PTR [rsp+24]
        addsd   xmm0, xmm2
        ucomisd xmm0, xmm1
        jp      .L6
        jne     .L6

Even though you might not know assembler, you can see those four calls to pow. The first thing you need to know is that call is slow compared to those other operations. Those four calls happen in the innermost loop. The compiler removed the call to prop and instead replaced it by its code (that's faster).

mov assigns values to registers, add adds values, and so on. The registers with xmm* are double precision registers, meant for double variables. So we're basically calling pow with the right values and then add, subtract and modify our small little double values.

Double trouble

But wait a second. We're trying to solve a completely integral problem! Why does our generated program use those registers at all?

This should raise a red flag. And indeed, if we remember pow's signature, it should be clear that it's not the right tool. It takes a double base and exponent, which indicates that it's suitable for terms like \$15.12151^{3.1415926}\$. This is a total overkill for your problem.

Using proper functions

So let's use another pow version instead:

long int pow4(long int x){
    return x * x * x * x;
}

Note that your compiler should create something like this from that:

movq    %rdi, %rax
imulq   %rdi, %rax
imulq   %rax, %rax
ret

but if your compiler doesn't recognize this potential (micro) optimization, you can use

long int pow4(long int x){
    const long int t = x * x;
    return t * t;
}

instead.

We also need to change prop:

int prop(long int A, long int B, long int C, long int D) {
  return (pow4(A) + pow4(B) + pow4(C) == pow4(D));
}

Allright. Now, before I show the times of the new program, let's check the output of your old one:

a = 1, b = 1, c = 1000, time = 114.156000s

That's when I hit ^C. How does the one using pow4 hold up?

a = 1, b = 1, c = 1000, time = 0.296000s
a = 1, b = 1, c = 2000, time = 0.578000s
a = 1, b = 1, c = 3000, time = 0.859000s
a = 1, b = 1, c = 4000, time = 1.140000s
a = 1, b = 1, c = 5000, time = 1.421000s
a = 1, b = 1, c = 6000, time = 1.703000s
a = 1, b = 1, c = 7000, time = 1.984000s
a = 1, b = 1, c = 8000, time = 2.265000s
a = 1, b = 1, c = 9000, time = 2.546000s
a = 1, b = 1, c = 10000, time = 2.828000s
a = 1, b = 1, c = 11000, time = 3.109000s
a = 1, b = 1, c = 12000, time = 3.390000s
a = 1, b = 1, c = 13000, time = 3.687000s
a = 1, b = 1, c = 14000, time = 3.968000s
a = 1, b = 1, c = 15000, time = 4.250000s
a = 1, b = 1, c = 16000, time = 4.531000s

Which is 0,2% of your original time, or a 500x speedup.

However, this comes at a cost: pow4(500000) is too large for a int64_t, since \$\log_2(500000^4) \approx 76\$. The greatest number you could check with a uint64_t is

Code Snippets

.L6:
        add     rbx, 1
        cmp     rbx, 500000
        je      .L18
.L8:
        mov     rax, QWORD PTR .LC0[rip]
        movsd   xmm0, QWORD PTR [rsp+40]
        movq    xmm1, rax
        call    pow                        ; (1)
        mov     rax, QWORD PTR .LC0[rip]
        movsd   QWORD PTR [rsp+8], xmm0
        movsd   xmm0, QWORD PTR [rsp+48]
        movq    xmm1, rax
        call    pow                        ; (2)
        mov     rax, QWORD PTR .LC0[rip]
        movsd   QWORD PTR [rsp+16], xmm0
        movsd   xmm0, QWORD PTR [rsp+32]
        movq    xmm1, rax
        call    pow                        ; (3)
        mov     rax, QWORD PTR .LC0[rip]
        movsd   QWORD PTR [rsp+24], xmm0
        pxor    xmm0, xmm0
        cvtsi2sd        xmm0, rbx
        movq    xmm1, rax
        call    pow                        ; (4)
        movsd   xmm2, QWORD PTR [rsp+8]
        addsd   xmm2, QWORD PTR [rsp+16]
        movapd  xmm1, xmm0
        movsd   xmm0, QWORD PTR [rsp+24]
        addsd   xmm0, xmm2
        ucomisd xmm0, xmm1
        jp      .L6
        jne     .L6

long int pow4(long int x){
    return x * x * x * x;
}

movq    %rdi, %rax
imulq   %rdi, %rax
imulq   %rax, %rax
ret

long int pow4(long int x){
    const long int t = x * x;
    return t * t;
}

int prop(long int A, long int B, long int C, long int D) {
  return (pow4(A) + pow4(B) + pow4(C) == pow4(D));
}

Context

StackExchange Code Review Q#145221, answer score: 151

Revisions (0)

No revisions yet.