patternMinor

Integer square root in x86 assembly (NASM)

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

x86assemblyrootsquareintegernasm

Problem

This program calculates the square root of an unsigned 32-bit number with some bit fiddling to get a very close approximation, then one iteration of the Babylonian method to get an exact value. I don't know if this could be faster, but hope it can.

I have trouble with clearly commenting my code, so I will gladly explain unclear portions of code.

[global sqrti]
sqrti:
        mov     eax, [esp+4]
        xor     ecx, ecx       ; clear register to put our approximation in
        ; make the lower 16 bits of CX equal the even-numbered bits of EAX
        shr     eax, 3
        rcr     cx, 1
%rep 14
        shr     eax, 2
        rcr     cx, 1
%endrep
        ; do one iteration of the Babylonian method
        or      ecx, 1          ; prevent a division by zero
        mov     eax, [esp+4]
        xor     edx, edx        ; clear top half of DIV's implied operand
        div     ecx
        add     eax, ecx
        shr     eax, 1
        ret

Solution

The first thing which strikes me as odd is how you calculate your initial estimation. If you only take every 2nd bit into account that means a value like 0b10101010101010 would yield a 0 as your initial estimation (or 1 after the or ecx, 1 adjustment) - not very good. Also it's not clear to me why you skip the first 3 bits, that makes even your best estimates off by a factor of 2.

But I can see what you try to do: just take the "half" of the bits of the initial value. I think you can fix that by first or-ing the even with the uneven bits together and then apply the shifting and roring:

mov     eax, [esp+4]
        test    eax, eax,
        jnz     positive
        ret              ; Return immediately when input is 0
positive:
        mov     ecx, eax
        shl     ecx
        or      eax, ecx
        bsr     ecx, eax ; Get highest bit for later adjustment
        xor     edx, edx
loop:
        shr     eax, 2
        rcr     dx
        jnz     loop
        shr     cl       ; Adjust for skipped bits
        inc     cl
        rol     dx, cl
        mov     ecx, edx ; Use ecx to store estimation

Note that shr modifies the zero flag, but rcr doesn't so you can jump out of the loop as soon as eax turns out to be 0.

Furthermore note that 1 is an implied parameter for shifting operations if none is given. So you can safe a byte each time you leave it out (and possibly a cycle).

The other thing which is strange is that you only do one iteration. You should repeat it till your estimated value varies only by 0 or 1 between iterations, otherwise it's hard to see how your sqrti function will be useful.

And I don't think you need the or ecx, 1 statement. Just make sure to return immediately if [esp+4] turns out to be 0. And if its positive your ecx register should never become 0 if you get the initial estimation right.

Code Snippets

mov     eax, [esp+4]
        test    eax, eax,
        jnz     positive
        ret              ; Return immediately when input is 0
positive:
        mov     ecx, eax
        shl     ecx
        or      eax, ecx
        bsr     ecx, eax ; Get highest bit for later adjustment
        xor     edx, edx
loop:
        shr     eax, 2
        rcr     dx
        jnz     loop
        shr     cl       ; Adjust for skipped bits
        inc     cl
        rol     dx, cl
        mov     ecx, edx ; Use ecx to store estimation

Context

StackExchange Code Review Q#48847, answer score: 7

Revisions (0)

No revisions yet.