HiveBrain v1.2.0
Get Started
← Back to all entries
patterncMinor

Dijkstra's algorithm for computing the GCD of two integers, translated from C to x86 assembly

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
fromthedijkstrax86translatedalgorithmtwoassemblyforgcd

Problem

This is the x86 version of a C file assignment (I have included both). We were to convert the C code to assembly. I am rather new at assembly and would really appreciate suggestions and help for optimization and functionality.

This is a C implementation of Dijkstra's recursive algorithm for computing the greatest common divisor of two integers.

```
#include
#include
#define STDIN 0
#define STDOUT 1

unsigned int getInt(char* string) {

unsigned int result = 0;
char* digit = string;
while (*digit != '\n') digit++; // Obtain the address of
digit--; // the last digit character
while (digit >= string) {
if (*digit == ' ') break;
if (*digit '9') {
char* errorMessage = "Bad Number.\n";
write(STDOUT,errorMessage,12);
exit(0);
}
// use the MUL (dword) instruction here (unsigned multiply)
// be careful to understand its operands and results
result += (digit - '0') digitValue;
digitValue *= 10;
digit--; // walk backwards from least
} // significant to most
return result;
}
void makeDecimal(unsigned int n) {
// use the DIV (dword) instruction here (unsigned divide)
// be careful to understand its operands and results
unsigned int remainder = n % 10;
unsigned int quotient = n / 10;
if (quotient > 0) makeDecimal(quotient); // notice recursion!
char digit = remainder + '0';
write(STDOUT,&digit,1);
}

int readNumber() {
char data[20];
char* prompt = "Enter a positive integer: ";
write(STDOUT,prompt,26);
read(STDIN,data,20);
return getInt(data);
}

unsigned int gcd(unsigned int n, unsigned int m) {
if (n > m) {
return gcd(n - m, m); // recursion
} else if (n < m) {
return gcd(n, m - n); // recursion
} else return n; // base case
}

int main() {
char newLineChar = '\n';

Solution

My thoughts (in no particular order):

  • Short on comments. Assembler in particular benefits from comments.



  • Uses a very conservative style (params passed on stack, all regs preserved, maintain stack frames). For max performance (often the reason to use asm vs high level languages), there are alternatives (pass params in regs, omit stack frame, assume certain regs clobbered during 'call') that use less memory and give better performance.



  • Maybe omit jl LESS, since that's the default.



  • What's with nop?



Edit: I'm expanding on #2 since it was so terse.

I'm probably harping on this too much, but I think it's an important point to understand about asm programming.

There are rules when programming a computer. Some are enforced by the CPU (mustn't divide by 0, no NULL references, etc). Some are defined by the assembler (accidentally typing mvo eax, 0 instead of mov eax, 0).

And some are defined by the high level languages (like C). If one routine written in C is going to call another routine written in C, they must agree on a set of rules. Where is the calling routine going to put the parameters so that the called routine can find them? What registers must the called routine preserve? All? Some? None? Where is the return value from the called routine going to be?

But whatever decisions C made are just that: Decisions made by the people who designed the C language. Fortran might use different rules. Pascal, java, COBOL might all do something different.

But asm, ahh...

If your asm code was going to call a C function (yes, that can be done), then you would have to follow the C rules about where to put the parameters and where the C routine will return its result. But when one asm routine calls another, it can use any of the high level language rules (making it flexible enough to work with any language), or use none of them and make up its own.

Which brings us back to your question about removing the stack as a way to pass parameters. The code below doesn't really follow any industry standard rules. But 'main' and 'gcd' are both written according to the same rules. And since they agree, the code works as intended. And the resulting code is (a bit) more efficient.

(I'm going to base this on JS1's code, but you can see where this happens in yours)

Let's start with the existing code. You read values from the stack:

mov     eax, [ebp+8]    ;var n
mov     edx, [ebp+12]   ;var m


But why do you read it from the stack? Well, the reason you read it from the stack is that when you call the function, that's where you put the values:

push    eax
push    edx
call    gcd


But why do you push the values before you call the function? Well, you do it because that's where the function is going to read them. In other words, we do it because we need to, and we need to because we do it. It's just a decision you have made about how you will pass the parameters.

So what's the alternative?

What if you just change the rule so that instead of pushing the parameters before you made the call, you just always made sure that eax and edx contain the values you want before you make the call? Then, instead of loading eax and edx from the stack, they are already there.

It's easy to forget and think of registers like variables that go out of scope when you call a function. But that's not how registers work. There is only 1 edx.

I haven't run this (I'm not on Linux), but how about something like this:

gcd:
    cmp     eax, edx
    jg      GREATER
    je      FINISH

    sub     edx, eax ; LESS case
    jmp     RECURSE
GREATER:
    sub     eax, edx
RECURSE:
    call    gcd      ; args popped off by FINISH

FINISH:              ; return value must be in eax
    ret


Since we are passing the parameters in registers, the push statements before the call to gcd are no longer needed. And since we aren't following cdecl anymore, I also got rid of the stack frame stuff (the ebp stuff).

So what happens: We assume that eax and edx are set before the routine starts (more on that below). We compare them, subtract as necessary and (possibly) call gcd again. And if the rule is that we must have the values in eax and edx before we call gcd, why, they're already there!

So, we do a lot less pushing/popping, have fewer instructions, use less memory, I'm pretty sure this is even gluten free.

(Q: As a thought exercise: what happens if you change the "call gcd" by RECURSE to "jmp gcd"? See answer below.)

We still need to change the code in main that calls gcd for the first time. It needs to make sure the parameters are where the new rule says they are:

call    numRead
mov     edx, eax        
call    numRead
; value is already in eax
call    gcd


And here is why having rules is so important.

I'm moving the return value from the first numRead call into edx. Then I call numRead again. If numRead treated edx as volatile (the way gcd does), my first number would get overwritten

Code Snippets

mov     eax, [ebp+8]    ;var n
mov     edx, [ebp+12]   ;var m
push    eax
push    edx
call    gcd
gcd:
    cmp     eax, edx
    jg      GREATER
    je      FINISH

    sub     edx, eax ; LESS case
    jmp     RECURSE
GREATER:
    sub     eax, edx
RECURSE:
    call    gcd      ; args popped off by FINISH

FINISH:              ; return value must be in eax
    ret
call    numRead
mov     edx, eax        
call    numRead
; value is already in eax
call    gcd

Context

StackExchange Code Review Q#133347, answer score: 8

Revisions (0)

No revisions yet.