Counter in FASM assembly

Submitted by: @import:stackexchange-codereview
fasm assembly counter

Problem

I am making a simple counter in assembly. It counts to a billion and exits. However, the performance of this is really horrible. A Java application (that does the same) outperforms it by around 10x. How can I optimize this code?

include 'include/win32ax.inc'

section '.data' Data readable writeable
    outhandle   DD ?
    inhandle    DD ?
    incr        DD 0
    numwritten  DD ?
    inchar      DB ?
    numread     DD ?

section '.text' code readable executable
start:
    invoke  AllocConsole
    invoke  GetStdHandle,STD_OUTPUT_HANDLE
    mov     [outhandle],eax
    invoke  GetStdHandle,STD_INPUT_HANDLE
    mov     [inhandle],eax
    invoke  WriteConsole,[outhandle],"Starting...",11,numwritten,0

    jmp     loopcount
loopcount:
    inc     [incr]
    cmp     [incr], 1000000000      ; 1 billion
    jne     loopcount
    invoke  WriteConsole,[outhandle],"   Done count",13,numwritten,0
    invoke  ReadConsole,[inhandle],inchar,2,numread,0
    invoke  ExitProcess,0
.end start

Solution

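Load [incr] into a register instead, for example eax, and increment the register. As written, every inc [incr] is a read-modify-write on memory, so each iteration has to wait for the previous iteration's store before it can reload the value.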
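
A minimal sketch of that change, meant to replace only the loopcount block in the question's program (it is a fragment, not a complete program). It assumes eax is free at that point and keeps the question's loopcount label and incr variable:

```asm
        xor     eax, eax                ; counter lives in eax, starting at 0
loopcount:
        inc     eax                     ; register increment, no memory access
        cmp     eax, 1000000000         ; 1 billion, as in the question
        jne     loopcount
        mov     [incr], eax             ; write the final value back once
```

The single store after the loop leaves [incr] holding the same final value as the original code.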

Instead of cmp [incr], 1000000000 and jne, try loading 1000000000 into the ecx register and using the loop instruction.

The advice above to use loop is probably obsolete, because loop is slow on many modern x86 processors. Instead, it may be faster to load 1000000000 into a register such as ebx, then use dec ebx followed by jnz back to the top of the loop.
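
The count-down variant could look roughly like this. It is again a fragment for the question's program, and the countdown label is a name chosen here for illustration. Because dec sets the zero flag itself, no separate cmp is needed:

```asm
        mov     ebx, 1000000000         ; number of iterations left
countdown:                              ; hypothetical label name
        dec     ebx                     ; sets ZF when ebx reaches 0
        jnz     countdown               ; keep going until ebx is 0
        mov     dword [incr], 1000000000 ; record the result, as the original loop would
```

Whether this beats the count-up version depends on the processor, so it is worth timing both.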

Intel's Branch and Loop Reorganization to Prevent Mispredicts seems to recommend you code a loop like this ...

loop_top:
        ; ... do work inside the loop here ...
        cmp     ebx, 1000000000
        jz      loop_end                ; taken rarely (only on the final pass)
        inc     ebx
        jmp     loop_top                ; taken often
loop_end:


... in order to optimize the branch prediction.

Unrolling the loop might make it faster. In fact, given how dumb the algorithm is, the best optimization is to store 1000000000 directly into [incr] without looping at all.
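
Both ideas can be sketched as fragments for the question's program. The unroll factor of 4 and the unrolled label are assumptions made here for illustration. The factor has to divide 1000000000 evenly, otherwise the jne would never see an exact match:

```asm
        ; Variant 1: loop unrolled by 4, so 250000000 passes instead of 1000000000
        xor     eax, eax
unrolled:                               ; hypothetical label name
        inc     eax
        inc     eax
        inc     eax
        inc     eax
        cmp     eax, 1000000000         ; one compare and branch per four increments
        jne     unrolled
        mov     [incr], eax

        ; Variant 2: no loop at all, just store the known final value
        mov     dword [incr], 1000000000
```

Variant 2 produces the same final state as the loop, which is also why it no longer measures anything if the goal is a benchmark.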

Context

StackExchange Code Review Q#41916, answer score: 5
