patternMinor
Counter in FASM assembly
fasm assembly counter
Problem
I am making a simple counter in assembly. It counts to a billion and exits. However, the performance of this is really horrible. A Java application (that does the same) outperforms it by around 10x. How can I optimize this code?
include 'include/win32ax.inc'
section '.data' Data readable writeable
outhandle DD ?
inhandle DD ?
incr DD 0
numwritten DD ?
inchar DB ?
numread DD ?
section '.text' code readable executable
start:
invoke AllocConsole
invoke GetStdHandle,STD_OUTPUT_HANDLE
mov [outhandle],eax
invoke GetStdHandle,STD_INPUT_HANDLE
mov [inhandle],eax
invoke WriteConsole,[outhandle],"Starting...",11,numwritten,0
jmp loopcount
loopcount:
inc [incr]
cmp [incr], 1000000000;1 billion. ;]
jne loopcount
invoke WriteConsole,[outhandle]," Done count",11,numwritten,0
invoke ReadConsole,[inhandle],inchar,2,numread,0
invoke ExitProcess,0
.end start
Solution
Load [incr] into a register instead, for example eax, and increment the register.
Instead of cmp [incr], 1000000000 and jne, try loading 1000000 into the ecx register and using the loop opcode.
The advice above to use loop is probably obsolete. Instead it may be faster to load 1000000 into a register like ebx, and loop with dec ebx and jnz.
Intel's Branch and Loop Reorganization to Prevent Mispredicts seems to recommend you code a loop like this ...
loop_top:
; ... do work inside the loop here ...
cmp ebx,1000000
jz loop_end ; happens rarely
inc ebx
jmp loop_top ; happens often
loop_end:
... in order to optimize the branch prediction.
Unrolling the loop might make it faster. In fact, given how dumb the algorithm is, the best optimization is to store the final value in [incr] directly, without looping.
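As a rough sketch of the register-based suggestions above, the fragment below could stand in for the loopcount loop from the question. It assumes the question's incr variable and its full count of 1000000000; the label count_down is made up for illustration, and no timings were measured for it.

```asm
; Sketch: drop-in replacement for the loopcount loop in the question.
; Assumes incr is the DD variable declared in the question's data section.
        mov     ecx, 1000000000     ; iteration count from the question
count_down:                         ; hypothetical label name
        dec     ecx                 ; sets ZF once ecx reaches zero
        jnz     count_down          ; taken on every iteration except the last
        mov     [incr], 1000000000  ; store the final count once, after the loop
```

The original loop reads, modifies and writes [incr] in memory on every iteration, so each inc has to wait for the previous store to complete. Keeping the counter in a register removes that dependency through memory, which is probably where most of the time goes.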
Code Snippets
loop_top:
; ... do work inside the loop here ...
cmp ebx,1000000
jz loop_end ; happens rarely
inc ebx
jmp loop_top ; happens often
loop_end:
Context
StackExchange Code Review Q#41916, answer score: 5