HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

String length calculation implementation in Assembly (NASM)

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
calculationlengthassemblyimplementationstringnasm

Problem

I just made a working implementation of calculating a string's length, but I'm not exactly sure it's the most efficient way. I divide the string up in blocks of four bytes each (32 bits), and add 4 to EAX. Once I hit a completely NULL block, I roll back the count of EAX by 4 bytes, and then calculate the final block's size one byte at a time. It only works using ASCII characters, and is limited to 32 bit computing (though that can be easily upgraded to 64 bit soon). I'm using assembly on a Linux maching using NASM. Here's the code:

`; Function to compute the length of a string in blocks
; this only works in ASCII, or else it may give the wrong character count.
; input register: EAX - the string

section .data

BYTE_COUNT equ 4 ;4 bytes = 32 bits
NULL_TERMINATOR equ 0 ;\0, aka 0

section .text

global _strlen ;main entry point

_strlen:

push ebp ; c calling convention: preserve ebp
mov ebx, eax ; mov to ebx the string
jmp _NextBlock ; go to _NextChar

_NextBlock:
cmp word [eax], NULL_TERMINATOR ; Compare whether or not this block is completely null (e.g. 0)
jz _RollbackFinalBlock ; if it is, jump to _RollbackFinalBlock
; if not...
add eax, BYTE_COUNT ; Add to EAX the block size
jmp _NextBlock; repeat loop

_RollbackFinalBlock:

sub eax, BYTE_COUNT ;Subtract the block size from EAX, because it is null
jmp _FinalizeFinalBlock ; go to _FinalizeFinalBlock

_FinalizeFinalBlock:
cmp byte[eax], NULL_TERMINATOR ;Compare the characters of the final block to the null byte.
jz _Exit ;if this byte is null, proceed to _Exit
; if not...
inc eax ;increment EAX
jmp _FinalizeFinalBlock ;repeat loop

_Exit:

dec eax ; We will have a null terminator at the end. remove it.

Solution

I have a number of comments that I hope you find helpful.

Code comments

First, your code is generally well commented and I had little difficulty in understanding what the code was doing or why. That's a good thing. A few minor points, though. First, your comment for the overall function says

; input register: EAX - the string


however, it's probably useful to point out that EAX is actually a pointer to the string and isn't intended to contain a whole string.

Also, comments should tell something not obvious about the code. So this:

jmp     _NextBlock; repeat loop


is arguably OK, but this:

inc eax ;increment EAX


is not really helpful. Instead you should say why the code is incrementing EAX rather than simply repeating what the instruction does.

Structure

Overall, the structure was OK, but the line

jmp _NextBlock ; go to _NextChar


near the top of the function is not needed, since _NextBlock is the next instruction anyway. The same is true of the jmp just before _FinalizeFinalBlock. Generally speaking, you should strive to eliminate unconditional jumps. So for example, your code includes this:

_FinalizeFinalBlock:
        cmp     byte[eax], NULL_TERMINATOR ;Compare the characters of the final block to the null byte.
        jz      _Exit ;if this byte is null, proceed to _Exit
        ; if not...
        inc eax ;increment EAX
        jmp     _FinalizeFinalBlock ;repeat loop

    _Exit:


But you could easily restructure that to eliminate the unconditional jmp like this:

dec eax ;   
    _FinalizeFinalBlock:
        inc eax ;increment EAX
        cmp     byte[eax], NULL_TERMINATOR ;Compare the characters of the final block to the null byte.
        jnz     _FinalizeFinalBlock ; keep going until final block is NULL

    _Exit:


Algorithm

The most significant problem I see with the code is the algorithm that relies on strings having multiple trailing NUL bytes. This may be true for code that calls this function, but it's not generally true of C strings which typically end with only a single NUL character.

Instruction set use

The repnz scasb instruction is probably going to be a lot more efficient that the loop you've constructed. I'd recommend reading about it and then incorporating that into your code.

Register use

The ebx register is destroyed and not saved by the function. This may be fine for your purposes, but it should be documented in comments at the top of the function.

Also, the code doesn't actually alter ebp so there isn't any point to pushing it onto the stack and popping it off at the end of the routine. (Maybe you meant ebx?)

Update:

On my machine (quad core running 3.2GHz under 64-bit Linux) I found this version to be slightly (15%) faster than the corresponding repnz scasb version:

strlen2:
    mov edx, [esp + 4]
    xor eax,eax ; count = 0
    jmp overit
looptop:
    inc edx
    inc eax
overit:
    cmp byte[edx],0
    jnz looptop
    ret

Code Snippets

; input register: EAX - the string
jmp     _NextBlock; repeat loop
inc eax ;increment EAX
jmp _NextBlock ; go to _NextChar
_FinalizeFinalBlock:
        cmp     byte[eax], NULL_TERMINATOR ;Compare the characters of the final block to the null byte.
        jz      _Exit ;if this byte is null, proceed to _Exit
        ; if not...
        inc eax ;increment EAX
        jmp     _FinalizeFinalBlock ;repeat loop

    _Exit:

Context

StackExchange Code Review Q#48034, answer score: 6

Revisions (0)

No revisions yet.