HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

x86 assembly method that mimics strcat

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
strcatmethodmimicsx86assemblythat

Problem

I'm trying to learn x86 assembly and hopefully move on to projects of the scope that more than one person would be involved.

This is a (relatively) simple subroutine written in Intel x86 assembly using MASM that copies one string onto the end of another. I named it strglue to avoid conflict with the C library function strcat (since the driver for this method is a C program) and the MASM macro CatStr.

I'm wondering if I can have some insight on the quality of:

  • The code itself (optimality of the instructions and register usage)



  • The documentation



  • The indentation of the loops and labels



.386
.MODEL FLAT, C
.DATA
.CODE

    ; Glue together (concatenate) two strings
    ; PARAMS:
    ;  [ESP+4] - Pointer to first character in null-terminated destination string
    ;  [ESP+8] - Pointer to first character in null-terminated source string
    ; USES:
    ;  EAX: Holds the destination string during execution
    ;  ECX: Holds the source string during execution:
    ;  EDX: Lower 8 bits (DL) holds characters being copied from source to destination
    ; OUTPUT:
    ;  [ESP+4] - Pointer to first character in null-terminated concatenation of destination string + source string (excluding the null-terminator from the destination string)
    strglue PROC

        ; Step 1: Get a pointer to the null-terminator character in the destination string
        MOV EAX, DWORD PTR [ESP+4]
        ENDOFDEST:  
            INC EAX
            CMP BYTE PTR [EAX], 0
        JNZ ENDOFDEST

        ; Step 2: Copy the source string into the pointer to the end of dest, overwriting the null-terminator
        MOV ECX, DWORD PTR [ESP+8]
        STRCATLOOP:
            MOV DL, BYTE PTR [ECX]
            LEA EAX, [EAX+1]
            MOV BYTE PTR [EAX-1], DL
            LEA ECX, [ECX+1]

            ; If ZF=0  the null terminator in the source string has been reached; stop looping
            TEST DL, DL
        JNZ STRCATLOOP
        RET
    strglue ENDP
END

Solution

This strikes me as looking a little like code I'd expect to see from a compiler--code that works, but completely ignores both traditional x86 register usage, and the special instructions the x86 provides for the job.

Traditionally, you'd use edi to point to your destination string, and esi to point to your source string. You'd use a repnz scasb to find the end of the first string, and lodsb/stosb to copy bytes from the source to the destination (there is actually a movsb to copy bytes as well, but it doesn't allow you to check for the end of the string as you copy).

Note that this depends on the direction flag being 0, but every compiler of which I'm aware leaves it that way (and depend on your leaving it that way as well).

xor eax, eax     ; set al to value we're looking for, 0 in this case.
    lea ecx, [eax-1] ; set ecx to 0ffffffffh

    mov edi, [esp+4] ; find end of dest string
    repnz scasb

    mov esi, [esp+8] ; we've found the end of the destination, now do the copy
    dec edi

copy_loop:
    lodsb
    stosb
    test al, al
    jnz  copy_loop
    ret


This is somewhat shorter. Under normal circumstances, we can expect either one to be memory bound, so speed probably isn't particularly relevant (but if we did care, it would depend on the exact processor in use--this would tend to work better on older processors, but would be about the same to possibly even a little slower on newer processors).

As far are readability goes, either one is fairly easy to read, but I think my code fits more closely with how somebody with experience writing code for an x86 would expect to see the registers used. As far as indentation goes, there doesn't seem to be much agreement about whether to indent assembly language. Your indentation certainly doesn't hurt anything, but it isn't like higher level languages where it's expected to the point that unindented code is clearly a problem. The way I've written the code above (instructions indented one stop, label(s) at the left margin) is almost certainly more common.

I should probably add that I'm a bit hesitant to post on this subject--while I think I wrote assembly language quite well at one time, it's been years since I wrote enough to notice. I do look forward to @Edward posting a review on this (though I'm not sure whether he writes much assembly code any more either).

Code Snippets

xor eax, eax     ; set al to value we're looking for, 0 in this case.
    lea ecx, [eax-1] ; set ecx to 0ffffffffh

    mov edi, [esp+4] ; find end of dest string
    repnz scasb

    mov esi, [esp+8] ; we've found the end of the destination, now do the copy
    dec edi

copy_loop:
    lodsb
    stosb
    test al, al
    jnz  copy_loop
    ret

Context

StackExchange Code Review Q#126495, answer score: 5

Revisions (0)

No revisions yet.