x86 strcpy implementation

Problem

I have about four days of assembly experience, so I would like a review of this strcpy function and whether it can be done better (I have a feeling it can).

Full code (with the test included):

.data
    s:
        .asciz "Hello world!"

.bss
    .lcomm destination, 4

.text
.globl main
main:
    nop

    pushl $s
    pushl $destination
    call __strcpy
    addl $8, %esp

    pushl $destination
    call puts
    addl $4, %esp

    ret

.globl __strcpy
.type __strcpy, @function
__strcpy:
    movl $0xFFFF, %ecx
    movl 4(%esp), %edi
    movl 8(%esp), %esi

cpy:
    cmpl $0, (%esi)
    je done

    movsb
    loop cpy

done:
    ret
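For comparison, here is a minimal C sketch of the contract `__strcpy` has to meet: copy `src` including its terminating NUL and return `dst`. The name `byte_strcpy` is hypothetical, and this is a reference for the semantics, not a translation of the assembly. Measured against it, the posted code has a few problems. `cmpl $0, (%esi)` compares four bytes at once, so the loop stops only at four consecutive zero bytes (`cmpb` would test one). The loop exits before the terminating NUL is copied. `strcpy` is expected to return `dst`, which under cdecl means in `%eax`. `%esi` and `%edi` are callee-saved in the i386 System V ABI, so they should be saved and restored. The `0xFFFF` limit in `%ecx` silently truncates longer strings. Finally, `.lcomm destination, 4` reserves only 4 bytes, while `"Hello world!"` needs 13 including its terminator.

```c
/* Hypothetical name; same contract as the standard strcpy:
   copy src, including its terminating '\0', and return dst. */
char *byte_strcpy(char *dst, const char *src)
{
    char *origin = dst;        /* strcpy returns the original dst (%eax in cdecl) */

    while (*src != '\0') {     /* test ONE byte, as cmpb would; cmpl tests four */
        *dst++ = *src++;       /* the movsb step */
    }
    *dst = '\0';               /* the terminator must be copied as well */

    return origin;
}
```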


Parts that I feel can be optimized:

- Because the done label just executes the ret instruction:
  • cmpl $0, (%esi)
  • je done
- Because the rep instruction family seems like a better approach:
  • movsb
  • loop cpy
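The `rep`-family idea essentially splits the work into two passes: first find the length (what `repne scasb` does when `%al` holds 0, scanning forward from `%edi` until it meets a zero byte), then copy length + 1 bytes in bulk (what `rep movsb` does with the count in `%ecx`). A minimal C sketch of that structure, using the standard `strlen` and `memcpy` in place of the string instructions; the name `two_pass_strcpy` is hypothetical:

```c
#include <string.h>

/* Hypothetical helper: strcpy as "find the length, then bulk copy". */
char *two_pass_strcpy(char *dst, const char *src)
{
    size_t len = strlen(src);     /* pass 1: locate the terminator (repne scasb) */
    memcpy(dst, src, len + 1);    /* pass 2: copy the bytes and the NUL (rep movsb) */
    return dst;
}
```

Reading the source twice can still be fast, because each pass is a simple loop and library `strlen`/`memcpy` are usually heavily optimized, but whether it beats a single fused loop depends on the CPU and the string lengths, so it is worth measuring.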

Solution

You can use a bit-twiddling hack to determine whether an int32 or int64 contains a zero byte: http://graphics.stanford.edu/~seander/bithacks.html#ZeroInWord

If it does not, you can copy the whole int32 or int64 at once. In the int64 case, searching 8 bytes for a zero takes only 4 operations, which looks like a genuine optimization.
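The test used below is the `haszero` expression from the Bit Twiddling Hacks page. A small sketch wrapping it for both word sizes (the function names are hypothetical): the subtraction sets the high bit of any byte that was zero (or above 0x80), and `& ~v` masks off bytes whose own high bit was already set, so the result is nonzero exactly when some byte of `v` is zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* True iff at least one of the 4 bytes of v is 0x00. */
bool has_zero_byte32(uint32_t v)
{
    return ((v - UINT32_C(0x01010101)) & ~v & UINT32_C(0x80808080)) != 0;
}

/* True iff at least one of the 8 bytes of v is 0x00. */
bool has_zero_byte64(uint64_t v)
{
    return ((v - UINT64_C(0x0101010101010101)) & ~v
            & UINT64_C(0x8080808080808080)) != 0;
}
```

Either way the test is a subtraction, a NOT and two ANDs, which is where the count of 4 operations per word comes from.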

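#include <stdint.h>   /* uint64_t, uintptr_t */
#include <string.h>   /* memcpy */

char * strcpy(char * dst, const char * src)
{
    char * origin = dst;
    uint64_t word;

    /* Copy single bytes until src is 8-byte aligned, so that the 8-byte
       loads below never cross a page boundary past the terminator. */
    while ((uintptr_t)src % 8 != 0) {
        if ((*dst++ = *src++) == '\0')
            return origin;
    }

    /* memcpy performs the 8-byte load/store without alignment or aliasing
       problems; optimizing compilers typically emit a single move for it. */
    memcpy(&word, src, 8);
    while (!((word - 0x0101010101010101ULL) & ~word & 0x8080808080808080ULL))
    {
        memcpy(dst, &word, 8);
        src += 8;
        dst += 8;
        memcpy(&word, src, 8);
    }

    /* The current word contains the terminator: finish byte by byte. */
    while ((*dst++ = *src++) != '\0')
        ;

    return origin;
}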


A simple strcpy performs 8 comparisons against zero and 8 single-byte copies for every 8 bytes of the source string. My implementation uses 4 operations to check for a zero byte and 1 operation to copy all 8 bytes, so it is 5 operations versus 16. Not all operations take the same time, so the real speedup is hard to estimate without benchmarks; is anyone free to run some?
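A rough benchmark sketch along these lines could settle the question. It times a plain byte loop against a word-at-a-time loop (the same zero-byte test, with an alignment prelude and `memcpy` loads) using the standard `clock()`. The names `byte_copy`, `word_copy`, `time_copy` and `run_benchmark` are hypothetical, and like real libc implementations `word_copy` relies on aligned reads that may run a few bytes past the terminator within the same word, which ISO C does not formally allow and AddressSanitizer may report. Results depend heavily on compiler flags, CPU and string length, so treat them as indicative only.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Baseline: one compare and one single-byte copy per character. */
char *byte_copy(char *dst, const char *src)
{
    char *origin = dst;
    while ((*dst++ = *src++) != '\0')
        ;
    return origin;
}

/* Word-at-a-time copy with the zero-byte test on 8-byte words. */
char *word_copy(char *dst, const char *src)
{
    char *origin = dst;
    uint64_t word;

    /* Align src so that each 8-byte load stays within one page. */
    while ((uintptr_t)src % 8 != 0) {
        if ((*dst++ = *src++) == '\0')
            return origin;
    }
    memcpy(&word, src, 8);
    while (!((word - 0x0101010101010101ULL) & ~word & 0x8080808080808080ULL)) {
        memcpy(dst, &word, 8);
        src += 8;
        dst += 8;
        memcpy(&word, src, 8);
    }
    while ((*dst++ = *src++) != '\0')   /* tail, including the terminator */
        ;
    return origin;
}

/* CPU seconds spent on `reps` calls of copy(dst, src). */
double time_copy(char *(*copy)(char *, const char *),
                 char *dst, const char *src, long reps)
{
    /* Calling through a volatile pointer and feeding the result into a
       volatile sink discourages the compiler from dropping the calls. */
    char *(*volatile fn)(char *, const char *) = copy;
    volatile char sink = 0;
    clock_t start = clock();

    for (long i = 0; i < reps; i++) {
        fn(dst, src);
        sink ^= dst[0];
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Copies a string of `len` 'x' characters `reps` times with each function. */
void run_benchmark(size_t len, long reps)
{
    char *src = malloc(len + 1);
    char *dst = malloc(len + 1);

    if (src != NULL && dst != NULL) {
        memset(src, 'x', len);
        src[len] = '\0';
        printf("len %zu: byte_copy %.3f s, word_copy %.3f s\n", len,
               time_copy(byte_copy, dst, src, reps),
               time_copy(word_copy, dst, src, reps));
    }
    free(src);
    free(dst);
}
```

For example, call `run_benchmark(4096, 100000);` from `main` and compile with `-O2`. Long strings mostly exercise the 8-byte loop, while short ones are dominated by the alignment prelude and the tail loop, so trying several lengths gives a fairer picture.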

Context

StackExchange Code Review Q#30337, answer score: 9
