patternMinor
x86 strcpy implementation
implementation, x86, strcpy
Problem
I have about 4 days of assembly knowledge, so I would like a review of this strcpy function and whether it can be done better (at least I have the feeling that it can).
Full code (with the test included):
.data
s:
.asciz "Hello world!"
.bss
.lcomm destination, 4
.text
.globl main
main:
nop
pushl $s
pushl $destination
call __strcpy
addl $8, %esp
pushl $destination
call puts
addl $4, %esp
ret
.globl __strcpy
.type __strcpy, @function
__strcpy:
movl $0xFFFF, %ecx
movl 4(%esp), %edi
movl 8(%esp), %esi
cpy:
cmpl $0, (%esi)
je done
movsb
loop cpy
done:
ret
GitHub
Parts that I feel can be optimized:
- Because the done label just executes the ret instruction:
cmpl $0, (%esi)
je done
- Because the rep instruction-family seems like a better approach:
movsb
loop cpy
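To make the second point concrete: a rep-based copy usually splits the work into two passes, a scan for the terminator (what `repne scasb` does) followed by a bulk copy of a known length (what `rep movsb` does). The C sketch below mirrors that shape with `strlen` and `memcpy`. The name `strcpy_two_pass` is hypothetical and not part of the original code, and this only illustrates the structure of the approach; it makes no claim about which version is faster.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not from the original post): a two-pass copy that
 * mirrors the "repne scasb" + "rep movsb" structure in C terms. */
static char *strcpy_two_pass(char *dst, const char *src)
{
    size_t len;

    assert(dst != NULL && src != NULL);
    len = strlen(src);           /* pass 1: locate the terminating '\0' */
    memcpy(dst, src, len + 1);   /* pass 2: bulk copy, terminator included */
    return dst;
}
```

Note that the length passed to the bulk copy is `len + 1`, so the terminator is copied as well, and that the destination must have room for it: "Hello world!" needs 13 bytes.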
Solution
You can use a Bit Twiddling Hack to determine whether an int32 or int64 has no zero bytes: http://graphics.stanford.edu/~seander/bithacks.html#ZeroInWord
If it has none, you can copy the whole int32 or int64 at once. So it takes 4 operations to search for a zero byte in 8 bytes (in the int64 case). It looks like a true optimization.
#include <stdint.h>
char * strcpy(char * dst, const char * src)
{
char * origin = dst;
while (!((((*(uint64_t *)src) - 0x0101010101010101ULL)
& ~(*(uint64_t *)src) & 0x8080808080808080ULL)))
{
*(uint64_t *)dst = *(uint64_t *)src;
src += 8;
dst += 8;
}
while (*dst++ = *src++)
;
return origin;
}
A simple strcpy implementation uses 8 comparisons to zero and 8 byte copies for each 8 bytes of the source string. My implementation uses 4 operations to check for zeros and 1 operation to copy 8 bytes. So we have 5 ops vs 16 ops. Not all ops have the same speed, so it is not easy to estimate the real speedup. We need some benchmarks; is anyone free?
Code Snippets
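The zero-byte test from the solution can also be tried on its own, separately from the copy loop. The sketch below isolates it as `has_zero_byte` and uses it in a word-at-a-time copy. Two things here are assumptions rather than part of the original answer: the loads and stores go through `memcpy` instead of casting `char *` to `uint64_t *` (the cast can run into alignment and aliasing problems), and the hypothetical parameter `src_cap` states how many bytes of the source buffer are readable, so that an 8-byte load never reads past the end of that buffer. The names `has_zero_byte` and `strcpy_words` are illustrative only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if any of the 8 bytes of v is 0x00, else 0 (the ZeroInWord
 * test: one subtraction, one complement, two ANDs). */
static int has_zero_byte(uint64_t v)
{
    return ((v - 0x0101010101010101ULL) & ~v & 0x8080808080808080ULL) != 0;
}

/* Hypothetical variant of the word-wise copy. Assumes the terminating
 * '\0' of src lies within the first src_cap bytes, and that dst has room
 * for the whole string including the terminator. */
static char *strcpy_words(char *dst, const char *src, size_t src_cap)
{
    size_t i = 0;

    assert(dst != NULL && src != NULL);

    /* Copy 8 bytes at a time while a full word is readable and has no '\0'. */
    while (i + 8 <= src_cap) {
        uint64_t w;
        memcpy(&w, src + i, sizeof w);   /* safe unaligned load */
        if (has_zero_byte(w))
            break;
        memcpy(dst + i, &w, sizeof w);   /* safe unaligned store */
        i += 8;
    }

    /* Finish byte by byte; this also copies the terminator. */
    while ((dst[i] = src[i]) != '\0')
        i++;

    return dst;
}
```

Compilers typically turn a fixed-size 8-byte `memcpy` into a single load or store, so the loop should keep the intended shape; whether it beats a byte-wise loop in practice still has to be measured.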
char * strcpy(char * dst, const char * src)
{
char * origin = dst;
while (!((((*(uint64_t *)src) - 0x0101010101010101ULL)
& ~(*(uint64_t *)src) & 0x8080808080808080ULL)))
{
*(uint64_t *)dst = *(uint64_t *)src;
src += 8;
dst += 8;
}
while (*dst++ = *src++)
;
return origin;
}
Context
StackExchange Code Review Q#30337, answer score: 9