patterncMinor
Re-implementing memcpy
Viewed 0 times
implementingmemcpystackoverflow
Problem
I have written an implementation of
The reason for this is because I am writing my operating system (which will be of course reviewed on parts here on Code Review) it is supposed to work efficiently on both
I made some aggressive testing with
memcpy function and I want to know how good is it and if it gets the most out of the processor capabilities or not.The reason for this is because I am writing my operating system (which will be of course reviewed on parts here on Code Review) it is supposed to work efficiently on both
x86_64 and x86 machines#include
#include
void *memcpy(void *dest, const void *src,size_t n)
{
if (n==0)
return dest;
#if defined(__x86_64__) || defined(_M_X64)
size_t i=0;
if(n>=8)
while(i=4){
*(uint32_t*)(dest + i) = *(uint32_t*)(src+i);
i+=4;
}
if(n-i>=2){
*(uint16_t*)(dest+i) = *(uint16_t*)(src+i);
i+=2;
}
if(n-i>=1)
*(uint8_t*)(dest+i) = *(uint8_t*)(src+i);
#elif defined(__i386) || defined(_M_IX86)
size_t i=0;
if(n>=4)
while(i=2){
*(uint16_t*)(dest+i) = *(uint16_t*)(src+i);
i+=2;
}
if(n-i>=1)
*(uint8_t*)(dest+i) = *(uint8_t*)(src+i);
#endif
return dest;
}I made some aggressive testing with
memcmp() to check for correct data transmission and valgrind to check for memory leaks and It passed all the tests. I didn't post the testing code because I think it could be useless since I don't want it to be reviewed.Solution
Not very fast
Your
If you research the various
My own benchmarks
I ran your version against the following two versions. One is a straightforward byte by byte copy, and the other is just using "rep movsb", which on modern processors is highly optimized:
My results for copying 2GB (32-bit host):
Your
memcpy() implementation is not really better than a standard byte by byte copy. Even though you attempt to copy more bytes at a time, the limiting factor isn't actually the number of bytes you copy per instruction.If you research the various
memcpy() implementations there are for x86 targets, you will find a wealth of information about how to get faster speeds. I think the simplest thing for you to do is to just use the simple "rep movsb" implementation.My own benchmarks
I ran your version against the following two versions. One is a straightforward byte by byte copy, and the other is just using "rep movsb", which on modern processors is highly optimized:
void *memcpy2(void *dst, const void *src,size_t n)
{
size_t i;
for (i=0;i<n;i++)
*(char *) dst++ = *(char *) src++;
return dst;
}
void *memcpy3(void *dst, const void *src, size_t n)
{
void *ret = dst;
asm volatile("rep movsb" : "+D" (dst) : "c"(n), "S"(src) : "cc", "memory");
return ret;
}My results for copying 2GB (32-bit host):
OP's function : 3.74 sec
memcpy2 (naive): 3.74 sec
memcpy3 (movsb): 2.96 secCode Snippets
void *memcpy2(void *dst, const void *src,size_t n)
{
size_t i;
for (i=0;i<n;i++)
*(char *) dst++ = *(char *) src++;
return dst;
}
void *memcpy3(void *dst, const void *src, size_t n)
{
void *ret = dst;
asm volatile("rep movsb" : "+D" (dst) : "c"(n), "S"(src) : "cc", "memory");
return ret;
}OP's function : 3.74 sec
memcpy2 (naive): 3.74 sec
memcpy3 (movsb): 2.96 secContext
StackExchange Code Review Q#105114, answer score: 7
Revisions (0)
No revisions yet.