patterncMajor
Custom memcmp function
Viewed 0 times
custommemcmpfunction
Problem
I have written my own
My thought process was that whenever the byte stream is greater than 4 bytes, I'd compare 4 bytes at a time instead of one. That means fewer iterations which would mean it would be a little more faster. And then just simply take care of the remaining 4 bytes or so in the second
Benchmarks:
It's pretty inconsistent but I figure it's still a pretty good function since it's 50/50 generally.
memcmp() function for learning purposes:bool memcmp( byte* _bufA, byte* _bufB, uint n )
{
while ( n > 4 )
{
if ( (int)*_bufA != (int)*_bufB )
return false;
_bufA += 4, _bufB += 4;
n -= 4;;
}
while( --n )
{
if ( *_bufA != *_bufB )
return false;
++_bufA, ++_bufB;
}
return true;
}My thought process was that whenever the byte stream is greater than 4 bytes, I'd compare 4 bytes at a time instead of one. That means fewer iterations which would mean it would be a little more faster. And then just simply take care of the remaining 4 bytes or so in the second
while statement.Benchmarks:
- Standard
memcmpfunction comparing 1024 bytes: 585 nanoseconds
- My
memcmpfunction comparing 1024 bytes: 586 nanoseconds
It's pretty inconsistent but I figure it's still a pretty good function since it's 50/50 generally.
Solution
Correctness
Your function does not work correctly. In my experience, memory manipulating functions almost never work correctly as they are written the first time. (At least when I'm writing them.) If you need to go down into that rabbit hole, be sure to do so with an extensive set of unit-tests. In your case, you are lucky. You have two functions you can compare your results with:
When I'm writing optimized functions, I usually start with the naïve implementation. Then I write a comprehensive set of tests for it to make sure I got it right. (For the naïve version, this is usually easy.) Then I start optimizing the function and after each change, run the test suite again. Should one of my optimizations introduce a correctness bug, I'll notice.
Also be sure to run your unit tests trough Valgrind and compile test programs with sanitizers added. For GCC and Clang, all you have to do is pass the
Cast the pointer, not the value
Have a look at this piece.
What does it do? It dereferences the pointers
What you have to do instead is casting the pointer before dereferencing it.
Watch out for off-by-one errors
Your second loop is off-by-one.
If I call your function with
On the other hand, your first loop could use
instead of
even though this “bug” doesn't cause the function to produce a wrong answer, it is probably not what you've meant.
Do memory operations on unsigned types
While I don't think it would cause a problem in this particular case, using signed integer types can lead to all kinds of surprises. If you don't treat the values as numbers but as a bunch of bytes, stay with unsigned types.
Beware of undefined behavior
The C standard does not allow you to cast a pointer to another type (except
That said, I think that this is a case where you can get pretty far with (extremely) carefully blundering the line of undefined behavior if you know what you are doing.
On many machines, memory must be aligned. That is, an object
Therefore, what you should do is check whether both pointers can be word-aligned by advancing both of them by the same number of bytes. If so, first do a short loop that aligns the buffers, then cast to a machine word type and compare the bulk. Finally, compare the remaining bytes with a byte-for-byte loop again.
Portability
Don't make assumptions on
Your function silently assumes that
Performance
Prefer
In absence of any reason to do otherwise,
Consider exiting early
If you anticipate that your function might regularly be called with aliasing pointers, it might be worthwhile to check for this.
It doesn't cost much if the condition is false and will be an enormous short-cut if the pointers do alias.
Be sure to measure
Doing low-level optimization without measuring is a waste of time. You should compare your function with (at least) two alternatives:
Your function does not work correctly. In my experience, memory manipulating functions almost never work correctly as they are written the first time. (At least when I'm writing them.) If you need to go down into that rabbit hole, be sure to do so with an extensive set of unit-tests. In your case, you are lucky. You have two functions you can compare your results with:
- The existing
memcmpfunction from the standard library and
- the naïve implementation of a
forloop that compares one byte after the other.
When I'm writing optimized functions, I usually start with the naïve implementation. Then I write a comprehensive set of tests for it to make sure I got it right. (For the naïve version, this is usually easy.) Then I start optimizing the function and after each change, run the test suite again. Should one of my optimizations introduce a correctness bug, I'll notice.
Also be sure to run your unit tests trough Valgrind and compile test programs with sanitizers added. For GCC and Clang, all you have to do is pass the
-fsanitize=WHAT switch where WHAT is one of address, memory or undefined. (memory is only supported by Clang yet.)Cast the pointer, not the value
Have a look at this piece.
if ((int) *_bufA != (int) *_bufB)
return false;What does it do? It dereferences the pointers
_bufA and _bufB which are of type byte (which I assume is an alias for char ). Now, only after that, you cast the result (which is of type byte) to an int. In effect, you're only looking at each forth byte.What you have to do instead is casting the pointer before dereferencing it.
if (*((int *) _bufA) != *((int *) _bufB))
return false;Watch out for off-by-one errors
Your second loop is off-by-one.
while (--n) { … }If I call your function with
n == 1, it'll never look at any byte. If chux decides to call it with n == 0, it will explode. The correct version would be to use post-decrement.while (n--) { … }On the other hand, your first loop could use
while (n >= 4) { … }instead of
while (n > 4) { … }even though this “bug” doesn't cause the function to produce a wrong answer, it is probably not what you've meant.
Do memory operations on unsigned types
While I don't think it would cause a problem in this particular case, using signed integer types can lead to all kinds of surprises. If you don't treat the values as numbers but as a bunch of bytes, stay with unsigned types.
Beware of undefined behavior
The C standard does not allow you to cast a pointer to another type (except
char) to access the memory it points to. (This is a simplified explanation, search for “strict aliasing rules” for more information.) Therefore, your function invokes undefined behavior and this cannot be fixed other than by using the standard library functions.That said, I think that this is a case where you can get pretty far with (extremely) carefully blundering the line of undefined behavior if you know what you are doing.
On many machines, memory must be aligned. That is, an object
o of type T must have an address such that ((uintptr_t) &o) % alignof(T) == 0 where alignof(T) is called the alignment of type T. Unfortunately, alignof is not an operator in C (unlike in C++ since C++11). A safe estimate is to use the maximum of sizeof(T) and 8. Even on those machines where an unaligned memory access does not cause an exception, it might be slow.Therefore, what you should do is check whether both pointers can be word-aligned by advancing both of them by the same number of bytes. If so, first do a short loop that aligns the buffers, then cast to a machine word type and compare the bulk. Finally, compare the remaining bytes with a byte-for-byte loop again.
-fsanitize=undefined can warn you about mis-aligned reads.Portability
Don't make assumptions on
sizeof(int)Your function silently assumes that
sizeof(int) == 4. This will not be true on all platforms. Instead of hard-coding such values, use sizeof(int). You can store it in a constant if repeating it seems too much typing.Performance
Prefer
longs over intsIn absence of any reason to do otherwise,
unsigned long is your best bet to get a type that directly maps to a machine-word and will therefore give you optimal performance. Once I made this change to your code, I got significant speedup on my 64 bit machine from 2.9 GiB/s to 4.3 GiB/s!Consider exiting early
If you anticipate that your function might regularly be called with aliasing pointers, it might be worthwhile to check for this.
if (p1 == p2)
return true;It doesn't cost much if the condition is false and will be an enormous short-cut if the pointers do alias.
Be sure to measure
Doing low-level optimization without measuring is a waste of time. You should compare your function with (at least) two alternatives:
- the naïve byte-for-byte approach and
- at least one high
Code Snippets
if ((int) *_bufA != (int) *_bufB)
return false;if (*((int *) _bufA) != *((int *) _bufB))
return false;while (--n) { … }while (n--) { … }while (n >= 4) { … }Context
StackExchange Code Review Q#118909, answer score: 22
Revisions (0)
No revisions yet.