patterncMajor

Custom memcmp function

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

custommemcmpfunction

Problem

I have written my own memcmp() function for learning purposes:

bool memcmp( byte* _bufA, byte* _bufB, uint n )
{
    while ( n > 4 )
    {
        if ( (int)*_bufA != (int)*_bufB )
            return false;
        _bufA += 4, _bufB += 4;
        n -= 4;;
    }
    while( --n )
    {
        if ( *_bufA != *_bufB )
            return false;
        ++_bufA, ++_bufB;
    }
    return true;
}

My thought process was that whenever the byte stream is greater than 4 bytes, I'd compare 4 bytes at a time instead of one. That means fewer iterations which would mean it would be a little more faster. And then just simply take care of the remaining 4 bytes or so in the second while statement.

Benchmarks:

Standard memcmp function comparing 1024 bytes: 585 nanoseconds

My memcmp function comparing 1024 bytes: 586 nanoseconds

It's pretty inconsistent but I figure it's still a pretty good function since it's 50/50 generally.

Solution

Correctness

Your function does not work correctly. In my experience, memory manipulating functions almost never work correctly as they are written the first time. (At least when I'm writing them.) If you need to go down into that rabbit hole, be sure to do so with an extensive set of unit-tests. In your case, you are lucky. You have two functions you can compare your results with:

The existing memcmp function from the standard library and

the naïve implementation of a for loop that compares one byte after the other.

When I'm writing optimized functions, I usually start with the naïve implementation. Then I write a comprehensive set of tests for it to make sure I got it right. (For the naïve version, this is usually easy.) Then I start optimizing the function and after each change, run the test suite again. Should one of my optimizations introduce a correctness bug, I'll notice.

Also be sure to run your unit tests trough Valgrind and compile test programs with sanitizers added. For GCC and Clang, all you have to do is pass the -fsanitize=WHAT switch where WHAT is one of address, memory or undefined. (memory is only supported by Clang yet.)

Cast the pointer, not the value

Have a look at this piece.

if ((int) *_bufA != (int) *_bufB)
  return false;

What does it do? It dereferences the pointers _bufA and _bufB which are of type byte (which I assume is an alias for char ). Now, only after that, you cast the result (which is of type byte) to an int. In effect, you're only looking at each forth byte.

What you have to do instead is casting the pointer before dereferencing it.

if (*((int *) _bufA) != *((int *) _bufB))
  return false;

Watch out for off-by-one errors

Your second loop is off-by-one.

while (--n) { … }

If I call your function with n == 1, it'll never look at any byte. If chux decides to call it with n == 0, it will explode. The correct version would be to use post-decrement.

while (n--) { … }

On the other hand, your first loop could use

while (n >= 4) { … }

instead of

while (n > 4) { … }

even though this “bug” doesn't cause the function to produce a wrong answer, it is probably not what you've meant.

Do memory operations on unsigned types

While I don't think it would cause a problem in this particular case, using signed integer types can lead to all kinds of surprises. If you don't treat the values as numbers but as a bunch of bytes, stay with unsigned types.

Beware of undefined behavior

The C standard does not allow you to cast a pointer to another type (except char) to access the memory it points to. (This is a simplified explanation, search for “strict aliasing rules” for more information.) Therefore, your function invokes undefined behavior and this cannot be fixed other than by using the standard library functions.

That said, I think that this is a case where you can get pretty far with (extremely) carefully blundering the line of undefined behavior if you know what you are doing.

On many machines, memory must be aligned. That is, an object o of type T must have an address such that ((uintptr_t) &o) % alignof(T) == 0 where alignof(T) is called the alignment of type T. Unfortunately, alignof is not an operator in C (unlike in C++ since C++11). A safe estimate is to use the maximum of sizeof(T) and 8. Even on those machines where an unaligned memory access does not cause an exception, it might be slow.

Therefore, what you should do is check whether both pointers can be word-aligned by advancing both of them by the same number of bytes. If so, first do a short loop that aligns the buffers, then cast to a machine word type and compare the bulk. Finally, compare the remaining bytes with a byte-for-byte loop again.

-fsanitize=undefined can warn you about mis-aligned reads.

Portability

Don't make assumptions on sizeof(int)

Your function silently assumes that sizeof(int) == 4. This will not be true on all platforms. Instead of hard-coding such values, use sizeof(int). You can store it in a constant if repeating it seems too much typing.

Performance

Prefer longs over ints

In absence of any reason to do otherwise, unsigned long is your best bet to get a type that directly maps to a machine-word and will therefore give you optimal performance. Once I made this change to your code, I got significant speedup on my 64 bit machine from 2.9 GiB/s to 4.3 GiB/s!

Consider exiting early

If you anticipate that your function might regularly be called with aliasing pointers, it might be worthwhile to check for this.

if (p1 == p2)
  return true;

It doesn't cost much if the condition is false and will be an enormous short-cut if the pointers do alias.

Be sure to measure

Doing low-level optimization without measuring is a waste of time. You should compare your function with (at least) two alternatives:

the naïve byte-for-byte approach and

at least one high

Code Snippets

if ((int) *_bufA != (int) *_bufB)
  return false;

if (*((int *) _bufA) != *((int *) _bufB))
  return false;

while (--n) { … }

while (n--) { … }

while (n >= 4) { … }

Context

StackExchange Code Review Q#118909, answer score: 22

Revisions (0)

No revisions yet.