HiveBrain v1.2.0
Get Started
← Back to all entries
patterncsharpModerate

SSE instruction to check if byte array is zeroes C#

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
ssebytearrayinstructionzeroescheck

Problem

My fundamental problem is how to check whether byte[] is full of zeroes. I posted a range of implementations (with timings) and one clearly beats others. In fact, it is so fast that I am having trouble believing it even works. I suggest taking a look at SO question page first. This is the winner code:

static unsafe bool IsAllZeros(byte[] data)
{
    fixed (byte* bytes = data) {
        int len = data.Length;
        int rem = len % (16*16);
        Vector16b* b = (Vector16b*)bytes;
        Vector16b* e = b + (len - rem) / (16*16);
        Vector16b zero = Vector16b.Zero;

        while (b+15 < e) {
            if (*(b)+*(b+1)+*(b+2)+*(b+3)+*(b+4)+*(b+5)+*(b+6)+*(b+7)+*(b+8)+
                *(b+9)+*(b+10)+*(b+11)+*(b+12)+*(b+13)+*(b+14)+*(b+15) != zero)
                return false;
            b += 16;
        }

        for (int i = 0; i < rem; i++)
            if (data [len - 1 - i] != 0)
                return false;

        return true;
    }
}


Project references Mono.Simd assembly. I checked the code for following issues so far:

  • Pointer arithmetic. Adding 1 to pointer moves it 16 bytes ahead. However loop is unrolled 16 times so iteration processes 256 bytes at a time.



  • Array length is in bytes while b e are in 16 byte vectors.



  • Loop condition checks if last element b+15 is within the array. End pointer e points to just outside of array.



  • Loop processes 16 vectors so it moves pointer by 16 elements.



  • Tail is processed for up to 15 vectors, each 16 bytes, one byte at a time.



Is this code correct?

Solution

This code is broken

When unrolling you add all the Vector16b.

From github.com/mono:

[Acceleration (AccelMode.SSE2)]
public static unsafe Vector16b operator + (Vector16b va, Vector16b vb)
{
    Vector16b res = new Vector16b ();
    byte *a = &va.v0;
    byte *b = &vb.v0;
    byte *c = &res.v0;
    for (int i = 0; i < 16; ++i)
        *c++ = (byte)(*a++ + *b++);
    return res;
}


So the vectors are added index to index, and then truncated back to byte. A sequence of adds can then end up as 0, fx:

16 + 16 + 16 + 16 + 16 + 16 + 16 + 16 + 16 + 16 + 16 + 16 + 16 + 16 + 16 + 16 = 256


256 in binary is 0b100000000 and when you truncate that to byte you get 0b00000000.

You need to ensure that each individual Vector16b is equal to zero or use | instead of +.

Code Snippets

[Acceleration (AccelMode.SSE2)]
public static unsafe Vector16b operator + (Vector16b va, Vector16b vb)
{
    Vector16b res = new Vector16b ();
    byte *a = &va.v0;
    byte *b = &vb.v0;
    byte *c = &res.v0;
    for (int i = 0; i < 16; ++i)
        *c++ = (byte)(*a++ + *b++);
    return res;
}
16 + 16 + 16 + 16 + 16 + 16 + 16 + 16 + 16 + 16 + 16 + 16 + 16 + 16 + 16 + 16 = 256

Context

StackExchange Code Review Q#108470, answer score: 14

Revisions (0)

No revisions yet.